Switch to new FluidDoc format (#47)

* Start over
* Added basic docs. Added .travis.yml. Added scripts to build documentation on Travis.
* Disable several deploy script commands for testing.
* Fixed the deploy_docs.sh script.
* Update travis.yml
* Renamed docs to doc
* Update .gitignore.
* Delete .gitignore
* Update .travis.yml
* Update .travis.yml
* Update deploy_docs.sh
* Update .travis.yml
* Develop doc (#1)
* Add paddle submodule
* Update the build script.
* Update script
* Use gen_doc_lib instead.
* Move files around
* cache external
* Update submodule
* try to cache batch 1
* add test code
* Update Paddle submodule
* Update submodule
* update script to print more
* update python path
* test
* test
* test
* clean up the code
* Update Script (#2)
* add new file
* Develop doc (#3)
* Add the rest of the submodules into the system
* Provide first symlink for fit a line
* Update the rest of book to symlinks
* add models submodule
* Add link for models
* Update deploy_docs

f42d65aa · Jeff Wang · GitHub · ea2ed599
924 changed files
--- a/.gitignore
+++ b/.gitignore
-.env
-.DS_Store
-._.DS_Store
-*.mo
--- a/.gitmodules
+++ b/.gitmodules
-[submodule "paddle"]
-	path = paddle
-	url = https://github.com/PaddlePaddle/Paddle.git
-[submodule "book"]
-	path = book
-	url = https://github.com/PaddlePaddle/book.git
-[submodule "anakin"]
-	path = anakin
-	url = https://github.com/PaddlePaddle/Anakin.git
-[submodule "mobile"]
-	path = mobile
-	url = https://github.com/PaddlePaddle/paddle-mobile.git
+[submodule "external/Paddle"]
+	path = external/Paddle
+	url = https://github.com/PaddlePaddle/Paddle
+[submodule "external/book"]
+	path = external/book
+	url = https://github.com/PaddlePaddle/book
+[submodule "external/Anakin"]
+	path = external/Anakin
+	url = https://github.com/PaddlePaddle/Anakin
+[submodule "external/paddle-mobile"]
+	path = external/paddle-mobile
+	url = https://github.com/PaddlePaddle/paddle-mobile
+[submodule "external/models"]
+	path = external/models
+	url = https://github.com/PaddlePaddle/models
--- a/.travis.yml
+++ b/.travis.yml
+language: cpp
+cache:
+  bundler: true
+  directories:
+    - $HOME/.ccache
+    - $HOME/.cache/pip
+    - $HOME/docker
+ #   - $TRAVIS_BUILD_DIR/external/
+    - $TRAVIS_BUILD_DIR/external/Paddle/build/third_party
+
+sudo: required
+dist: trusty
+services:
+  - docker
+os:
+  - linux
+env:
+  - JOB=doc
+  - JOB=lite_lib
+
+addons:
+  apt:
+    packages:
+      - git
+      - python
+      - python-pip
+      - python2.7-dev
+      - golang
+  ssh_known_hosts: 13.229.163.131
+before_install:
+  -  sudo pip install pylint pytest astroid isort 
+  # Load cached docker images
+  #- if [[ -d $HOME/docker ]]; then ls $HOME/docker/*.tar.gz | xargs -I {file} sh -c "zcat {file} | docker load"; fi
+  
+script:
+  - |
+     if [ $JOB == "doc" ]; then scripts/deploy_docs.sh 
+     fi
+     
+     if [ $JOB == "lite_lib" ]; then scripts/build_doc_lib_lite.sh 
+     fi 
+#before_cache:
+#  # Save tagged docker images
+#  - >
+#    mkdir -p $HOME/docker && docker images -a --filter='dangling=false' --format 'paddlepaddle/paddle:latest-dev {{.ID}}'
+#    | xargs -n 2 -t sh -c 'test -e $HOME/docker/$1.tar.gz || docker save $0 | gzip -2 > $HOME/docker/$1.tar.gz'
+    
+notifications:
+  email:
+    on_success: change
+    on_failure: always
--- a/Makefile
+++ b/Makefile
-# Makefile for Sphinx documentation
-#
-
-# You can set these variables from the command line.
-SPHINXOPTS    =
-SPHINXBUILD   = sphinx-build
-PAPER         =
-BUILDDIR      = build
-
-# User-friendly check for sphinx-build
-ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
-$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
-endif
-
-# Internal variables.
-PAPEROPT_a4     = -D latex_paper_size=a4
-PAPEROPT_letter = -D latex_paper_size=letter
-ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source
-# the i18n builder cannot share the environment and doctrees with the others
-I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source
-
-.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest coverage gettext
-
-help:
-	@echo "Please use \`make <target>' where <target> is one of"
-	@echo "  html       to make standalone HTML files"
-	@echo "  dirhtml    to make HTML files named index.html in directories"
-	@echo "  singlehtml to make a single large HTML file"
-	@echo "  pickle     to make pickle files"
-	@echo "  json       to make JSON files"
-	@echo "  htmlhelp   to make HTML files and a HTML help project"
-	@echo "  qthelp     to make HTML files and a qthelp project"
-	@echo "  applehelp  to make an Apple Help Book"
-	@echo "  devhelp    to make HTML files and a Devhelp project"
-	@echo "  epub       to make an epub"
-	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
-	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
-	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
-	@echo "  text       to make text files"
-	@echo "  man        to make manual pages"
-	@echo "  texinfo    to make Texinfo files"
-	@echo "  info       to make Texinfo files and run them through makeinfo"
-	@echo "  gettext    to make PO message catalogs"
-	@echo "  changes    to make an overview of all changed/added/deprecated items"
-	@echo "  xml        to make Docutils-native XML files"
-	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
-	@echo "  linkcheck  to check all external links for integrity"
-	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"
-	@echo "  coverage   to run coverage check of the documentation (if enabled)"
-
-clean:
-	rm -rf $(BUILDDIR)/*
-
-html:
-	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
-	@echo
-	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
-
-dirhtml:
-	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
-	@echo
-	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
-
-singlehtml:
-	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
-	@echo
-	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
-
-pickle:
-	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
-	@echo
-	@echo "Build finished; now you can process the pickle files."
-
-json:
-	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
-	@echo
-	@echo "Build finished; now you can process the JSON files."
-
-htmlhelp:
-	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
-	@echo
-	@echo "Build finished; now you can run HTML Help Workshop with the" \
-	      ".hhp project file in $(BUILDDIR)/htmlhelp."
-
-qthelp:
-	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
-	@echo
-	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
-	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
-	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/PaddlePaddleFluid.qhcp"
-	@echo "To view the help file:"
-	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/PaddlePaddleFluid.qhc"
-
-applehelp:
-	$(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp
-	@echo
-	@echo "Build finished. The help book is in $(BUILDDIR)/applehelp."
-	@echo "N.B. You won't be able to view it unless you put it in" \
-	      "~/Library/Documentation/Help or install it in your application" \
-	      "bundle."
-
-devhelp:
-	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
-	@echo
-	@echo "Build finished."
-	@echo "To view the help file:"
-	@echo "# mkdir -p $$HOME/.local/share/devhelp/PaddlePaddleFluid"
-	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/PaddlePaddleFluid"
-	@echo "# devhelp"
-
-epub:
-	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
-	@echo
-	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
-
-latex:
-	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
-	@echo
-	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
-	@echo "Run \`make' in that directory to run these through (pdf)latex" \
-	      "(use \`make latexpdf' here to do that automatically)."
-
-latexpdf:
-	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
-	@echo "Running LaTeX files through pdflatex..."
-	$(MAKE) -C $(BUILDDIR)/latex all-pdf
-	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
-
-latexpdfja:
-	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
-	@echo "Running LaTeX files through platex and dvipdfmx..."
-	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
-	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
-
-text:
-	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
-	@echo
-	@echo "Build finished. The text files are in $(BUILDDIR)/text."
-
-man:
-	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
-	@echo
-	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
-
-texinfo:
-	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
-	@echo
-	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
-	@echo "Run \`make' in that directory to run these through makeinfo" \
-	      "(use \`make info' here to do that automatically)."
-
-info:
-	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
-	@echo "Running Texinfo files through makeinfo..."
-	make -C $(BUILDDIR)/texinfo info
-	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
-
-gettext:
-	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
-	@echo
-	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
-
-changes:
-	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
-	@echo
-	@echo "The overview file is in $(BUILDDIR)/changes."
-
-linkcheck:
-	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
-	@echo
-	@echo "Link check complete; look for any errors in the above output " \
-	      "or in $(BUILDDIR)/linkcheck/output.txt."
-
-doctest:
-	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
-	@echo "Testing of doctests in the sources finished, look at the " \
-	      "results in $(BUILDDIR)/doctest/output.txt."
-
-coverage:
-	$(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage
-	@echo "Testing of coverage in the sources finished, look at the " \
-	      "results in $(BUILDDIR)/coverage/python.txt."
-
-xml:
-	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
-	@echo
-	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
-
-pseudoxml:
-	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
-	@echo
-	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
--- a/README.md
+++ b/README.md
-# Fluid Documentation Skeleton
-
-## Build
-
-To build the documentation, you need a Linux machine with python2, virtualenv, and gmake installed.
-
-### Preparation
-
-You need to create a `virtualenv` instead of polluting the global Python library path.
-
-```bash
-virtualenv .env
-```
-
-You can enter virtualenv by
-
-```bash
-source .env/bin/activate
-```
-
-You can exit virtualenv by
-
-```bash
-deactivate
-```
-
-### Install dependencies
-
-```bash
-# enter virtualenv
-source .env/bin/activate
-# install dependencies
-pip install -r requirements.txt
-```
-
-### Make HTML
-
-```bash
-# make clean  # make clean to regenerate toctree. Just `make html` may have a cache.
-make html
-```
-The HTML files are generated in `build/html`. You can open `build/html/index.html` in your browser to view the documentation.
-
-## Edit
-
-### Edit documentation
-
-We suggest using `reStructuredText` because it is the only official markup language supported by our documentation generating system, sphinx. `markdown` can also be used. However, since `markdown` has so many dialects, there is no guarantee that a `markdown` source file will be rendered well.
-
-The `reStructuredText` cheatsheet is [here](http://docutils.sourceforge.net/docs/user/rst/quickref.html).
-
-
-### Edit structure
-
-`sphinx` (our documentation generating system) uses `toctree` to organize documentation. `toctree` means `table of contents tree`.
-
-Please see the [sphinx documentation](http://www.sphinx-doc.org/en/master/), especially [`toctree` directives](http://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html)
--- a/anakin @ b9d95555
+++ b/anakin @ b9d95555
-Subproject commit b9d95555a73f3e02aa169251cd319053b6d7d642
--- a/book @ f4b5cc83
+++ b/book @ f4b5cc83
-Subproject commit f4b5cc835ef77e55cfc001d51f8f77565475dc45
--- a/build/.gitignore
+++ b/build/.gitignore
-*
--- a/doc/about/about_us.rst
+++ b/doc/about/about_us.rst
+=========
+About Us
+=========
+
+What Is PaddlePaddle
+------------------------
+
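+- PaddlePaddle is a deep learning framework independently developed and open-sourced by Baidu. It enables developers and enterprises to turn their AI ideas into reality safely and quickly.
+
+- The project team brings together top deep learning scientists from around the world and is committed to providing developers and enterprises with the best deep learning research and development experience.
+
+- The framework has four key strengths (easy to learn, easy to use, secure, and efficient), making it the deep learning tool best suited to Chinese developers and enterprises.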
+
+Technical Features of PaddlePaddle
+----------------------------------------
+
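+- A new generation of deep learning framework: PaddlePaddle is a new-generation deep learning framework based on a "deep learning programming language". While maintaining performance, it greatly improves the framework's ability to express models and can describe any model that may arise.
+
+- Friendlier to large-scale computation: Refined by a variety of large-scale computing workloads within Baidu, PaddlePaddle performs well in distributed computing. Its EDL technology can save substantial computing resources, and it also supports training large-scale sparse models.
+
+- Visualized deep learning: Visual DL helps developers conveniently observe the overall training trend, the quality of data samples and intermediate results, the distribution and evolution of parameters, and the model structure, making the programming process easier.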
+
+A PaddlePaddle-Based Education System
+----------------------------------------
+
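+- Deep learning courses: Baidu has worked with top education and training institutions in China to develop high-quality deep learning courses and textbooks that help developers master deep learning from scratch.
+
+- Hands-on deep learning training: For users whose goal is research or learning, PaddlePaddle provides an online development environment that requires no installation, along with support for algorithms, computing power, and data.
+
+- Offline training: A rich range of high-quality offline educational activities, such as training for young teachers, offline boot camps, and salons, offering many forms of training and exchange.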
+
+
+PaddlePaddle-Based AI Services
+----------------------------------------
+
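+- EasyDL: Helps enterprises with no algorithm background quickly complete a deep learning task, obtaining a high-quality model from only a small amount of data.
+
+- AI Marketplace: Provides a trading mechanism for standardized AI capabilities and products, helping enterprises quickly find what they need and effectively carry out their AI business.
+
+- Deep learning competitions: PaddlePaddle brings together top deep learning developers. Enterprises can publish their business problems and quickly find the best solutions through competitions.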
+
+Contact Us with Any Questions About PaddlePaddle
+------------------------------------------------------------
+
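+- Questions about learning or usage: give us feedback in the `PaddlePaddle open source community <https://github.com/PaddlePaddle/Paddle/issues>`_ or the `PaddlePaddle Chinese community <http://ai.baidu.com/forum/topic/list/168>`_
+
+- Suggestions about the development of the PaddlePaddle framework: send an email to Paddle-better@baidu.com
+
+We look forward to working with you to build a world-class deep learning framework and to jointly advance AI technology.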
+
+
+
+The PaddlePaddle Team
--- a/source/advanced_usage/benchmark.rst
+++ b/source/advanced_usage/benchmark.rst
--- a/source/advanced_usage/deploy/anakin_arm_benchmark.md
+++ b/source/advanced_usage/deploy/anakin_arm_benchmark.md
--- a/doc/fluid/advanced_usage/deploy/anakin_example.md
+++ b/doc/fluid/advanced_usage/deploy/anakin_example.md
+# Example
+Anakin currently supports only the NCHW format.
+The example files are in test/framework/net.
+
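+## Running a CNN Model on an NVIDIA GPU
+The example file is example_nv_cnn_net.cpp. The overall workflow is as follows:
+- Set the model path to the path of the Anakin model and initialize a graph object for the NV platform. An Anakin model can be obtained by converting a Caffe or Fluid model with the converter.
+- Set the input size of the network graph according to the model, and optimize the graph.
+- Initialize the network executor from the optimized graph.
+- Get the network's input tensor and copy the data into it.
+- Run inference.
+- Get the network's output tensor.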
+
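+This example uses the NV platform to demonstrate how to use the Anakin framework. Note that the GPU build option must be enabled at compile time.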
+
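+## Running an RNN Model on X86
+The example file is example_x86_rnn_net.cpp.
+The overall workflow is similar to running a CNN model on an NVIDIA GPU, with the following differences:
+- Initialize the graph object and the network executor object with the X86 target.
+- The input size of an RNN model is variable. The input dimensions used when initializing the graph are the maximum values, and the input dimension N is the total number of words. You also need to set the input tensor's seq_offset to indicate how the words are divided into sentences. For example, {0,5,12} means there are 12 words in total: words 0 to 4 form the first sentence, and words 5 to 11 form the second.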
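The seq_offset convention can be illustrated with a small, self-contained C++ sketch. It assumes seq_offset holds cumulative word offsets whose last entry equals the total word count N, as in the {0,5,12} example; the helper `sentence_ranges` is hypothetical and not part of Anakin.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not an Anakin API): turns cumulative seq_offset values
// such as {0, 5, 12} into half-open [begin, end) word ranges, one per sentence.
std::vector<std::pair<int, int>> sentence_ranges(const std::vector<int>& seq_offset) {
    std::vector<std::pair<int, int>> ranges;
    for (std::size_t i = 1; i < seq_offset.size(); ++i) {
        // Sentence i-1 covers words seq_offset[i-1] .. seq_offset[i] - 1.
        ranges.emplace_back(seq_offset[i - 1], seq_offset[i]);
    }
    return ranges;
}
```

For {0, 5, 12} the sketch yields [0, 5) and [5, 12), that is, words 0 to 4 and words 5 to 11, matching the description above.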
+
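+This example uses the X86 platform to demonstrate how to use the Anakin framework. Note that the X86 build option must be enabled at compile time.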
+
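+## Running a CNN Model on an NVIDIA GPU with Anakin's Thread Pool
+The example file is example_nv_cnn_net_multi_thread.cpp. It uses the worker's synchronous prediction interface.
+The overall workflow is similar to running a CNN model on an NVIDIA GPU, with the following differences:
+- Initialize the worker object with the model path and the thread pool size.
+- Push the input tensor into the task queue and get the output tensor.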
--- a/source/advanced_usage/deploy/anakin_gpu_benchmark.md
+++ b/source/advanced_usage/deploy/anakin_gpu_benchmark.md
--- a/doc/fluid/advanced_usage/deploy/anakin_tutorial.md
+++ b/doc/fluid/advanced_usage/deploy/anakin_tutorial.md
+# Anakin Tutorial ##
+
+This tutorial briefly introduces how Anakin works, some basic Anakin APIs, and how to call them.
+  
+## Contents ###
+
+- [How Anakin Works](#principle)
+- [Anakin APIs](#api)
+- [Example Code](#example)
+
+## <span id = 'principle'> How Anakin Works</span> ###
+
+![Anakin_principle](../pics/anakin_fm_ch.png)
+
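+Running forward computation with Anakin consists of three main steps:
+
+- Parse the external model into an Anakin model with [Anakin Parser](Converter_ch.md)  
+  Before using Anakin, you must convert every other model into an Anakin model. We provide conversion scripts, and you can convert models with [Anakin Parser](Converter_ch.md).
+- Generate the Anakin computation graph  
+  Load the Anakin model to generate the original computation graph, and then optimize that graph. You only need to call the corresponding API to optimize it.
+- Execute the computation graph  
+  Anakin executes the computation graph on the selected hardware platform.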
+
+
+## <span id ='api'>Anakin APIs </span> ###
+### Tensor ####
+
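+`Tensor` provides basic data operations and management and offers ops a unified data interface. A `Tensor` has the following attributes:
+
+- Buffer  
+   The data storage area
+- Shape  
+   The dimension information of the data
+- Event  
+   Synchronization for asynchronous computation
+
+The `Tensor` class contains three `Shape` objects: `_shape`, `_valid_shape`, and `_offset`. `_shape` describes the tensor's actual allocated space, `_valid_shape` describes the part of that space the tensor currently uses, and `_offset` gives the position of the tensor's data pointer relative to the actual data space. Tensors of different dimensionality correspond to mathematical entities such as vectors and matrices, as shown in the table below.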
+
+
+Dimensions | Math entity |
+ :----: | :----:
+1 | vector
+2 | matrix
+3 | 3-tensor
+n | n-tensor
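To make the roles of `_shape`, `_valid_shape`, and `_offset` more concrete, the sketch below computes where an element of the valid region lives in the full buffer. It assumes a dense NCHW buffer sized by `_shape` and an offset given per dimension; `Shape4` and `buffer_index` are hypothetical names, not Anakin's classes, and Anakin's internal indexing may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-D NCHW shape {N, C, H, W}; not Anakin's Shape class.
using Shape4 = std::array<int, 4>;

// Returns the position in the full buffer (sized by `shape`) of element
// (n, c, h, w) of the valid region, whose origin is `offset`.
std::size_t buffer_index(const Shape4& shape, const Shape4& offset,
                         int n, int c, int h, int w) {
    const std::size_t C = shape[1], H = shape[2], W = shape[3];
    // Shift the valid-region coordinates into full-buffer coordinates.
    const std::size_t nn = n + offset[0], cc = c + offset[1];
    const std::size_t hh = h + offset[2], ww = w + offset[3];
    // Row-major NCHW linearisation over the full shape.
    return ((nn * C + cc) * H + hh) * W + ww;
}
```

For a 1x1x4x4 buffer with offset {0, 0, 1, 1}, the first valid element (0, 0, 0, 0) maps to buffer index 5.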
+
+#### Declaring a tensor object
+
+`Tensor` takes three template parameters:
+
+
+```c++
+ template<typename TargetType, DataType datatype, typename LayOutType = NCHW>
+ class Tensor .../* Inherit other class */{
+  //some implements
+  ...
+ };
+```
+
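+`TargetType` is the platform type, such as X86 or GPU, and each platform has a corresponding identifier inside Anakin. `datatype` is an ordinary data type, which also has a corresponding identifier inside Anakin. [LayOutType](#layout) is the data layout type, such as batch x channel x height x width [NxCxHxW], and is identified by a struct inside Anakin. The correspondence between Anakin's types and the basic types is as follows:
+
+1. <span id='target'>TargetType</span>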
+
+ Anakin TargetType | platform
+  :----: | :----:|
+  NV | NVIDIA GPU
+  ARM | ARM
+  AMD | AMD GPU
+  X86 | X86
+  NVHX86 | NVIDIA GPU with Pinned Memory
+
+2. <span id='datatype'>DataType</span>
+
+Anakin DataType | C++ | Description 
+:---: | :---: | :---: |
+AK_HALF | short | fp16
+AK_FLOAT | float | fp32
+AK_DOUBLE | double | fp64
+AK_INT8 | char | int8
+AK_INT16 | short | int16
+AK_INT32 | int | int32
+AK_INT64 | long | int64
+AK_UINT8 | unsigned char | uint8
+AK_UINT16 | unsigned short | uint16
+AK_UINT32 | unsigned int | uint32
+AK_STRING | std::string | /
+AK_BOOL | bool | /
+AK_SHAPE | / | Anakin Shape 
+AK_TENSOR | / | Anakin Tensor 
+
+
+3. <span id = 'layout'>LayOutType </span>
+
+Anakin LayOutType ( Tensor LayOut ) | Tensor Dimension | Tensor Support | Op Support
+:---: | :---: | :---: | :---: |
+W | 1-D | YES | NO
+HW | 2-D | YES | NO
+WH | 2-D | YES | NO
+NW | 2-D | YES | YES
+NHW | 3-D | YES |YES
+NCHW ( default ) | 4-D | YES | YES
+NHWC | 4-D | YES | NO
+NCHW_C4 | 5-D | YES | YES
+
+
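+In theory, Anakin supports declaring tensors with one or more dimensions, but Anakin's ops support only four layouts: NW, NHW, NCHW, and NCHW_C4. NCHW is the default LayOutType, and NCHW_C4 is dedicated to the int8 data type.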
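The DataType table can be read as a compile-time mapping from each tag to a C++ type. The sketch below mirrors a few rows of that table with a trait template; the enum and `DataTrait` are illustrative assumptions, not Anakin's actual definitions.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical mirror of a few rows of the DataType table; not Anakin's enum.
enum class DataType { AK_FLOAT, AK_DOUBLE, AK_INT8, AK_INT32, AK_UINT16, AK_STRING };

// Primary template left undefined, so an unmapped tag fails to compile.
template <DataType D> struct DataTrait;
template <> struct DataTrait<DataType::AK_FLOAT>  { using type = float; };
template <> struct DataTrait<DataType::AK_DOUBLE> { using type = double; };
template <> struct DataTrait<DataType::AK_INT8>   { using type = char; };
template <> struct DataTrait<DataType::AK_INT32>  { using type = int; };
template <> struct DataTrait<DataType::AK_UINT16> { using type = unsigned short; };
template <> struct DataTrait<DataType::AK_STRING> { using type = std::string; };

// Convenience alias: dtype_t<DataType::AK_FLOAT> is float.
template <DataType D>
using dtype_t = typename DataTrait<D>::type;

static_assert(std::is_same<dtype_t<DataType::AK_FLOAT>, float>::value,
              "AK_FLOAT maps to float");
```

Because the primary template has no definition, using a tag without a specialization is rejected at compile time rather than failing at run time.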
+
+
+Examples
+
+> The following code shows how to use tensors. We recommend reading these examples first.
+
+> For more information about tensors, see *source_path/core/tensor.h*
+
+> 1. Initialize a tensor with a shape object
+``` c++
+  //Create a null tensor. A null tensor holds nothing.
+  //The tensor's buffer is resident on the CPU and its datatype is AK_FLOAT.
+  //The tensor's layout is NCHW (the default).
+  Tensor<X86, AK_FLOAT> mytensor;
+
+  //1. Use a shape object to create a tensor.
+  Shape shape1(NUM); //1-D shape; NUM is the size of its only dimension.
+  Tensor<X86, AK_FLOAT, W> mytensor1(shape1); //1-D tensor.
+
+  //A 4-D shape.
+  Shape shape2(N, C, H, W); // batch x channel x height x width
+```
+
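+> Note: the number of dimensions of the Shape must match the tensor's [LayOutType](#layout). For example, for Shape(N,C,H,W), the tensor's LayOutType must be NCHW; otherwise an error occurs, as the following code shows.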
+
+
+```c++
+   // A 4-D tensor.
+   Tensor<X86, AK_FLOAT> mytensor2(shape2);  //right
+
+   //A 4-D tensor that is resident on the GPU and whose datatype is AK_INT8.
+   Tensor<NV, AK_INT8> mytensor3(shape2);   //right
+
+   Tensor<X86, AK_FLOAT, NHW> mytensor4(shape2); //wrong! The shape's dimension count must equal the layout's (NHW is 3-D).
+   Tensor<NV, AK_FLOAT, NCHW_C4> mytensor5(shape2); //wrong! NCHW_C4 is 5-D.
+
+```
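The rule that a Shape's dimensionality must equal the layout's can be written as a one-line check. This sketch uses hypothetical layout tag structs whose `dims` values are taken from the LayOutType table; it only illustrates the rule and does not reproduce how Anakin validates shapes internally.

```cpp
#include <cassert>

// Hypothetical layout tags; each `dims` value comes from the LayOutType table.
struct LayoutW       { static constexpr int dims = 1; };
struct LayoutNHW     { static constexpr int dims = 3; };
struct LayoutNCHW    { static constexpr int dims = 4; };
struct LayoutNCHW_C4 { static constexpr int dims = 5; };

// True when a shape with `shape_dims` dimensions fits the given layout.
template <typename Layout>
bool shape_matches_layout(int shape_dims) {
    return shape_dims == Layout::dims;
}
```

With these tags, a 4-D shape is accepted for `LayoutNCHW` but rejected for `LayoutNHW` and `LayoutNCHW_C4`, mirroring the "right" and "wrong" lines in the example above.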
+
+> 2. Initialize a tensor with existing data and a shape
+
+```c++
+
+   /**
+   *  A constructor of Tensor.
+   *  data_ptr is a pointer to data of any data type
+   *  TargetType is the type of a platform [Anakin TargetType]
+   *  id : device id
+   *  shape: an Anakin shape
+   */
+   Tensor(Dtype* data_ptr, TargetType_t target, int id, Shape shape);
+
+   //Use existing data to feed a tensor.
+   Tensor<X86, AK_FLOAT> mytensor(data_ptr, TargetType, device_id, shape); //shape must have dimensions (N, C, H, W).
+
+```
+
+> 3. Initialize a tensor from another tensor
+
+```c++
+   Tensor<NV, AK_FLOAT> tensor(exist_tensor);
+```
+
+
+> Tip: you can use `typedef Tensor<X86, AK_FLOAT> Tensor4d_X86` to define tensors conveniently.
+
+
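+#### Filling the tensor's data buffer
+
+
+How you fill the data buffer depends on how the tensor was declared. The following shows how to fill a tensor's data buffer for each form of declaration.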
+
+```c++
+// First, recall the four ways to declare a tensor:
+//
+// 1. Tensor<X86, AK_FLOAT> mytensor;
+// 2. Tensor<X86, AK_FLOAT, W> mytensor1(shape1);
+// 3. Tensor<X86, AK_FLOAT> mytensor(data_ptr, TargetType, device_id, shape);
+// 4. Tensor<NV, AK_FLOAT> tensor(exist_tensor);
+//
+// The corresponding ways to fill their data are as follows.
+
+// 1: This declares an empty tensor with no memory allocated, so allocate it manually first.
+
+//param shape: a Shape object
+mytensor.re_alloc(shape);
+
+//Get a writable pointer to mytensor.
+//param index (int): where you start to write.
+//Dtype is your data type, such as int, float, or double.
+Dtype *p = mytensor.mutable_data(index/*=0*/);
+//write data to mytensor
+for(int i = 0; i < mytensor.size(); i++){
+  p[i] = 1.0f;
+}
+//do something ...
+
+// 2: This declaration allocates memory automatically.
+
+//Get a writable pointer to mytensor1.
+//param index (int): where you start to write.
+//Dtype is your data type, such as int, float, or double.
+Dtype *p = mytensor1.mutable_data(index/*=0*/);
+//write data to mytensor1
+for(int i = 0; i < mytensor1.size(); i++){
+  p[i] = 1.0f;
+}
+//do something ...
+
+// 3: With this declaration you again do not need to allocate memory manually,
+// but whether the constructor allocates memory depends on the situation. If
+// data_ptr and the declared tensor are on the same target platform, the tensor
+// shares memory with data_ptr. If they are on different platforms (for example,
+// data_ptr is on X86 and the tensor is on the GPU), the tensor allocates new
+// memory and copies the data pointed to by data_ptr into its own buffer.
+
+//Get a writable pointer to mytensor.
+//param index (int): where you start to write.
+//Dtype is your data type, such as int, float, or double.
+Dtype *p = mytensor.mutable_data(index/*=0*/);
+//write data to mytensor
+for(int i = 0; i < mytensor.size(); i++){
+  p[i] = 1.0f;
+}
+//do something ...
+
+// 4: This declaration also does not require manual memory allocation.
+
+//tensor is resident on the GPU (NV), so its mutable_data() pointer refers to
+//device memory and must not be written from host code. Fill a host tensor
+//instead and copy it to the device.
+Tensor<X86, AK_FLOAT> host_tensor(tensor.valid_shape());
+Dtype *h = host_tensor.mutable_data(index/*=0*/);
+for(int i = 0; i < host_tensor.size(); i++){
+  h[i] = 1.0f;
+}
+tensor.copy_from(host_tensor);
+//do something ...
+
+
+// In addition, you can get a read-only pointer to a tensor, for example:
+
+//Get a read-only pointer to mytensor.
+//param index (int): where you start to read.
+//Dtype is your data type, such as int, float, or double.
+const Dtype *p = mytensor.data(index/*=0*/);
+//do something ...
+```
+
+For more details about tensors, see *source_path/saber/core/tensor.h*.
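The share-or-copy behavior of the third constructor can be modelled with a toy type. In this sketch `Target` and `ToyTensor` are hypothetical, and the cross-platform case is simulated with a plain host copy; a real host-to-device transfer would go through a device API such as CUDA's `cudaMemcpy`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical platform tags; not Anakin's TargetType.
enum class Target { X86, NV };

// Toy tensor: shares the caller's buffer on the same platform, copies otherwise.
struct ToyTensor {
    std::vector<float> owned;  // filled only when the data had to be copied
    float* data = nullptr;     // points at either the caller's buffer or `owned`

    ToyTensor(float* data_ptr, Target data_target, Target tensor_target, int size) {
        if (data_target == tensor_target) {
            data = data_ptr;  // same platform: share the caller's memory
        } else {
            // Different platform: allocate new storage and copy the values.
            owned.assign(data_ptr, data_ptr + size);
            data = owned.data();
        }
    }
    // Copying would leave `data` pointing into another object's storage.
    ToyTensor(const ToyTensor&) = delete;
    ToyTensor& operator=(const ToyTensor&) = delete;

    bool shares_memory(const float* p) const { return data == p; }
};
```

A consequence worth noting: after construction, writes to the original buffer are visible through a sharing tensor but not through a copying one.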
+
+#### Getting a tensor's shape
+
+```c++
+//some declarations
+// ...
+Shape shape = mytensor.shape();
+
+//Get the size of the first dimension of the tensor, if it has one.
+int d1 = shape[0];
+
+//Get the size of the second dimension of the tensor, if it has one.
+int d2 = shape[1];
+
+// ...
+
+//Get the size of the n-th dimension of the tensor, if it has one.
+int dn = shape[n-1];
+
+
+//Get the number of dimensions of the tensor.
+int dims = mytensor.dims();
+
+//Get the size of the tensor.
+//size = d1 x d2 x ... x dn.
+int size = mytensor.size();
+
+//Get the size of the tensor over the dimension interval [start, end),
+//i.e. from dimension start up to, but not including, dimension end:
+//shape[start] x shape[start+1] x ... x shape[end-1].
+int sub_size = mytensor.count(start, end);
+```
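Assuming `count(start, end)` multiplies the sizes of the dimensions with 0-based indices `start` through `end - 1`, its result is a product over a half-open range of the shape. `count_range` is a hypothetical stand-in that operates on a plain `std::vector<int>` rather than an Anakin `Shape`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Tensor::count(start, end): multiplies the sizes of
// the dimensions with 0-based indices start, start + 1, ..., end - 1.
long long count_range(const std::vector<int>& shape, int start, int end) {
    long long product = 1;  // an empty range (start == end) yields 1
    for (int i = start; i < end; ++i) {
        product *= shape[i];
    }
    return product;
}
```

For an NCHW shape {2, 3, 4, 5}, `count_range(shape, 1, 3)` is C x H = 12, and the full range [0, 4) gives the total size 120.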
+
+#### Setting a tensor's shape
+
+You can set a tensor's shape with its member function set_shape. Its declaration is as follows:
+
+
+```c++
+/**
+ * \brief set a tensor's shape
+ * \param valid_shape [a Shape object]
+ * \param shape [a Shape object]
+ * \param offset [a Shape object]
+ * \return the status of this operation, i.e. whether it succeeded or not.
+ */
+SaberStatus set_shape(Shape valid_shape, Shape shape = Shape::zero(TensorAPI::layout_dims::value), Shape offset = Shape::minusone(TensorAPI::layout_dims::value)); 
+```
+
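+This member function only sets the tensor's shape. The [LayOutType](#layout) of these Shape objects (valid_shape, shape, offset) must match the LayOutType of the tensor's three corresponding Shape objects; otherwise an error occurs and SaberInvalidValue is returned. If they match, the tensor's shape is set successfully.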
+
+```c++
+
+// some declarations
+// ...
+//valid_shape, shape , offset are Shape object;
+//All these Shape object's LayOutType must be equal to mytensor's.
+mytensor.set_shape(valid_shape, shape, offset);
+
+```
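The validation described for set_shape can be sketched as follows. Layout types are modelled here as dimension counts, and `ToyStatus` stands in for the real SaberStatus values; all names are hypothetical. On a mismatch this sketch chooses to leave the stored shapes unchanged, which is a design assumption rather than documented Anakin behavior.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Hypothetical status codes standing in for SaberStatus values.
enum class ToyStatus { kSuccess, kInvalidValue };

// Toy tensor that stores its three shapes as plain vectors. The layout type
// is modelled only by its number of dimensions (e.g. 4 for NCHW).
struct ToyShapedTensor {
    int layout_dims;
    std::vector<int> valid_shape, shape, offset;

    ToyStatus set_shape(const std::vector<int>& new_valid,
                        const std::vector<int>& new_shape,
                        const std::vector<int>& new_offset) {
        // Every argument must match the tensor's layout dimensionality.
        for (const std::vector<int>* s : {&new_valid, &new_shape, &new_offset}) {
            if (static_cast<int>(s->size()) != layout_dims) {
                return ToyStatus::kInvalidValue;  // stored shapes stay unchanged
            }
        }
        valid_shape = new_valid;
        shape = new_shape;
        offset = new_offset;
        return ToyStatus::kSuccess;
    }
};
```

Checking all three shapes before assigning any of them keeps the tensor consistent: either every shape is updated or none is.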
+
+#### Resetting a tensor's shape
+
+```c++
+//some declarations
+Shape shape, valid_shape, offset;
+
+//do some initializations
+... 
+mytensor.reshape(valid_shape, shape, offset);
+```
+
+Note: the reshape operation also requires the shapes' [LayOutType](#layout) to match the tensor's.
+
+
+### Graph ###
+
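+The `Graph` class is responsible for loading an Anakin model to generate a computation graph, optimizing the graph, saving the model, and similar operations.
+
+#### Declaring a graph
+
+Like `Tensor`, `Graph` also takes three template parameters.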
+
+```c++
+
+template<typename TargetType, DataType Dtype, Precision Ptype>
+class Graph ... /* inherit other class*/{
+  
+  //some implements
+  ...
+
+};
+```
+
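+As introduced earlier, [TargetType](#target) and [DataType](#datatype) are custom types defined inside Anakin. [TargetType](#target) indicates the platform type (such as NV or X86), and [DataType](#datatype) is Anakin's basic data type, which corresponds to a basic data type in C/C++. [Precision](#precision) is the precision type supported by ops; we introduce it later.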
+
+
+```c++
+
+//Create an empty graph object.
+Graph<NV, AK_FLOAT, Precision::FP32> graph;
+
+//Create a pointer to an empty graph.
+Graph<NV, AK_FLOAT, Precision::FP32> *graph_ptr = new Graph<NV, AK_FLOAT, Precision::FP32>();
+
+//Or let auto deduce the pointer type.
+auto graph_ptr2 = new Graph<NV, AK_FLOAT, Precision::FP32>();
+
+```
+
+#### Loading an Anakin model
+
+```c++
+//some declarations
+...
+auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
+std::string model_path = "the/path/to/where/your/models/are";
+const char *model_path1 = "the/path/to/where/your/models/are";
+
+//Load an Anakin model to generate a compute graph.
+auto status = graph->load(model_path);
+
+//Or this way.
+status = graph->load(model_path1);
+//Check whether the load operation succeeded.
+if(!status){
+  std::cout << "error" << std::endl;
+  //do something...
+}
+
+```
+
+#### Optimizing the computation graph
+
+```c++
+//some declarations
+...
+//Load graph.
+...
+//According to the ops of loaded graph, optimize compute graph.
+graph->Optimize();
+
+```
+
+> Note: when you load an original graph for the first time, you must optimize it.
+
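+#### Saving a model
+
+You can save a model at any time. In particular, you can save an optimized model so that you do not have to optimize it again the next time you load it.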
+
+
+```c++
+//some declarations
+...
+//Load graph.
+...
+// save a model
+//save_model_path: the path where the model will be saved.
+auto status = graph->save(save_model_path);
+
+//Checking
+if(!status){
+  std::cout << "error" << std::endl;
+  //do something...
+}
+```
+
+#### Reshaping a tensor in the computation graph
+
+```c++
+//some declarations
+...
+//Load graph.
+...
+vector<int> shape{10, 256, 256, 10};
+//input_name : std::string.
+//Reshape a tensor named input_name.
+graph->Reshape(input_name, shape);//Note: shape is a vector, not a Shape object.
+```
+
+#### Setting the batch size
+
+`Graph` supports resetting the batch size.
+
+```c++
+//some declarations
+...
+//Load graph.
+...
+//input_name : std::string.
+//Reset the batch size of the input tensor named input_name.
+int new_batch_size = 4;
+graph->ResetBatchSize(input_name, new_batch_size);
+```
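The difference between `Reshape` and `ResetBatchSize` can be illustrated with a toy registry of input shapes. This sketch assumes that resetting the batch size replaces only the leading N dimension of an NCHW input shape; `ToyGraphInputs` is hypothetical and is not Anakin's `Graph`.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical registry of graph input shapes, used only to contrast
// replacing a whole shape with replacing its batch dimension.
class ToyGraphInputs {
public:
    // Replace the entire shape of the named input.
    void Reshape(const std::string& name, const std::vector<int>& shape) {
        shapes_[name] = shape;
    }
    // Replace only the leading (N) dimension of the named input.
    void ResetBatchSize(const std::string& name, int batch_size) {
        auto it = shapes_.find(name);
        if (it == shapes_.end() || it->second.empty()) {
            throw std::invalid_argument("unknown or empty input: " + name);
        }
        it->second[0] = batch_size;  // N is the first dimension in NCHW
    }
    const std::vector<int>& shape(const std::string& name) const {
        return shapes_.at(name);
    }

private:
    std::map<std::string, std::vector<int>> shapes_;
};
```

After `Reshape("input_0", {10, 256, 256, 10})` followed by `ResetBatchSize("input_0", 4)`, the sketch reports the shape {4, 256, 256, 10}.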
+
+###  Net ###
+
+
+`Net` is the executor of the computation graph. You can get inputs and outputs through a `Net` object.
+#### Creating a graph executor
+
+`Net` takes four template parameters.
+
+
+```c++
+template<typename TargetType, DataType Dtype, Precision PType, OpRunType RunType = OpRunType::ASYNC>
+class Net{
+  //some implements
+  ...
+
+};
+```
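+Because some ops may support multiple precisions, the precision can be specified through Precision. OpRunType indicates synchronous or asynchronous execution, and asynchronous is the default. OpRunType::SYNC means synchronous execution with a single stream on the GPU; OpRunType::ASYNC means asynchronous execution with multiple streams on the GPU. Both Precision and OpRunType are enum classes; for the detailed design, see *source_root/framework/core/types.h*.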
+
+
+1. <span id = 'precision'> Precision </span>
+
+Precision | Op support
+:---: | :---:
+Precision::INT4 | NO
+Precision::INT8 | NO
+Precision::FP16 | NO
+Precision::FP32 | YES
+Precision::FP64 | NO
+
+Currently, ops support only FP32 precision; the remaining precisions will be supported in the future.
+
+
+
+2. OpRunType
+
+OpRunType | Sync/Async |Description
+:---: | :---: | :---:
+OpRunType::SYNC | Synchronization | single-stream on GPU
+OpRunType::ASYNC | Asynchronization | multi-stream on GPU
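The defaulted run type in `Net`'s template signature can be illustrated with stand-in enum classes. The enums below mirror the Precision and OpRunType tables, but they are assumptions; the authoritative definitions are in *source_root/framework/core/types.h*. `ToyNet` is hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-ins for the enum classes described above.
enum class Precision { INT4, INT8, FP16, FP32, FP64 };
enum class OpRunType { SYNC, ASYNC };

// Toy executor: RunType defaults to ASYNC when the caller omits it.
template <Precision P, OpRunType RunType = OpRunType::ASYNC>
struct ToyNet {
    static constexpr Precision precision = P;
    static constexpr OpRunType run_type = RunType;
    // Per the Precision table, ops currently support only FP32.
    static constexpr bool op_supported = (P == Precision::FP32);
};
```

Writing `ToyNet<Precision::FP32>` therefore selects asynchronous execution, while `ToyNet<Precision::FP32, OpRunType::SYNC>` opts into the single-stream mode.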
+
+Create an executor from a graph object.
+```c++
+//some declarations
+...
+//Create a pointer to a graph.
+auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
+//do something...
+...
+
+//create an executor
+Net<NV, AK_FLOAT, Precision::FP32> executor(*graph);
+
+```
+
+#### Getting input and output tensors
+
+
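+This section shows how to get the input and output tensors and fill the input tensor's buffer. To get an input tensor, you must specify its name, such as "input_0", "input_1", "input_2", ...; only these strings can be used to retrieve input tensors. To find out which input each input_i corresponds to, check the dashboard; see [Anakin Parser](Converter_ch.md) for how to use the dashboard. See the following example code: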
+
+```c++
+//some declarations
+...
+
+//create an executor
+//TargetType is NV [NVIDIA GPU]
+Net<NV, AK_FLOAT, Precision::FP32> executor(*graph);
+
+//Get the first input tensor.
+//The following tensors (tensor_in0, tensor_in1, ...) are resident on the GPU.
+//Note: the member function get_in returns a pointer to a tensor.
+Tensor<NV, AK_FLOAT>* tensor_in0 = executor.get_in("input_0");
+
+//If you have multiple input tensors
+//You just type this code below.
+Tensor<NV, AK_FLOAT>* tensor_in1 = executor.get_in("input_1");
+...
+auto tensor_inn = executor.get_in("input_n");
+```
+
+Once you have the input tensor, you can fill its data buffer.
+
+```c++
+//This tensor is resident at GPU.
+auto tensor_d_in = executor.get_in("input_0");
+
+//To feed the tensor above, first fill a tensor that is resident on the host, then copy the host tensor to the device tensor.
+
+//template<typename Ttype, DataType Dtype> using Tensor4d = Tensor<Ttype, Dtype>;
+Tensor4d<X86, AK_FLOAT> tensor_h_in; //host tensor
+//Tensor<X86, AK_FLOAT> tensor_h_in;
+
+//Allocate memory for host tensor.
+tensor_h_in.re_alloc(tensor_d_in->valid_shape());
+//Get a writable pointer to tensor.
+float *h_data = tensor_h_in.mutable_data();
+
+//Feed your tensor.
+/** example
+for(int i = 0; i < tensor_h_in.size(); i++){
+  h_data[i] = 1.0f;
+}
+*/
+//Copy host tensor's data to device tensor.
+tensor_d_in->copy_from(tensor_h_in);
+
+// And then
+```
+
+
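+Similarly, you can get an output tensor with the member function get_out. Unlike input tensors, however, you need to specify the name of the output node, which you can find in the dashboard; see [Anakin Parser](Converter_ch.md) for how to use the dashboard. For example, if there is an output node named pred_out, you can get the corresponding output tensor as follows: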
+```c++
+//Note: this tensor is resident on the GPU.
+Tensor<NV, AK_FLOAT>* tensor_out_d = executor.get_out("pred_out");
+
+```
+
+
+#### Executing the graph
+
+
+Once everything is ready, you can run the actual computation:
+```c++
+executor.prediction();
+```
+ 
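+## <span id='example'> Example Code </span> ##
+
+The following example shows how to call Anakin.
+
+Before you start, make sure you already have an Anakin model. If you do not, convert your model with [Anakin Parser](Converter_ch.md).
+
+### Single-thread
+
+The single-thread example is in *source_root/test/framework/net/net_exec_test.cpp*.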
+
+```c++
+
+std::string model_path = "your_Anakin_models/xxxxx.anakin.bin";
+// Create an empty graph object.
+auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
+// Load Anakin model.
+auto status = graph->load(model_path);
+if(!status ) {
+    LOG(FATAL) << " [ERROR] " << status.info();
+}
+// Reshape
+graph->Reshape("input_0", {10, 384, 960, 10});
+// You must optimize graph for the first time.
+graph->Optimize();
+// Create an executor.
+Net<NV, AK_FLOAT, Precision::FP32> net_executer(*graph);
+
+//Get your input tensors through specific strings such as "input_0", "input_1", and
+//so on.
+//Then feed the input tensors.
+//If you don't know which inputs these strings ("input_0", "input_1") correspond to, launch the dashboard to find out.
+auto d_tensor_in_p = net_executer.get_in("input_0");
+Tensor4d<X86, AK_FLOAT> h_tensor_in;
+auto valid_shape_in = d_tensor_in_p->valid_shape();
+for (int i=0; i<valid_shape_in.size(); i++) {
+    LOG(INFO) << "detect input dims[" << i << "]" << valid_shape_in[i]; // see the tensor's dimensions
+}
+h_tensor_in.re_alloc(valid_shape_in);
+float* h_data = h_tensor_in.mutable_data();
+for (int i=0; i<h_tensor_in.size(); i++) {
+    h_data[i] = 1.0f;
+}
+d_tensor_in_p->copy_from(h_tensor_in);
+
+//Do inference.
+net_executer.prediction();
+
+// Get the result tensors by output node name.
+// Check the dash board again to find out how many output nodes there are and what they are named.
+
+// For example, if there is an output node named obj_pred_out,
+// you can get its output tensor as follows.
+auto d_tensor_out_0_p = net_executer.get_out("obj_pred_out"); //get_out returns a pointer to output tensor.
+auto d_tensor_out_1_p = net_executer.get_out("lc_pred_out"); //get_out returns a pointer to output tensor.
+//......
+// do something else ...
+//...
+// Save the model.
+// When the saved model is loaded again, the graph need not be optimized again.
+std::string save_model_path = model_path + std::string(".saved");
+status = graph->save(save_model_path);
+if (!status) {
+    LOG(FATAL) << " [ERROR] " << status.info();
+}
+
+```
--- a/source/advanced_usage/deploy/build_and_install_lib_cn.rst
+++ b/source/advanced_usage/deploy/build_and_install_lib_cn.rst
--- a/doc/fluid/advanced_usage/deploy/convert_paddle_to_anakin.md
+++ b/doc/fluid/advanced_usage/deploy/convert_paddle_to_anakin.md
+# Model Conversion Guide
+
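+Anakin can run inference on models from different frameworks. Because the formats differ, you must convert your model before Anakin can use it. This document describes how to convert models.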
+
+## Introduction
+
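+The Anakin model converter accepts inference models in two formats, Caffe and Fluid. A model consists of the network structure (model or prototxt) and the weight parameters (param or caffemodel).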
+
+The output of the conversion is a bin file, which is loaded into the Anakin framework as the graph.
+
+You can also use the converter's launch board feature to generate an HTML preview of the network structure.
+
+
+## Requirements
+
+- python 2.7+
+- pyyaml
+- flask
+- protobuf 3.5+
+
+
+## Usage
+
+### 1. Environment
+The converter's dependencies are listed in the *Requirements* section.
+
+### 2. Configuration
+Edit the *config.yaml* file to describe your setup. The project ships a sample *config.yaml*, explained below.
+
+#### config.yaml
+```yaml
+OPTIONS:
+    Framework: CAFFE       # CAFFE or FLUID, depending on the source framework
+    SavePath: ./output     # where the converted model is saved
+    ResultName: googlenet  # name of the output model
+    Config:
+        LaunchBoard: ON    # whether to generate the network preview page
+        Server:
+            ip: 0.0.0.0
+            port: 8888     # an available port on which the preview page is served
+        OptimizedGraph:    # turn this on only if you used Anakin's Optimized feature
+            enable: OFF
+            path: /path/to/anakin_optimized_anakin_model/googlenet.anakin.bin.saved
+    LOGGER:
+        LogToPath: ./log/  # directory for the generated logs
+        WithColor: ON
+
+TARGET:
+    CAFFE:
+        # required when Framework is CAFFE
+        ProtoPaths:
+            - /path/to/caffe/src/caffe/proto/caffe.proto
+        PrototxtPath: /path/to/your/googlenet.prototxt
+        ModelPath: /path/to/your/googlenet.caffemodel
+
+    FLUID:
+        # required when Framework is FLUID
+        Debug: NULL
+        ProtoPaths:
+            - /
+        PrototxtPath: /path/to/fluid/inference_model
+        ModelPath: /path/to/fluid/inference_model
+        # ...
+```
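+
+The `Framework` value decides which `TARGET` subsection the converter reads. The converter itself runs in Python, so the C++ helpers below only illustrate that rule. They are hypothetical: they encode the keys shown under each framework in the sample above and check that a parsed `TARGET` section contains them. The real converter may require or accept other keys.
+
+```c++
+#include <cassert>
+#include <map>
+#include <string>
+#include <vector>
+
+// Keys that appear under TARGET for each Framework value in the sample config.yaml.
+// This mirrors the sample only; the real converter may accept more keys.
+std::vector<std::string> target_keys_for(const std::string& framework) {
+    static const std::map<std::string, std::vector<std::string>> keys = {
+        {"CAFFE", {"ProtoPaths", "PrototxtPath", "ModelPath"}},
+        {"FLUID", {"Debug", "ProtoPaths", "PrototxtPath", "ModelPath"}},
+    };
+    auto it = keys.find(framework);
+    return it == keys.end() ? std::vector<std::string>{} : it->second;
+}
+
+// True if every key expected for `framework` is present in `provided`.
+bool target_section_complete(const std::string& framework,
+                             const std::map<std::string, std::string>& provided) {
+    const auto expected = target_keys_for(framework);
+    if (expected.empty()) return false;  // only CAFFE and FLUID are valid
+    for (const auto& key : expected) {
+        if (provided.find(key) == provided.end()) return false;
+    }
+    return true;
+}
+```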
+
+### 3. Conversion
+After editing the configuration file, run `python converter.py` to convert the model.
+
+
+### 4. Preview
+The last step is to view the conversion result in a browser. The address is the one configured in *config.yaml*, for example http://0.0.0.0:8888 .
+
+> Note: if you use the default IP address 0.0.0.0, replace it with the server's real address, in the form real_ip:port, when opening the preview.
--- a/doc/fluid/advanced_usage/deploy/how_to_add_anakin_op.md
+++ b/doc/fluid/advanced_usage/deploy/how_to_add_anakin_op.md
+# How to Add a New Operator
+
+## Basic concepts
+
+This section briefly introduces a few basic concepts related to operators. See the design documents for details.
+
+```framework```: the upper-level logic code, which fetches parameters and weights from the parser. Adding an op mainly involves changes under the framework/operators directory.
+
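+```saber```: the low-level implementation code. Anakin uses saber to wrap the different backends, and each implementation (impl) provides its own specialization. The outer framework layer reaches the appropriate impl through template parameters. The parameters of every op are defined in saber/saber_funcs_param.h, and adding an op mainly involves changes under saber/funcs.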
+
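+Structure of saber:
+* saber/funcs contains the external interfaces of the funcs. Ops at this layer do not depend on any specific device implementation, only on what each op computes. Because this layer is not tied to an impl, none of its file names contain impl.
+* saber/funcs/impl contains the impl declaration of each op, and every device must provide specializations of these declarations. For example, saber/funcs/impl/x86 implements the x86 specializations and saber/funcs/impl/cuda implements the NV specializations. A new backend needs its own specializations. Code at this layer depends on the implementation and carries the ```impl_``` prefix.
+* saber/funcs/impl/cuda/base/cuda_c contains the CUDA files with the ```.cu``` extension. New CUDA kernels go in this directory.
+* saber/funcs/impl/cuda/base/sass contains static libraries compiled from assembly code for different architectures.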
+
+### Base classes involved and how they relate
+
+A brief introduction to the relevant base classes:
+
+* ```anakin::Operator```: the base class for framework operators, located in framework/core/operator/operator.h
+
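+* ```anakin::saber::BaseFunc```: the base class of saber's external op interface, located in saber/funcs/base.h, which gives every op a uniform interface. BaseFunc's ```compute_output_shape``` interface computes the output shape solely from the input shapes and the param. It stores the result in the output through the ```tensor```'s ```set_shape``` interface, which only sets the shape and does not allocate memory. The ```operator()``` interface is each op's compute interface.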
+
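+* ```anakin::saber::ImplBase```: the interface for device-specific op implementations in saber and the base class of all of them, located in saber/funcs/impl/impl_base.h. Implementations fall into two categories:
+  - Ops with the ```vender_``` prefix are implemented with a third-party library, such as cuDNN's conv or MKL's conv. We can hardly tune their performance, so they form a separate category.
+  - Ops with the ```saber_``` prefix are saber implementations that come with source code, so later optimization can keep improving their performance.
+
+  Keep this distinction in mind when naming an implementation.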
+
+## Adding an operator
+
+Adding a new op takes the following steps:
+
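+1. Add the saber param
+2. Define the saber Operator class
+3. Define the new impl declaration
+4. Complete the new impl implementation
+5. Add the framework implementation or specialization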
+
+The rest of this section walks through these steps with a simple example.
+
+Suppose we want to add a new Mul op, computed as: $$Out = \alpha \cdot X \cdot Y$$
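+
+A plain CPU reference for this formula helps later, when you check device results. The sketch assumes that X and Y hold the same number of elements and multiplies them element-wise; it does not cover broadcasting. `mul_reference` is a hypothetical helper name, not part of saber.
+
+```c++
+#include <cassert>
+#include <cstddef>
+#include <vector>
+
+// CPU reference for Out = alpha * X * Y (element-wise, same-sized inputs assumed).
+std::vector<float> mul_reference(float alpha,
+                                 const std::vector<float>& x,
+                                 const std::vector<float>& y) {
+    assert(x.size() == y.size());
+    std::vector<float> out(x.size());
+    for (std::size_t i = 0; i < x.size(); ++i) {
+        out[i] = alpha * x[i] * y[i];
+    }
+    return out;
+}
+```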
+
+### Adding a param for the operator
+
+File involved: ```saber/saber_funcs_param.h```. If a param for the op already exists, skip this step.
+Here ```XXXParam``` is a ```struct``` with a default constructor, a constructor with arguments, a copy constructor, ```operator=()``` and ```operator==()```.
+```c++
+template <typename opTensor> // gives access to target, datatype and layout
+struct MulParam{
+  MulParam()
+    : alpha(0)
+  {}
+  MulParam(float alpha_in)
+    : alpha(alpha_in)
+  {}
+  MulParam(const MulParam& right)
+    : alpha(right.alpha)
+  {}
+  MulParam &operator=(const MulParam &right) {
+    alpha = right.alpha;
+    return *this;
+  }
+  bool operator==(const MulParam &right) const {
+    return alpha == right.alpha;
+  }
+  float alpha;
+};
+```
+
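+### Defining the Operator class
+File involved: ```saber/funcs/mul.h```. If a class for this op has been defined before, update the impl headers it includes.
+A fairly complete definition is given below for reference.
+```c++
+// Each device must include its corresponding operator implementations. See [implementation](#impl).
+#ifdef NVIDIA_GPU
+#include "saber/funcs/impl/cuda/saber_mul.h"
+#include "saber/funcs/impl/cuda/vender_mul.h"
+#endif
+// If a device has no implementation of this operator yet, include the declaration instead. See [declaration](#declare).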
+#ifdef USE_X86_PLACE
+#include "saber/funcs/impl/impl_mul.h"
+#endif
+namespace anakin {
+namespace saber {
+template<typename TargetType,
+        DataType OpDtype,
+        DataType inDtype = AK_FLOAT,
+        DataType outDtype = AK_FLOAT,
+        typename LayOutType_op = NCHW,
+        typename LayOutType_in = NCHW,
+        typename LayOutType_out = NCHW>
+class Mul : public BaseFunc<
+        Tensor<TargetType, inDtype, LayOutType_in>,
+        Tensor<TargetType, outDtype, LayOutType_out>,
+        Tensor<TargetType, OpDtype, LayOutType_op>,
+        ImplBase, MulParam> {
+public:
+    using BaseFunc<
+            Tensor<TargetType, inDtype, LayOutType_in>,
+            Tensor<TargetType, outDtype, LayOutType_out>,
+            Tensor<TargetType, OpDtype, LayOutType_op>,
+            ImplBase, MulParam>::BaseFunc;
+    Mul() = default;
+    typedef Tensor<TargetType, inDtype, LayOutType_in> InDataTensor;
+    typedef Tensor<TargetType, outDtype, LayOutType_out> OutDataTensor;
+    typedef Tensor<TargetType, OpDtype, LayOutType_op> OpTensor;
+    typedef MulParam<OpTensor> Param_t;
+    typedef std::vector<InDataTensor *> Input_v;
+    typedef std::vector<OutDataTensor *> Output_v;
+    typedef std::vector<Shape> Shape_v;
+
+    virtual SaberStatus compute_output_shape(const Input_v &input,
+                                             Output_v &output, Param_t &param) override {
+        // Compute the output shape.
+        Shape output_shape = (input[0]->valid_shape());
+        /* code */
+        return output[0]->set_shape(output_shape);
+    }
+    virtual SaberStatus init_impl(ImplEnum implenum) override {
+      // All devices use this init_impl; it creates the corresponding impl.
+      switch (implenum) {
+            case VENDER_IMPL:
+                this->_impl.push_back(new VenderMul <TargetType,
+                OpDtype, inDtype, outDtype,
+                LayOutType_op, LayOutType_in, LayOutType_out>);
+                return SaberSuccess;
+            case SABER_IMPL:
+                this->_impl.push_back(new SaberMul <TargetType,
+                OpDtype, inDtype, outDtype,
+                LayOutType_op, LayOutType_in, LayOutType_out>);
+                return SaberSuccess;
+            default:
+                return SaberUnImplError;
+        }
+    }
+private:
+    virtual void pick_best_static() override {
+        if (true) // some condition?
+            this->_best_impl = this->_impl[0];
+    }
+    virtual void pick_best_specify(ImplEnum implenum) override {
+        this->_best_impl = this->_impl[0];
+    }
+};
+} // namespace saber
+} // namespace anakin
+```
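+
+Without the templates, the `init_impl` / `pick_best_*` logic above does three things: it creates candidate implementations on request, keeps them in a list, and points a "best" pointer at one of them. The self-contained sketch below models only that selection logic. `ImplKind`, `MulImplBase`, `VenderMulCpu`, `SaberMulCpu` and `MulFunc` are hypothetical names. Ownership uses `std::unique_ptr` instead of raw `new`. Unlike the `pick_best_specify` shown above, the sketch selects by implementation kind.
+
+```c++
+#include <cassert>
+#include <cstddef>
+#include <memory>
+#include <string>
+#include <vector>
+
+enum class ImplKind { Vender, Saber };
+
+// Common interface of all implementations (stand-in for ImplBase).
+struct MulImplBase {
+    virtual ~MulImplBase() = default;
+    virtual std::string name() const = 0;
+};
+
+struct VenderMulCpu : MulImplBase {
+    std::string name() const override { return "vender_mul"; }
+};
+
+struct SaberMulCpu : MulImplBase {
+    std::string name() const override { return "saber_mul"; }
+};
+
+// Stand-in for the saber op front end: owns candidate impls and picks one.
+class MulFunc {
+public:
+    // Creates the implementation for `kind` and remembers which kind it is.
+    bool init_impl(ImplKind kind) {
+        switch (kind) {
+            case ImplKind::Vender: _impl.push_back(std::unique_ptr<MulImplBase>(new VenderMulCpu)); break;
+            case ImplKind::Saber:  _impl.push_back(std::unique_ptr<MulImplBase>(new SaberMulCpu));  break;
+            default: return false;
+        }
+        _kinds.push_back(kind);
+        return true;
+    }
+    // Selects the first created implementation of the requested kind.
+    bool pick_best_specify(ImplKind kind) {
+        for (std::size_t i = 0; i < _impl.size(); ++i) {
+            if (_kinds[i] == kind) { _best_impl = _impl[i].get(); return true; }
+        }
+        return false;
+    }
+    const MulImplBase* best() const { return _best_impl; }
+
+private:
+    std::vector<std::unique_ptr<MulImplBase>> _impl;
+    std::vector<ImplKind> _kinds;
+    MulImplBase* _best_impl = nullptr;
+};
+```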
+
+### Adding a new impl <span id="declare">declaration</span> for the operator
+
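+File involved: ```saber/funcs/impl/impl_mul.h```. All devices specialize the same declaration, and each specialization lives in the corresponding device folder. This file provides the common declaration shared by all devices. A reference example follows.
+```c++
+#include "saber/funcs/impl/impl_macro.h"
+namespace anakin{
+namespace saber{
+DEFINE_OP_CLASS(Mul, MulParam); // the first argument is the op name, the second is the name of its param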
+}
+}
+```
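+
+`DEFINE_OP_CLASS` supplies a device-independent declaration that each backend then specializes; this document does not show its exact expansion. The sketch below only illustrates the underlying C++ mechanism, using the hypothetical names `ImplMul`, `NVTag` and `X86Tag`. A primary template compiles for every target but does nothing, and one backend adds a specialization. The sketch uses a full specialization for brevity, whereas the saber code below uses partial specializations.
+
+```c++
+#include <cassert>
+#include <string>
+
+struct NVTag {};
+struct X86Tag {};
+
+// Primary template: the device-independent "declaration". It compiles for any
+// target but provides no real implementation.
+template <typename TargetType>
+class ImplMul {
+public:
+    bool implemented() const { return false; }
+    std::string backend() const { return "none"; }
+};
+
+// A backend supplies its own version by specializing the primary template.
+template <>
+class ImplMul<NVTag> {
+public:
+    bool implemented() const { return true; }
+    std::string backend() const { return "cuda"; }
+};
+```
+
+This is also why, as noted under debugging below, an op still compiles on a device without a specialization but does nothing there.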
+
+### Completing the <span id="impl">implementation</span> of the new operator for a specific backend
+
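+Files involved: ```saber/funcs/impl/xxx/vender_mul.h``` or ```saber/funcs/impl/xxx/saber_mul.h```
+Here ```xxx``` stands for a specific device. ```vender``` refers to ops implemented with a third-party library, and ```saber``` to ops implemented from source. Using the CUDA vender implementation as an example, the following briefly introduces the basic interfaces of the specialized class.
+
+```c++
+// include the corresponding declaration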
+#include "saber/funcs/impl/impl_mul.h"
+
+namespace anakin{
+namespace saber{
+template <DataType OpDtype,
+    DataType inDtype,
+    DataType outDtype,
+    typename LayOutType_op,
+    typename LayOutType_in,
+    typename LayOutType_out>
+class VenderMul<NV, // partially specialize for the required backend
+    OpDtype, inDtype, outDtype,
+    LayOutType_op, LayOutType_in, LayOutType_out> :
+    public ImplBase<
+        Tensor<NV, inDtype, LayOutType_in>,
+        Tensor<NV, outDtype, LayOutType_out>,
+        Tensor<NV, OpDtype, LayOutType_op>,
+        MulParam<Tensor<NV, OpDtype, LayOutType_op> > >
+{
+public:
+    typedef Tensor<NV, inDtype, LayOutType_in> DataTensor_in;
+    typedef Tensor<NV, outDtype, LayOutType_out> DataTensor_out;
+    typedef Tensor<NV, OpDtype, LayOutType_op> OpTensor;
+    typedef typename DataTensor_in::Dtype InDataType;
+    typedef typename DataTensor_out::Dtype OutDataType;
+    typedef typename OpTensor::Dtype OpDataType;
+    VenderMul(){}
+    ~VenderMul() {}
+
+    virtual SaberStatus init(const std::vector<DataTensor_in *>& inputs,
+                            std::vector<DataTensor_out *>& outputs,
+                            MulParam<OpTensor>& param, Context<NV>& ctx) {
+        this->_ctx = ctx;
+        return create(inputs, outputs, param, ctx);
+    }
+
+    virtual SaberStatus create(const std::vector<DataTensor_in *>& inputs,
+                            std::vector<DataTensor_out *>& outputs,
+                            MulParam<OpTensor>& param, Context<NV>& ctx) {
+        // set internal parameters
+        return SaberSuccess;
+    }
+
+    virtual SaberStatus dispatch(const std::vector<DataTensor_in*>& inputs,
+                          std::vector<DataTensor_out*>& outputs,
+                        MulParam<OpTensor>& param) {
+        // dispatch kernel.
+        return SaberSuccess;
+    }
+
+private:
+};
+}
+}
+```
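+The difference between ```init``` and ```create```:
+
+* ```init``` is the entry point used the first time the op is initialized, and it is called only once. It usually holds code that needs to run only once, such as malloc or create-style functions.
+* ```create``` runs once from the first ```init``` and runs again whenever the input or the param changes. It typically holds set functions that configure internal variables, and when the input changes it runs code that depends directly on the input or the weights.
+
+Because ```create``` is triggered inside the network, any expensive operation placed in it slows down the whole op. Choose carefully where each operation goes.
+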
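+
+The split between `init` and `create` works like a small state machine. `init` runs once and calls `create`. Each later call re-runs `create` only if the input shape or the param has changed, and then dispatches. `LifecycleOp` and its counters are hypothetical and exist only to make that rule checkable.
+
+```c++
+#include <cassert>
+#include <vector>
+
+// Hypothetical model of the init/create/dispatch lifecycle of a saber op.
+class LifecycleOp {
+public:
+    int init_calls = 0;
+    int create_calls = 0;
+    int dispatch_calls = 0;
+
+    // One-time setup (e.g. allocations, handle creation), then a first create().
+    void init(const std::vector<int>& shape, float alpha) {
+        ++init_calls;
+        create(shape, alpha);
+    }
+
+    // Per-call entry point: re-create only when the shape or the param changed,
+    // then launch the computation.
+    void operator()(const std::vector<int>& shape, float alpha) {
+        if (shape != _shape || alpha != _alpha) {
+            create(shape, alpha);
+        }
+        ++dispatch_calls;
+    }
+
+private:
+    // Cheap reconfiguration that depends on the input shape or the param.
+    void create(const std::vector<int>& shape, float alpha) {
+        ++create_calls;
+        _shape = shape;
+        _alpha = alpha;
+    }
+    std::vector<int> _shape;
+    float _alpha = 0.0f;
+};
+```
+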
+### Adding the framework specialization
+
+Files involved: ```framework/operators/mul.h``` and ```framework/operators/mul.cpp```.
+The following briefly shows how to add or modify an operator in the framework.
+
+```c++
+#include "framework/core/base.h"
+#include "framework/core/data_types.h"
+#include "framework/core/operator/operator.h"
+#include "utils/logger/logger.h"
+#include "saber/funcs/mul.h" // include the corresponding saber header
+namespace anakin {
+namespace ops {
+template<typename Ttype, DataType Dtype, Precision Ptype>
+class MulHelper;
+
+template<typename Ttype, DataType Dtype, Precision Ptype>
+class Mul : public Operator<Ttype, Dtype, Ptype> {
+public:
+    Mul() {}
+    /// forward impl
+    virtual void operator() (OpContext<Ttype> &ctx,
+                             const std::vector<Tensor4dPtr<Ttype, Dtype> >& ins,
+                             std::vector<Tensor4dPtr<Ttype, Dtype> >& outs) {
+        LOG(ERROR) << "Not Impl Yet Operator mul<TargetType:"<<"unknown"<<","
+                   <<type_id<typename DataTypeWarpper<Dtype>::type>().type_info()<<">";
+    }
+    friend class MulHelper<Ttype, Dtype, Ptype>;
+};
+template<typename Ttype, DataType Dtype, Precision Ptype>
+class MulHelper : public OperatorHelper<Ttype, Dtype, Ptype> {
+public:
+    MulHelper() = default;
+    ~MulHelper() {}
+    Status InitParam() override;
+
+    Status Init(OpContext<Ttype> &ctx,
+                const std::vector<Tensor4dPtr<Ttype, Dtype> >& ins,
+                std::vector<Tensor4dPtr<Ttype, Dtype> >& outs) override;
+    Status InferShape(const std::vector<Tensor4dPtr<Ttype, Dtype> >& ins,
+                      std::vector<Tensor4dPtr<Ttype, Dtype> >& outs) override;
+
+public:
+    saber::MulParam<Tensor4d<Ttype, Dtype>> _param_mul;
+    saber::Mul<Ttype, Dtype> _funcs_mul;
+};
+}
+} /* namespace anakin */
+```
+The corresponding ```.cpp``` file is as follows:
+```c++
+#include "framework/operators/mul.h"
+
+namespace anakin {
+namespace ops {
+
+#ifdef USE_CUDA
+template<>
+void Mul<NV, AK_FLOAT, Precision::FP32>::operator()(
+    OpContext<NV>& ctx,
+    const std::vector<Tensor4dPtr<NV, AK_FLOAT> >& ins,
+    std::vector<Tensor4dPtr<NV, AK_FLOAT> >& outs) {
+    auto* impl =
+        static_cast<MulHelper<NV, AK_FLOAT, Precision::FP32>*>(this->_helper);
+    auto& param =
+        static_cast<MulHelper<NV, AK_FLOAT, Precision::FP32>*>(this->_helper)->_param_mul;
+    impl->_funcs_mul(ins, outs, param, ctx);
+}
+#endif
+
+template<typename Ttype, DataType Dtype, Precision Ptype>
+Status MulHelper<Ttype, Dtype, Ptype>::InitParam() {
+    auto alpha = GET_PARAMETER(float, alpha);
+    saber::MulParam<Tensor4d<Ttype, Dtype>> param_mul(alpha);
+    _param_mul = param_mul;
+    return Status::OK();
+}
+
+template<typename Ttype, DataType Dtype, Precision Ptype>
+Status MulHelper<Ttype, Dtype, Ptype>::Init(OpContext<Ttype>& ctx,
+        const std::vector<Tensor4dPtr<Ttype, Dtype> >& ins,
+        std::vector<Tensor4dPtr<Ttype, Dtype> >& outs) {
+
+    SABER_CHECK(_funcs_mul.init(ins, outs, _param_mul, SPECIFY, VENDER_IMPL, ctx));
+    return Status::OK();
+}
+
+template<typename Ttype, DataType Dtype, Precision Ptype>
+Status MulHelper<Ttype, Dtype, Ptype>::InferShape(const
+        std::vector<Tensor4dPtr<Ttype, Dtype> >& ins,
+        std::vector<Tensor4dPtr<Ttype, Dtype> >& outs) {
+    SABER_CHECK(_funcs_mul.compute_output_shape(ins, outs, _param_mul));
+    return Status::OK();
+}
+
+#ifdef USE_CUDA
+template class MulHelper<NV, AK_FLOAT, Precision::FP32>;
+#endif
+#ifdef USE_ARM_PLACE
+template class MulHelper<ARM, AK_FLOAT, Precision::FP32>;
+#endif
+// register helper
+#ifdef USE_CUDA
+ANAKIN_REGISTER_OP_HELPER(Mul, MulHelper, NV, AK_FLOAT, Precision::FP32);
+#endif
+#ifdef USE_ARM_PLACE
+ANAKIN_REGISTER_OP_HELPER(Mul, MulHelper, ARM, AK_FLOAT, Precision::FP32);
+#endif
+//! register op
+ANAKIN_REGISTER_OP(Mul)
+.Doc("Mul operator")
+#ifdef USE_CUDA
+.__alias__<NV, AK_FLOAT, Precision::FP32>("mul")
+#endif
+#ifdef USE_ARM_PLACE
+.__alias__<ARM, AK_FLOAT, Precision::FP32>("mul")
+#endif
+.num_in(1)
+.num_out(1)
+.Args<float>("alpha", " alpha of Mul "); // register the op argument
+
+} /* namespace ops */
+
+} /* namespace anakin */
+```
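+
+`ANAKIN_REGISTER_OP_HELPER` and `ANAKIN_REGISTER_OP` make the operator discoverable by name (here `"mul"`). This document does not describe their internals. The sketch below shows the idea commonly used behind such macros: a name-to-factory table filled by static objects before `main` runs. None of its names (`OpBase`, `OpRegistry`, `Registrar`, `MulOpSketch`) come from Anakin.
+
+```c++
+#include <cassert>
+#include <functional>
+#include <map>
+#include <memory>
+#include <string>
+#include <utility>
+
+struct OpBase {
+    virtual ~OpBase() = default;
+    virtual std::string type() const = 0;
+};
+
+// Global name -> factory table; the function-local static avoids init-order issues.
+class OpRegistry {
+public:
+    using Factory = std::function<std::unique_ptr<OpBase>()>;
+    static OpRegistry& instance() {
+        static OpRegistry registry;
+        return registry;
+    }
+    void add(const std::string& alias, Factory f) { _factories[alias] = std::move(f); }
+    std::unique_ptr<OpBase> create(const std::string& alias) const {
+        auto it = _factories.find(alias);
+        if (it == _factories.end()) return nullptr;
+        return it->second();
+    }
+private:
+    std::map<std::string, Factory> _factories;
+};
+
+// A static Registrar object registers its factory during static initialization.
+struct Registrar {
+    Registrar(const std::string& alias, OpRegistry::Factory f) {
+        OpRegistry::instance().add(alias, std::move(f));
+    }
+};
+
+struct MulOpSketch : OpBase {
+    std::string type() const override { return "mul"; }
+};
+
+static Registrar mul_registrar("mul", [] {
+    return std::unique_ptr<OpBase>(new MulOpSketch);
+});
+```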
+
+## Implementing unit tests
+File involved: ```test/saber/xxx/test_saber_funcs_mul_xxx.cpp```
+Add a new unit test under the corresponding test directory.
+
+```c++
+TEST(TestSaberFuncNV, test_func_mul) {
+
+    // init tensors and some param.
+
+    // start Reshape & doInfer
+    Context<NV> ctx1(0, 1, 1);
+
+    // create param
+    MulParam<Tensor<NV, AK_FLOAT, NCHW> > param(alpha);
+
+    std::vector<Tensor<NV, AK_FLOAT, NCHW>*> input;
+    std::vector<Tensor<NV, AK_FLOAT, NCHW>*> output;
+
+    // create saber op
+    Mul<NV, AK_FLOAT, AK_FLOAT, AK_FLOAT, NCHW> mul;
+
+    // compute output shape
+    mul.compute_output_shape(input, output, param);
+
+    // re_alloc output tensors memory based on output shape
+    output[0]->re_alloc(output[0]->shape());
+
+    // init saber op(calling init and create)
+    mul.init(input, output, param, SPECIFY, VENDER_IMPL, ctx1);
+
+    // call operator()
+    mul(input, output, param, ctx1);
+
+    // cuda specified, record events
+    cudaStream_t cuda_stream = ctx1.get_compute_stream();
+    output[0]->record_event(cuda_stream);
+    output_dev.sync();
+    
+    // param changed 
+    param.alpha = 2.0;
+    // auto calling saber op(create and dispatch)
+    mul(input, output, param, ctx1);
+
+    cudaDeviceSynchronize();
+    CUDA_CHECK(cudaPeekAtLastError());
+}
+
+int main(int argc, const char** argv){
+    anakin::saber::Env<NV>::env_init();
+
+    // initial logger
+    //logger::init(argv[0]);
+    InitTest();
+    RUN_ALL_TESTS(argv[0]);
+    return 0;
+}
+
+```
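+
+The test above runs the op but does not check its numbers. A common addition is to copy the output back to the host and compare it element-wise with a CPU reference (for Mul, `alpha * X * Y`) within a tolerance. The helper below is a hypothetical sketch of that comparison on plain host buffers. Its default tolerances are illustrative assumptions, not values taken from Anakin's tests.
+
+```c++
+#include <cassert>
+#include <cmath>
+#include <cstddef>
+#include <vector>
+
+// True if every element of `actual` is within atol + rtol * |expected| of `expected`.
+bool all_close(const std::vector<float>& actual, const std::vector<float>& expected,
+               float rtol = 1e-5f, float atol = 1e-6f) {
+    if (actual.size() != expected.size()) return false;
+    for (std::size_t i = 0; i < actual.size(); ++i) {
+        const float diff = std::fabs(actual[i] - expected[i]);
+        if (diff > atol + rtol * std::fabs(expected[i])) return false;
+    }
+    return true;
+}
+```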
+## Debugging and caveats
+
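+An op needs both an external op interface and an internal implementation. Because saber/funcs/impl contains unspecialized declarations, an op still compiles on a device that has no corresponding implementation, but in that case it is an empty implementation that does nothing,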
--- a/doc/fluid/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
+++ b/doc/fluid/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
+# How to Support a New Device
+
+## Overview
+
+Adding a new device takes the following 3 steps:
+
+* [Add support for the device in `CMakeLists`](#0001)
+* [Add the device implementation in `saber`](#0002)
+* [Add specializations or instantiations for the device in `framework`](#0003)
+
+Assume the new device is named `TNEW`; the steps below use this name for demonstration.
+
+## <span id = '0001'> Adding support for the device in `CMakeLists` </span> ##
+
+* Modify `CMakeLists.txt` in the root directory
+```cmake
+# select the platform to build
+anakin_option(USE_GPU_PLACE "Select the build mode for GPU place." NO)
+anakin_option(USE_X86_PLACE "Select the build mode for X86 place." NO)
+anakin_option(USE_ARM_PLACE "Select the build mode for ARM place." NO)
+anakin_option(USE_TNEW_PLACE "Select the build mode for TNEW place." YES)
+```
+
+* Modify `saber/CMakeLists.txt`
+
+Update `CMakeLists.txt` under the `saber` directory to include the new device's directories.
+```cmake
+if(USE_TNEW_PLACE)
+    anakin_fetch_files_with_suffix(${ANAKIN_SABER}/core/impl/tnew "cpp" ANAKIN_SABER_BASE_SRC)
+    anakin_fetch_files_with_suffix(${ANAKIN_SABER}/funcs/impl/tnew "cpp" ANAKIN_SABER_BASE_SRC)
+endif()
+```
+
+* Modify `test/CMakeLists.txt`
+
+Unit tests for the new device go in the `test/saber/tnew` directory; modify `CMakeLists.txt` under the `test` directory accordingly.
+```cmake
+if(USE_TNEW_PLACE)
+    anakin_fetch_files_with_suffix(${ANAKIN_UNIT_TEST}/saber/tnew "cpp" ANAKIN_TEST_CASE_SRC)
+endif()
+```
+
+* 修改`cmake/anakin_config.h.in`
+```c++
+// platform to use
+#cmakedefine USE_GPU_PLACE
+
+#cmakedefine USE_X86_PLACE
+
+#cmakedefine USE_ARM_PLACE
+
+#cmakedefine USE_TNEW_PLACE
+```
+
+* Other dependencies and build options: modify `compiler_options.cmake` and `find_modules.cmake` under the `cmake` directory.
+
+
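+## <span id = '0002'> Adding the device implementation in `saber` </span> ##
+`saber` is `Anakin`'s basic compute library. It exposes a unified, device-independent API, and wraps all device-specific implementations in `TargetWrapper`.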
+
+### Adding the device in `saber/saber_types.h`
+
+```c++
+enum TargetTypeEnum {
+    eINVALID = -1,
+    eNV = 1,
+    eAMD = 2,
+    eARM = 3,
+    eX86 = 4,
+    eNVHX86 = 5,
+    eTNEW = 6
+};
+
+typedef TargetType<eNV> NV;
+typedef TargetType<eARM> ARM;
+typedef TargetType<eAMD> AMD;
+typedef TargetType<eX86> X86;
+typedef TargetType<eTNEW> TNEW;
+
+```
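+
+The `TargetType<eXXX>` typedefs turn every enum value into a distinct C++ type, so template specialization can select device-specific code at compile time. The reduced sketch below mirrors that pattern. The enum is trimmed, and the `value` member and the `target_value` helper are hypothetical additions for illustration.
+
+```c++
+#include <cassert>
+#include <type_traits>
+
+enum TargetTypeEnum { eINVALID = -1, eNV = 1, eX86 = 4, eTNEW = 6 };
+
+// Each enum value becomes its own type.
+template <TargetTypeEnum T>
+struct TargetType {
+    static constexpr TargetTypeEnum value = T;
+};
+
+typedef TargetType<eNV> NV;
+typedef TargetType<eX86> X86;
+typedef TargetType<eTNEW> TNEW;
+
+// Distinct enum values give distinct types, which is what specializations key on.
+static_assert(!std::is_same<NV, TNEW>::value, "targets must be distinct types");
+
+template <typename Target>
+constexpr int target_value() { return static_cast<int>(Target::value); }
+```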
+
+### Adding the device implementation in `saber/core`
+
+1. Add the new device in `target_traits.h`
+
+* Add the device type
+```c++
+struct __cuda_device{};
+struct __arm_device{};
+struct __amd_device{};
+struct __x86_device{};
+struct __tnew_device{};
+```
+
+* Specialize the `TargetTypeTraits` template
+```c++
+template <>
+struct TargetTypeTraits<TNEW> {
+    typedef __xxx_target target_category; // choose according to whether the device is a host or a device target
+    typedef __tnew_device target_type;
+};
+```
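+
+`TargetTypeTraits` attaches a category (host or device) to each target. Generic code can use that category to decide things at compile time, such as the direction of a memory copy. The sketch makes these assumptions and renames:
+
+* `host_target_tag` and `device_target_tag` stand in for `__host_target` and `__device_target`. Identifiers with a double underscore are reserved in C++, so they are renamed here.
+* TNEW is assumed to be a device target.
+* `is_device` and `copy_direction` are hypothetical helpers.
+
+```c++
+#include <cassert>
+#include <string>
+#include <type_traits>
+
+struct host_target_tag {};    // stands in for __host_target
+struct device_target_tag {};  // stands in for __device_target
+
+struct X86 {};
+struct TNEW {};
+
+// Primary template left undefined: unknown targets fail to compile.
+template <typename Target>
+struct TargetTypeTraits;
+
+template <>
+struct TargetTypeTraits<X86> {
+    typedef host_target_tag target_category;
+};
+
+// Assumption for this sketch: TNEW is an accelerator with its own memory.
+template <>
+struct TargetTypeTraits<TNEW> {
+    typedef device_target_tag target_category;
+};
+
+template <typename Target>
+constexpr bool is_device() {
+    return std::is_same<typename TargetTypeTraits<Target>::target_category,
+                        device_target_tag>::value;
+}
+
+// Copy direction between two targets, named like the __HtoD/__DtoH tags.
+template <typename Src, typename Dst>
+std::string copy_direction() {
+    return is_device<Src>() ? (is_device<Dst>() ? "DtoD" : "DtoH")
+                            : (is_device<Dst>() ? "HtoD" : "HtoH");
+}
+```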
+
+2. Specialize the `DataTrait` template class in `data_traits.h`
+
+If the device needs special data types, specialize the `DataTrait` class for it. For example, the OpenCL data types are implemented as follows:
+```c++
+#ifdef USE_OPENCL
+struct ClMem{
+    ClMem(){
+        dmem = nullptr;
+        offset = 0;
+    }
+
+    ClMem(cl_mem* mem_in, int offset_in = 0) {
+        dmem = mem_in;
+        offset = offset_in;
+    }
+
+    ClMem(ClMem& right) {
+        dmem = right.dmem;
+        offset = right.offset;
+    }
+
+    ClMem& operator=(ClMem& right) {
+        this->dmem = right.dmem;
+        this->offset = right.offset;
+        return *this;
+    }
+
+    ClMem& operator+(int offset_in) {
+        this->offset += offset_in;
+        return *this;
+    }
+
+    int offset{0};
+    cl_mem* dmem;
+};
+
+template <>
+struct DataTrait<AMD, AK_FLOAT> {
+    typedef ClMem Dtype;
+    typedef float dtype;
+};
+
+template <>
+struct DataTrait<AMD, AK_DOUBLE> {
+    typedef ClMem Dtype;
+    typedef double dtype;
+};
+
+template <>
+struct DataTrait<AMD, AK_INT8> {
+    typedef ClMem Dtype;
+    typedef char dtype;
+};
+#endif //use_opencl
+```
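+
+`DataTrait<Target, DataType>` maps a target and a data type to the buffer handle type (`Dtype`) and the element type (`dtype`). The sketch below shows that mapping. It assumes a hypothetical default of raw pointers to the element type. A partial specialization for AMD uses an opaque handle instead, in the spirit of `ClMem`, and covers all data types at once. `FakeMem` and `ElemType` are illustrative names, and no OpenCL headers are needed.
+
+```c++
+#include <cassert>
+#include <type_traits>
+
+enum DataType { AK_FLOAT, AK_DOUBLE, AK_INT8 };
+struct X86 {};
+struct AMD {};
+
+template <DataType D> struct ElemType;
+template <> struct ElemType<AK_FLOAT>  { typedef float type; };
+template <> struct ElemType<AK_DOUBLE> { typedef double type; };
+template <> struct ElemType<AK_INT8>   { typedef char type; };
+
+// Hypothetical default: buffers are addressed with raw pointers to the element type.
+template <typename Target, DataType D>
+struct DataTrait {
+    typedef typename ElemType<D>::type dtype;
+    typedef dtype* Dtype;
+};
+
+// Opaque device handle plus offset, in the spirit of ClMem.
+struct FakeMem {
+    void* handle = nullptr;
+    int offset = 0;
+};
+
+// A device without raw pointers specializes Dtype to a handle type.
+template <DataType D>
+struct DataTrait<AMD, D> {
+    typedef typename ElemType<D>::type dtype;
+    typedef FakeMem Dtype;
+};
+```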
+
+3. Specialize the `TargetWrapper` template class in `target_wrapper.h`
+
+Specialize the `TargetWrapper` template class and declare its functions in `target_wrapper.h`, as follows:
+```c++
+template <>
+struct TargetWrapper<TNEW, __xxx_target> { // replace __xxx_target with __host_target or __device_target according to TNEW's type
+
+    typedef xxx_event event_t;          // implement xxx_event for the device
+    typedef xxx_stream stream_t;        // implement xxx_stream for the device
+
+    static void get_device_count(int& count);
+
+    static void set_device(int id);
+
+    //We should add strategy to avoid malloc directly
+    static void mem_alloc(void** ptr, size_t n);
+
+    static void mem_free(void* ptr);
+
+    static void mem_set(void* ptr, int value, size_t n);
+
+    static void create_event(event_t& event, bool flag = false);
+
+    static void create_stream(stream_t& stream);
+
+    static void create_stream_with_flag(stream_t& stream, unsigned int flag);
+
+    static void create_stream_with_priority(stream_t& stream, unsigned int flag, int priority);
+
+    static void destroy_stream(stream_t& stream);
+
+    static void destroy_event(event_t& event);
+
+    static void record_event(event_t& event, stream_t stream);
+
+    static void query_event(event_t& event);
+
+    static void sync_event(event_t& event);
+
+    static void sync_stream(event_t& event, stream_t& stream);
+
+    static void sync_memcpy(void* dst, int dst_id, const void* src, int src_id, \
+                            size_t count, __DtoD);
+
+    static void async_memcpy(void* dst, int dst_id, const void* src, int src_id, \
+                             size_t count, stream_t& stream, __DtoD);
+
+    static void sync_memcpy(void* dst, int dst_id, const void* src, int src_id, \
+                            size_t count, __HtoD);
+
+    static void async_memcpy(void* dst, int dst_id, const void* src, int src_id, \
+                             size_t count, stream_t& stream, __HtoD);
+
+    static void sync_memcpy(void* dst, int dst_id, const void* src, int src_id, \
+                            size_t count, __DtoH);
+
+    static void async_memcpy(void* dst, int dst_id, const void* src, int src_id, \
+                             size_t count, stream_t& stream, __DtoH);
+
+    static void sync_memcpy_p2p(void* dst, int dst_dev, const void* src, \
+                                int src_dev, size_t count);
+
+    static void async_memcpy_p2p(void* dst, int dst_dev, const void* src, \
+                                 int src_dev, size_t count, stream_t& stream);
+
+    static int get_device_id();
+};
+
+```
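+
+For a target that uses ordinary host memory, many `TargetWrapper` members reduce to C library calls. The sketch below implements a few of them for a hypothetical host-side wrapper named `HostWrapper`. It omits events, streams and the asynchronous copies, and drops the copy-direction tag parameters. A real TNEW implementation would call the device's own runtime instead.
+
+```c++
+#include <cassert>
+#include <cstddef>
+#include <cstdlib>
+#include <cstring>
+
+// Hypothetical host-memory implementation of a few TargetWrapper members.
+struct HostWrapper {
+    static void get_device_count(int& count) { count = 1; }  // a single host "device"
+
+    static void mem_alloc(void** ptr, std::size_t n) {
+        *ptr = std::malloc(n);
+        assert(*ptr != nullptr && "allocation failed");  // checked only in debug builds
+    }
+
+    static void mem_free(void* ptr) {
+        if (ptr != nullptr) {
+            std::free(ptr);
+        }
+    }
+
+    static void mem_set(void* ptr, int value, std::size_t n) {
+        std::memset(ptr, value, n);
+    }
+
+    // Device ids are ignored because host memory is directly addressable.
+    static void sync_memcpy(void* dst, int /*dst_id*/, const void* src, int /*src_id*/,
+                            std::size_t count) {
+        std::memcpy(dst, src, count);
+    }
+};
+```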
+
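+4. Add the device directory and implementation under `impl/`
+
+Add the device directory `tnew` under `saber/core/impl`.
+* Define the functions of the `TargetWrapper<TNEW, __xxx_target>` struct. If the implementation of `TargetWrapper<TNEW, __xxx_target>` is the same as the default template class, there is no need to specialize it.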
+
+```c++
+typedef TargetWrapper<TNEW, __xxx_target> TNEW_API;
+void TNEW_API::get_device_count(int &count) {
+    // add implementation
+}
+
+void TNEW_API::set_device(int id){
+    // add implementation
+}
+        
+void TNEW_API::mem_alloc(void** ptr, size_t n){
+    // add implementation
+}
+        
+void TNEW_API::mem_free(void* ptr){
+    if(ptr != nullptr){
+        // add implementation
+    }
+}
+...
+
+```
+
+* Specialize `Device<TNEW>` from `device.h`
+
+```c++
+template <>
+void Device<TNEW>::create_stream() {
+    // add implementation
+}
+
+template <>
+void Device<TNEW>::get_info() {
+
+    // add implementation
+}
+
+```
+
+### Implementing device-specific ops in `saber/funcs`
+
+See [How to Add a New Operator](addCustomOp.md)
+
+
+## <span id = '0003'> Adding specializations or instantiations for the device in `framework` </span> ##
+
+### `framework/core`
+
+* Add instantiations in `net.cpp`
+
+```c++
+#ifdef USE_TNEW_PLACE
+template class Net<TNEW, AK_FLOAT, Precision::FP32, OpRunType::ASYNC>;
+template class Net<TNEW, AK_FLOAT, Precision::FP32, OpRunType::SYNC>;
+#endif
+```
+
+* Add instantiations in `operator_func.cpp`
+
+```c++
+#ifdef USE_TNEW_PLACE
+template class OperatorFunc<TNEW, AK_FLOAT, Precision::FP32>;
+#endif
+```
+
+* Add instantiations in `worker.cpp`
+
+```c++
+#ifdef USE_TNEW_PLACE
+template class Worker<TNEW, AK_FLOAT, Precision::FP32, OpRunType::ASYNC>;
+template class Worker<TNEW, AK_FLOAT, Precision::FP32, OpRunType::SYNC>;
+#endif
+```
+
+* Add instantiations in `operator_attr.cpp`
+
+```c++
+template
+OpAttrWarpper& OpAttrWarpper::__alias__<TNEW, AK_FLOAT, Precision::FP32>(const std::string& op_name);
+template
+OpAttrWarpper& OpAttrWarpper::__alias__<TNEW, AK_FLOAT, Precision::FP16>(const std::string& op_name);
+template
+OpAttrWarpper& OpAttrWarpper::__alias__<TNEW, AK_FLOAT, Precision::INT8>(const std::string& op_name);
+```
+
+* Add the device implementation in `parameter.h`
+
+```c++
+#ifdef USE_TNEW_PLACE
+template<typename Dtype>
+class PBlock<Dtype, TNEW> {
+public:
+    typedef Tensor4d<TNEW, DataTypeRecover<Dtype>::type> type;
+
+    PBlock() {
+        _inner_tensor = std::make_shared<type>();
+    }
+    ...
+};
+#endif //TNEW
+```
+
+* Add the device implementation in `type_traits_extend.h`
+
+```c++
+template<>
+struct target_host<saber::TNEW> {
+    typedef saber::X86 type; // choose the correct host type for TNEW
+};
+```
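+
+`target_host` tells generic code which host type is paired with a device, for example when it allocates the host tensor that is filled and then copied to the device. The sketch below reproduces the trait with placeholder tag types. The primary template, where a target is its own host, is an assumption made for the sketch. Pairing TNEW with X86 simply follows the example above.
+
+```c++
+#include <cassert>
+#include <type_traits>
+
+namespace saber {
+struct X86 {};
+struct ARM {};
+struct TNEW {};
+}  // namespace saber
+
+// Assumed primary template: by default a target is its own host (CPU targets).
+template <typename Target>
+struct target_host {
+    typedef Target type;
+};
+
+// A device target names the host it is paired with.
+template <>
+struct target_host<saber::TNEW> {
+    typedef saber::X86 type;
+};
+
+static_assert(std::is_same<target_host<saber::TNEW>::type, saber::X86>::value,
+              "TNEW is assumed to be driven from an X86 host");
+static_assert(std::is_same<target_host<saber::ARM>::type, saber::ARM>::value,
+              "a CPU target is its own host in this sketch");
+```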
+
+### `framework/graph`
+
+* Add instantiations in `graph.cpp`
+  
+```c++
+  #ifdef USE_TNEW_PLACE
+  template class Graph<TNEW, AK_FLOAT, Precision::FP32>;
+  template class Graph<TNEW, AK_FLOAT, Precision::FP16>;
+  template class Graph<TNEW, AK_FLOAT, Precision::INT8>;
+  #endif
+```
+
+### `framework/model_parser`
+
+* Add instantiations in `parser.cpp`
+  
+```c++
+  #ifdef USE_TNEW_PLACE
+  template
+  Status load<TNEW, AK_FLOAT, Precision::FP32>(graph::Graph<TNEW, AK_FLOAT, Precision::FP32>* graph,
+          const char* model_path);
+  template
+  Status load<TNEW, AK_FLOAT, Precision::FP16>(graph::Graph<TNEW, AK_FLOAT, Precision::FP16>* graph,
+          const char* model_path);
+  template
+  Status load<TNEW, AK_FLOAT, Precision::INT8>(graph::Graph<TNEW, AK_FLOAT, Precision::INT8>* graph,
+          const char* model_path);
+  
+  template
+  Status save<TNEW, AK_FLOAT, Precision::FP32>(graph::Graph<TNEW, AK_FLOAT, Precision::FP32>* graph,
+          std::string& model_path);
+  template
+  Status save<TNEW, AK_FLOAT, Precision::FP16>(graph::Graph<TNEW, AK_FLOAT, Precision::FP16>* graph,
+          std::string& model_path);
+  template
+  Status save<TNEW, AK_FLOAT, Precision::INT8>(graph::Graph<TNEW, AK_FLOAT, Precision::INT8>* graph,
+          std::string& model_path);
+  
+  template
+  Status load<TNEW, AK_FLOAT, Precision::FP32>(graph::Graph<TNEW, AK_FLOAT, Precision::FP32>* graph,
+          std::string& model_path);
+  template
+  Status load<TNEW, AK_FLOAT, Precision::FP16>(graph::Graph<TNEW, AK_FLOAT, Precision::FP16>* graph,
+          std::string& model_path);
+  template
+  Status load<TNEW, AK_FLOAT, Precision::INT8>(graph::Graph<TNEW, AK_FLOAT, Precision::INT8>* graph,
+          std::string& model_path);
+  
+  template
+  Status save<TNEW, AK_FLOAT, Precision::FP32>(graph::Graph<TNEW, AK_FLOAT, Precision::FP32>* graph,
+          const char* model_path);
+  template
+  Status save<TNEW, AK_FLOAT, Precision::FP16>(graph::Graph<TNEW, AK_FLOAT, Precision::FP16>* graph,
+          const char* model_path);
+  template
+  Status save<TNEW, AK_FLOAT, Precision::INT8>(graph::Graph<TNEW, AK_FLOAT, Precision::INT8>* graph,
+          const char* model_path);
+  #endif
+```
+
+* Add instantiations in `model_io.cpp`
+
+```c++
+#ifdef USE_TNEW_PLACE
+template class NodeIO<TNEW, AK_FLOAT, Precision::FP32>;
+template class NodeIO<TNEW, AK_FLOAT, Precision::FP16>;
+template class NodeIO<TNEW, AK_FLOAT, Precision::INT8>;
+#endif
+```
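+
+The blocks above are explicit template instantiations. The template member definitions live in `.cpp` files, so every (target, data type, precision) combination used elsewhere must be instantiated there. The `#ifdef USE_TNEW_PLACE` guard keeps those instantiations out of builds that lack the device. A minimal illustration, with hypothetical `MiniNet`/`TNEWTag` names:
+
+```c++
+#include <cassert>
+#include <string>
+
+#define USE_TNEW_PLACE  // normally provided by anakin_config.h
+
+struct TNEWTag {};
+
+template <typename Target>
+class MiniNet {
+public:
+    std::string run() const;  // defined below, as it would be in a .cpp file
+};
+
+template <typename Target>
+std::string MiniNet<Target>::run() const { return "ran"; }
+
+#ifdef USE_TNEW_PLACE
+// Explicit instantiation: emits code for MiniNet<TNEWTag> in this translation unit.
+template class MiniNet<TNEWTag>;
+#endif
+```
+
+Within a single file the explicit instantiation is not strictly needed, because the member definition is visible at the point of use. It matters when the definition sits in a `.cpp` file and other translation units only see the declaration.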
+
+### `framework/operators`
+
+Add instantiations or specializations for every op under `framework/operators`.
+Taking `activation.cpp` as an example, the instantiations are:
+
+```c++
+#ifdef USE_TNEW_PLACE
+INSTANCE_ACTIVATION(TNEW, AK_FLOAT, Precision::FP32);
+INSTANCE_ACTIVATION(TNEW, AK_FLOAT, Precision::FP16);
+INSTANCE_ACTIVATION(TNEW, AK_FLOAT, Precision::INT8);
+template class ActivationHelper<TNEW, AK_FLOAT, Precision::FP32>;
+ANAKIN_REGISTER_OP_HELPER(Activation, ActivationHelper, TNEW, AK_FLOAT, Precision::FP32);
+#endif
+```
+
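+If TNEW's implementation of a function differs from the existing template implementation, you can specialize it as follows (using Init() as an example):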
+```c++
+#ifdef USE_TNEW_PLACE
+INSTANCE_ACTIVATION(TNEW, AK_FLOAT, Precision::FP32);
+INSTANCE_ACTIVATION(TNEW, AK_FLOAT, Precision::FP16);
+INSTANCE_ACTIVATION(TNEW, AK_FLOAT, Precision::INT8);
+template <>
+Status ActivationHelper<TNEW, AK_FLOAT, Precision::FP32>::Init(OpContext<TNEW> &ctx,\
+        const std::vector<Tensor4dPtr<TNEW, AK_FLOAT> >& ins, \
+                std::vector<Tensor4dPtr<TNEW, AK_FLOAT> >& outs) {
+    SABER_CHECK(_funcs_activation.init(ins, outs, _param_activation, SPECIFY, SABER_IMPL, ctx)); // choose the implementation here
+    return Status::OK();
+}
+ANAKIN_REGISTER_OP_HELPER(Activation, ActivationHelper, TNEW, AK_FLOAT, Precision::FP32);
+#endif
+```
+
+Add the TNEW registration to `ANAKIN_REGISTER_OP(Activation)`:
+
+```c++
+#ifdef USE_TNEW_PLACE
+.__alias__<TNEW, AK_FLOAT, Precision::FP32>("activation")
+#endif
+```
+
+## Caveats
+Do not modify the interfaces or implementations of the `Tensor`/`Buffer`/`Env`/`Context` class functions.
--- a/source/advanced_usage/deploy/index_anakin.rst
+++ b/source/advanced_usage/deploy/index_anakin.rst
--- a/source/advanced_usage/deploy/index_mobile.rst
+++ b/source/advanced_usage/deploy/index_mobile.rst
--- a/source/advanced_usage/deploy/index_native.rst
+++ b/source/advanced_usage/deploy/index_native.rst
--- a/doc/fluid/advanced_usage/deploy/install_anakin.md
+++ b/doc/fluid/advanced_usage/deploy/install_anakin.md
+## Building and Installing Anakin from Source ##
+
+We have successfully installed and tested Anakin on CentOS 7.3. Support for other operating systems is coming soon.
+
+### Installation overview ###
+
+* [Installing Anakin on CentOS]()
+* [Installing Anakin on Ubuntu]()
+* [Installing Anakin on ARM](run_on_arm_ch.md)
+* [Verifying the installation]()
+
+
+### Installing Anakin on CentOS ###
+#### 1. System requirements ####
+
+*  make 3.82+
+*  cmake 2.8.12+
+*  gcc 4.8.2+
+*  g++ 4.8.2+
+*  Others to be added...
+
+#### 2. Building the CPU version of Anakin ####
+
+Not supported yet.
+
+#### 3. Building Anakin with NVIDIA GPU support ####
+
+- 3.1 Install dependencies
+  - 3.1.1 protobuf  
+    >$ git clone https://github.com/google/protobuf  
+    >$ cd protobuf  
+    >$ git submodule update --init --recursive  
+    >$ ./autogen.sh  
+    >$ ./configure --prefix=/path/to/your/install_dir  
+    >$ make  
+    >$ make check  
+    >$ make install  
+    >$ sudo ldconfig
+
+
+    If you run into any problems installing protobuf, see [here](https://github.com/google/protobuf/blob/master/src/README.md).
+
+- 3.2 CUDA Toolkit
+  - [CUDA 8.0](https://developer.nvidia.com/cuda-zone) or higher. See [NVIDIA's documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) for details.
+  - [cuDNN v7](https://developer.nvidia.com/cudnn). See [NVIDIA's documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) for details.
+- 3.3 Build Anakin
+  >$ git clone https:/xxxxx  
+  >$ cd anakin  
+  >$ mkdir build  
+  >$ cd build  
+  >$ cmake ..  
+  >$ make
+
+
+#### 4. Building Anakin with AMD GPU support ####
+
+Not supported yet.
+
+
+### Installing Anakin on Ubuntu ###
+
+Not supported yet.
+
+
+### Installing Anakin on ARM ###
+
+Not supported yet.
+
+### Verifying the installation ###
+Coming soon...
--- a/doc/fluid/advanced_usage/deploy/mobile_build.md
+++ b/doc/fluid/advanced_usage/deploy/mobile_build.md
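+# Setting Up the Build Environment
+## Using docker
+### 1. Install docker
+To install docker, see the official documentation at [https://docs.docker.com/install/](https://docs.docker.com/install/).
+### 2. Set up the build environment with docker
+First enter the paddle-mobile directory and run `docker build`.
+The example below is for Linux/Mac. On Windows, run it in the 'Docker Quickstart Terminal'.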
+```
+$ docker build -t paddle-mobile:dev - < Dockerfile
+```
+Run `docker images` to see the newly created image:
+```
+$ docker images
+REPOSITORY      TAG     IMAGE ID       CREATED         SIZE
+paddle-mobile   dev     33b146787711   45 hours ago    372MB
+```
+### 3. Build with docker
+Enter the paddle-mobile directory and run docker run:
+```
+$ docker run -it --mount type=bind,source=$PWD,target=/paddle-mobile paddle-mobile:dev
+root@5affd29d4fc5:/ # cd /paddle-mobile
+# Generate the Makefile for an Android build
+root@5affd29d4fc5:/ # rm -f CMakeCache.txt
+root@5affd29d4fc5:/ # cmake -DCMAKE_TOOLCHAIN_FILE=tools/toolchains/arm-android-neon.cmake .
+# Generate the Makefile for a Linux build
+root@5affd29d4fc5:/ # rm -f CMakeCache.txt
+root@5affd29d4fc5:/ # cmake -DCMAKE_TOOLCHAIN_FILE=tools/toolchains/arm-linux-gnueabi.cmake .
+```
+### 4. Configure build options
+Build options can be set with `ccmake`:
+```
+root@5affd29d4fc5:/ # ccmake .
+                                                     Page 1 of 1
+ CMAKE_ASM_FLAGS
+ CMAKE_ASM_FLAGS_DEBUG
+ CMAKE_ASM_FLAGS_RELEASE
+ CMAKE_BUILD_TYPE
+ CMAKE_INSTALL_PREFIX             /usr/local
+ CMAKE_TOOLCHAIN_FILE             /paddle-mobile/tools/toolchains/arm-android-neon.cmake
+ CPU                              ON
+ DEBUGING                         ON
+ FPGA                             OFF
+ LOG_PROFILE                      ON
+ MALI_GPU                         OFF
+ NET                              googlenet
+ USE_EXCEPTION                    ON
+ USE_OPENMP                       OFF
+```
+After changing the options, press `c` (configure) and then `g` (generate) to update the Makefile.
+### 5. Build
+Run `make` to build:
+```
+root@5affd29d4fc5:/ # make
+```
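+### 6. Locate the build outputs
+The build outputs can be found on the host machine, under `build` and `test/build` in the paddle-mobile directory. They can be copied to the device with `adb` or `scp` and run there.
+
+## Without Docker
+Without Docker, you can generate the Makefile directly with cmake and then build. Building Android binaries with the NDK requires `NDK_ROOT` to be set correctly. Building Linux binaries requires arm-linux-gnueabi-gcc or a similar cross-compilation toolchain; you may need to set the `CC` and `CXX` environment variables, modify `arm-linux-gnueabi.cmake` in `tools/toolchains/`, or add a toolchain file of your own.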
--- a/doc/fluid/advanced_usage/deploy/mobile_dev.md
+++ b/doc/fluid/advanced_usage/deploy/mobile_dev.md
+# iOS Development Guide
+
+## Building
+
+### 1. Build with build.sh
+
+```sh 
+sh build.sh ios
+
+# To build only the ops needed by a specific model, run:
+sh build.sh ios googlenet
+
+# The generated .a library can be found in this directory
+cd ../build/release/ios/build
+
+```
+
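+### 2. Build with Xcode
+
+We also provide an Xcode build environment, which is more familiar to iOS developers:
+open PaddleMobile.xcworkspace in the ios/ directory to build PaddleMobile or run the demo.
+
+### 3. Integration
+
+#### Using the C++ interface
+Drag the following files into your project:
+
+```
+libpaddle-mobile.a
+io.h
+program.h
+types.h
+lod_tensor.h
+tensor.h
+```
+io.h is the interface header; its interface comments can be viewed on [GitHub](https://github.com/PaddlePaddle/paddle-mobile/blob/develop/src/io/io.h).
+
+#### Using the Objective-C interface
+Drag the following files generated by the Xcode build into your project:
+```
+libPaddleMobile.a
+PaddleMobile.h
+```
+The interface is as follows: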
+
+```objc
+/*
+	Create the singleton instance
+*/
++ (instancetype)sharedInstance;
+
+/*
+	Load the model and allocate memory
+*/
+- (BOOL)load:(NSString *)modelPath andWeightsPath:(NSString *)weighsPath;
+
+/*
+	Run prediction. means and scale are the preprocessing parameters used when training the model; if no such preprocessing was done during training, call predict: directly
+*/
+- (NSArray *)predict:(CGImageRef)image means:(NSArray<NSNumber *> *)means scale:(float)scale;
+
+/*
+	Run prediction
+*/
+- (NSArray *)predict:(CGImageRef)image;
+
+/*
+	Release memory
+*/
+- (void)clear;
+
+```
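
The header does not spell out exactly how `means` and `scale` are applied. A common convention, assumed here rather than taken from PaddleMobile, is to subtract the per-channel mean from each pixel value and then multiply by `scale`. The C++ sketch below shows that assumed preprocessing on an interleaved (e.g. RGB) pixel buffer; `Preprocess` is a hypothetical helper, not part of the PaddleMobile API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of PaddleMobile). Assumes the preprocessing
// is out = (pixel - means[c]) * scale, where c is the channel of each value
// in an interleaved buffer with `channels` values per pixel.
std::vector<float> Preprocess(const std::vector<unsigned char>& pixels,
                              std::size_t channels,
                              const std::vector<float>& means,
                              float scale) {
  std::vector<float> out(pixels.size());
  for (std::size_t i = 0; i < pixels.size(); ++i) {
    std::size_t c = i % channels;  // channel index of this value
    out[i] = (static_cast<float>(pixels[i]) - means[c]) * scale;
  }
  return out;
}
```

For example, with `means` of `{100, 100, 100}` and `scale` of `0.5`, the pixel `(110, 120, 130)` becomes `(5, 10, 15)` under this assumed convention.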
--- a/doc/fluid/advanced_usage/deploy/native_infer.rst
+++ b/doc/fluid/advanced_usage/deploy/native_infer.rst
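+Paddle Inference API
+====================
+
+To make inference deployment simpler and more convenient, Fluid provides a set of high-level APIs
+that hide the different underlying optimized implementations.
+
+`The inference library code <https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/fluid/inference/api>`__
+consists of
+
+-  the header file ``paddle_inference_api.h``, which defines all the interfaces
+-  the library file ``libpaddle_fluid.so`` or ``libpaddle_fluid.a``
+
+For building and dependencies, see :ref:`install_or_build_cpp_inference_lib`.
+
+The following sections introduce some API concepts.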
+
+PaddleTensor
+------------
+
+PaddleTensor defines the most basic data format for inference inputs and outputs. Its definition is:
+
+.. code:: cpp
+
+    struct PaddleTensor {
+      std::string name;  // variable name.
+      std::vector<int> shape;
+      PaddleBuf data;  // blob of data.
+      PaddleDType dtype;
+    };
+
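+-  ``name`` specifies the name of the model variable that the input data corresponds to
+   (currently unused, but it will be enabled once arbitrary targets are supported)
+-  ``shape`` is the shape of the Tensor
+-  ``data`` is stored as contiguous memory in a ``PaddleBuf``; a ``PaddleBuf``
+   can either take in external data or ``malloc`` its own memory. See the related
+   definitions in the header file for details.
+-  ``dtype`` is the data type of the Tensor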
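
As an illustration of how ``shape``, ``data`` and ``dtype`` relate, the sketch below uses simplified stand-in types. ``TensorSketch``, ``DType`` and the helper functions are hypothetical and are not the real definitions from ``paddle_inference_api.h``; the point is only that a well-formed tensor's byte buffer holds exactly as many elements as its ``shape`` implies, each of the size given by ``dtype``.

.. code:: cpp

    #include <cstddef>
    #include <cstdint>
    #include <numeric>
    #include <string>
    #include <vector>

    // Hypothetical, simplified stand-ins for PaddleTensor / PaddleDType.
    enum class DType { FLOAT32, INT64 };

    struct TensorSketch {
      std::string name;
      std::vector<int> shape;
      std::vector<std::uint8_t> data;  // contiguous bytes, like PaddleBuf
      DType dtype;
    };

    // Size in bytes of one element of the given type.
    std::size_t ElementSize(DType t) {
      return t == DType::INT64 ? sizeof(std::int64_t) : sizeof(float);
    }

    // Number of elements implied by a shape, e.g. {4, 1} -> 4.
    std::size_t NumElements(const std::vector<int>& shape) {
      return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                             [](std::size_t acc, int d) {
                               return acc * static_cast<std::size_t>(d);
                             });
    }

    // True when the buffer holds exactly NumElements(shape) values of dtype.
    bool IsConsistent(const TensorSketch& t) {
      return t.data.size() == NumElements(t.shape) * ElementSize(t.dtype);
    }

For the input tensor in the deployment example of this document (shape ``{4, 1}`` holding four ``int64_t`` values), the buffer must therefore be 4 x 8 = 32 bytes.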
+
+engine
+------
+
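+Underneath the high-level API there are several optimized implementations, which we call engines. There are currently three engines:
+
+-  the native engine, made up of Paddle's native forward operators, which naturally supports every model trained with Paddle;
+-  the Anakin engine, which wraps
+   `Anakin <https://github.com/PaddlePaddle/Anakin>`__;
+   it performs well on some models but only accepts Anakin's own model format, so it cannot support all Paddle
+   models;
+-  the TensorRT mixed engine, which supports
+   `TensorRT <https://developer.nvidia.com/tensorrt>`__ through subgraphs: it supports all Paddle
+   models and automatically offloads some computation subgraphs to TensorRT for acceleration (WIP).
+
+They are represented as: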
+
+.. code:: cpp
+
+    enum class PaddleEngineKind {
+      kNative = 0,       // Use the native Fluid facility.
+      kAnakin,           // Use Anakin for inference.
+      kAutoMixedTensorRT // Automatically mixing TensorRT with the Fluid ops.
+    };
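
The following sketch illustrates the idea of one entry point hiding several engines behind a common interface, with the engine chosen by an enum template argument in the same spirit as ``CreatePaddlePredictor``. All names here (``Predictor``, ``EngineKind``, ``CreatePredictor`` and the engine classes) are hypothetical; this is not Paddle's implementation.

.. code:: cpp

    #include <memory>
    #include <string>

    // Hypothetical mirror of PaddleEngineKind.
    enum class EngineKind { kNative = 0, kAnakin, kAutoMixedTensorRT };

    // Common interface that callers program against.
    struct Predictor {
      virtual ~Predictor() = default;
      virtual std::string EngineName() const = 0;
    };

    struct NativePredictor : Predictor {
      std::string EngineName() const override { return "native"; }
    };
    struct AnakinPredictor : Predictor {
      std::string EngineName() const override { return "anakin"; }
    };
    struct MixedTensorRTPredictor : Predictor {
      std::string EngineName() const override { return "tensorrt-mixed"; }
    };

    // The engine is selected at compile time; each specialization builds
    // a different concrete predictor behind the same interface.
    template <EngineKind Kind>
    std::unique_ptr<Predictor> CreatePredictor();

    template <>
    std::unique_ptr<Predictor> CreatePredictor<EngineKind::kNative>() {
      return std::unique_ptr<Predictor>(new NativePredictor());
    }
    template <>
    std::unique_ptr<Predictor> CreatePredictor<EngineKind::kAnakin>() {
      return std::unique_ptr<Predictor>(new AnakinPredictor());
    }
    template <>
    std::unique_ptr<Predictor> CreatePredictor<EngineKind::kAutoMixedTensorRT>() {
      return std::unique_ptr<Predictor>(new MixedTensorRTPredictor());
    }

Because callers only hold a ``Predictor`` pointer, switching engines in this sketch means changing one template argument rather than rewriting the calling code.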
+
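+Inference deployment workflow
+-----------------------------
+
+The process consists of the following steps:
+
+1. Create a ``PaddlePredictor`` with a suitable config.
+2. Create the input ``PaddleTensor`` objects and pass them to the ``PaddlePredictor``.
+3. Get the output ``PaddleTensor`` objects and extract the results.
+
+The following walks through a simple model end to end, with some details omitted: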
+
+.. code:: cpp
+
+    #include "paddle_inference_api.h"
+
+    // Create a config and adjust the relevant settings
+    paddle::NativeConfig config;
+    config.model_dir = "xxx";
+    config.use_gpu = false;
+    // Create a native PaddlePredictor
+    auto predictor =
+          paddle::CreatePaddlePredictor<paddle::NativeConfig, paddle::PaddleEngineKind::kNative>(config);
+    // Create the input tensor; the PaddleBuf wraps the external `data` array
+    int64_t data[4] = {1, 2, 3, 4};
+    paddle::PaddleTensor tensor;
+    tensor.name = "";
+    tensor.shape = std::vector<int>({4, 1});
+    tensor.data = paddle::PaddleBuf(data, sizeof(data));
+    tensor.dtype = paddle::PaddleDType::INT64;
+    // All input tensors are passed to Run() together in a vector
+    std::vector<paddle::PaddleTensor> slots{tensor};
+    // Create the output tensors; their memory can be reused
+    std::vector<paddle::PaddleTensor> outputs;
+    // Run inference
+    CHECK(predictor->Run(slots, &outputs));
+    // Fetch the outputs ...
+
+When building, link against ``libpaddle_fluid.a`` or ``libpaddle_fluid.so``.
+
+Detailed code references
+------------------------
+
+-  `inference
+   demos <https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/fluid/inference/api/demo_ci>`__
+-  `Complex single-threaded/multi-threaded examples <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/api/api_impl_tester.cc>`__
--- a/doc/fluid/advanced_usage/deploy/run_anakin_on_arm.md
+++ b/doc/fluid/advanced_usage/deploy/run_anakin_on_arm.md
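+## Building Anakin from Source ##
+
+Anakin currently supports the ARM Android platform, using the Android NDK cross-compilation toolchain. The build has been tested on macOS and CentOS.
+
+### Installation Overview ###
+
+* [System requirements](#0001)
+* [Install third-party dependencies](#0002)
+* [Build Anakin from source](#0003)
+* [Verify the installation](#0004)
+
+
+### <span id = '0001'> 1. System Requirements </span> ###
+
+*  Host: Linux, macOS    
+*  cmake 3.8.2+    
+*  Android NDK r14; the Linux version can be [downloaded here](https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip)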
+
+### <span id = '0002'> 2. Install Third-Party Dependencies </span> ###
+
+- 2.1 protobuf 3.4.0     
+   Download the source code [here](https://github.com/google/protobuf/releases/tag/v3.4.0).    
+ - 2.1.1 Build protobuf for the host     
+ ```bash
+   $ tar -xzf protobuf-3.4.0.tar.gz  
+   $ cd protobuf-3.4.0   
+   $ ./autogen.sh  
+   $ ./configure    
+   $ make  
+   $ make check   
+   $ make install
+   ```
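+   After `$ make install` above completes, the header files required by libprotobuf can be found in /usr/local/include/google. Copy the entire google folder to Anakin/third-party/arm-android/protobuf/.
+   If you run into problems, see [here](https://github.com/google/protobuf/blob/v3.4.0/src/README.md).
+   Then clean up the generated files: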
+ ```bash
+   $ make distclean
+   ```
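+ - 2.1.2 Cross-compile protobuf for Android `armeabi-v7a`. Make sure to set the path of ANDROID_NDK and the values of ARCH_ABI and HOSTOSN (for example, `linux-x86_64` instead of `darwin-x86_64` on a Linux host):   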
+ ```bash
+
+   $ export ANDROID_NDK=your_ndk_path 
+   $ ARCH_ABI="arm-linux-androideabi-4.9"
+   $ HOSTOSN="darwin-x86_64"
+   $ export SYSROOT=$ANDROID_NDK/platforms/android-9/arch-arm  
+   $ export PREBUILT=$ANDROID_NDK/toolchains/$ARCH_ABI
+   $ export LDFLAGS="--sysroot=$SYSROOT"
+   $ export LD="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/arm-linux-androideabi/bin/ld $LDFLAGS"
+   $ export LIBS="-llog $ANDROID_NDK/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/libgnustl_static.a"
+   $ export CPPFLAGS=""
+   $ export INCLUDES="-I$ANDROID_NDK/sources/cxx-stl/gnu-libstdc++/4.9/include/ -I$ANDROID_NDK/platforms/android-9/arch-arm/usr/include/ -I$ANDROID_NDK/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/include/"
+   $ export CXXFLAGS="-march=armv7-a -mfloat-abi=softfp -DGOOGLE_PROTOBUF_NO_RTTI --sysroot=$SYSROOT"
+   $ export CCFLAGS="$CXXFLAGS"
+   $ export CXX="$PREBUILT/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-g++ $CXXFLAGS"
+   $ export CC="$CXX"
+   $ export RANLIB="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-ranlib"  
+   $ ./autogen.sh  
+   $ ./configure --host=arm-linux-androideabi --with-sysroot=$SYSROOT --enable-cross-compile --with-protoc=protoc --disable-shared CXX="$CXX" CC="$CC" LD="$LD"  
+   $ make
+  ```
+  
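+  This produces `*.a` static libraries. To build `*.so` shared libraries instead, replace `--disable-shared` with `--disable-static --enable-shared` in the `./configure` arguments.  
+  The generated files are in `src/.libs/`; copy them to `Anakin/third-party/arm-android/protobuf/lib`.  
+  Update the path of `ARM_RPOTO_ROOT` in [cmake](../../cmake/find_modules.cmake).        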
+  ```cmake
+  set(ARM_RPOTO_ROOT "${CMAKE_SOURCE_DIR}/third-party/arm-android/protobuf")
+  ```
+  
+- 2.2 opencv 2.4.3+ (optional)    
+    Anakin uses opencv only in the examples.   
+    opencv for Android can be [downloaded here](https://opencv.org/releases.html).    
+    After extracting, copy the library files in `3rdparty/libs/armeabi-v7a` to `libs/armeabi-v7a`.    
+    Search for `anakin_find_opencv` in [cmake](../../cmake/find_modules.cmake), 
+    and set `include_directories` and `LINK_DIRECTORIES` to the paths of your installed libraries.   
+    ```cmake
+    include_directories(${CMAKE_SOURCE_DIR}/third-party/arm-android/opencv/sdk/native/jni/include/)
+    LINK_DIRECTORIES(${CMAKE_SOURCE_DIR}/third-party/arm-android/opencv/sdk/native/libs/armeabi-v7a/)
+    ```
+### <span id = '0003'> 3. Build Anakin from Source </span> ###
+
+#### Build the Android version
+
+   Clone the [source code](https://github.com/PaddlePaddle/Anakin/tree/arm):
+```bash
+    cd your_dir
+    git clone https://github.com/PaddlePaddle/Anakin.git
+    cd Anakin
+    git fetch origin arm
+    git checkout arm
+  ```
+  Edit `android_build.sh`:    
+- Set the NDK path    
+  ```bash
+    #modify "your_ndk_path" to your NDK path
+    export ANDROID_NDK=your_ndk_path
+  ```
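+- Set the ARM processor architecture     
+  For 32-bit ARM processors, set ANDROID_ABI to `armeabi-v7a with NEON`.  
+  For 64-bit ARM processors, ANDROID_ABI can be set to `armeabi-v7a with NEON` or `arm64-v8a`.        
+  Currently only `armeabi-v7a with NEON` is supported; `arm64-v8a` is still under development.      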
+  ```bash
+      -DANDROID_ABI="armeabi-v7a with NEON"
+  ```
+- Set the Android API level    
+  Set the API level according to the Android version, e.g. API Level 21 -> Android 5.0.1    
+  ```bash
+      -DANDROID_NATIVE_API_LEVEL=21
+  ```
+
+- Build a static or shared library    
+  Set `BUILD_SHARED=NO` to build a static library    
+  Set `BUILD_SHARED=YES` to build a shared library    
+  ```bash
+      -DBUILD_SHARED=NO
+  ```
+- OpenMP multithreading support    
+  Set `USE_OPENMP=YES` to enable OpenMP multithreading    
+  ```bash
+      -DUSE_OPENMP=YES
+  ```
+  
+- Build unit tests    
+  Set `BUILD_WITH_UNIT_TEST=YES` to build the unit test files    
+    ```bash
+        -DBUILD_WITH_UNIT_TEST=YES
+    ```
+
+- Build examples    
+  Set `BUILD_EXAMPLES=YES` to build the example files    
+    ```bash
+        -DBUILD_EXAMPLES=YES
+    ```
+  
+- Enable opencv    
+  To use opencv, set `USE_OPENCV=YES`    
+    ```bash
+        -DUSE_OPENCV=YES
+    ```
+    
+- Start the build    
+  Run the `android_build.sh` script to build Anakin automatically     
+  ```bash
+      ./android_build.sh
+  ```
+
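+### <span id = '0004'> 4. Verify the Installation </span> ###    
+  The compiled libraries are placed in `${Anakin_root}/output`;    
+  the compiled unit tests are placed in `${Anakin_root}/output/unit_test`;    
+  the compiled examples are placed in `${Anakin_root}/output/examples`.
+  
+  On Android, enable debugging mode on the device. The directory accessible through ADB is `/data/local/tmp`; use `adb push` to send the test files, models, and data to that directory on the device, then run the tests.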
--- a/doc/fluid/advanced_usage/development/contribute_to_paddle.md
+++ b/doc/fluid/advanced_usage/development/contribute_to_paddle.md
+# How to Contribute Code
+
+We sincerely appreciate your contributions. You are welcome to submit code through the GitHub fork and pull request workflow.
+
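+## Code Requirements
+- Code comments should follow the [Doxygen](http://www.stack.nl/~dimitri/doxygen/) style.
+- Make sure the compiler option `WITH_STYLE_CHECK` is turned on and that the build passes the code style check.
+- All code must have unit tests.
+- All unit tests must pass.
+- Please follow the [conventions for submitting code](#提交代码的一些约定).
+
+The following tutorial walks you through submitting code.
+## [Fork](https://help.github.com/articles/fork-a-repo/)
+
+Go to the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) GitHub home page and click the `Fork` button to create a copy of the repository under your own account, e.g. <https://github.com/USERNAME/Paddle>.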
+
+## Clone
+
+Clone the remote repository to your local machine:
+
+```bash
+➜  git clone https://github.com/USERNAME/Paddle
+➜  cd Paddle
+```
+
+
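+## Create a Local Branch
+
+Paddle currently uses the [Git branching model](http://nvie.com/posts/a-successful-git-branching-model/) for development, testing, release, and maintenance. For details, see [Paddle branching conventions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/releasing_process.md#paddle-分支规范).
+
+All feature and bug-fix work should be done on a new branch, usually created from the `develop` branch.
+
+Use `git checkout -b` to create and switch to a new branch.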
+
+```bash
+➜  git checkout -b my-cool-stuff
+```
+
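+Note that before checking out, the working directory of the current branch should be clean; otherwise untracked files will be carried over to the new branch. You can check this with `git status`.
+
+## Use the `pre-commit` Hook
+
+Paddle developers use the [pre-commit](http://pre-commit.com/) tool to manage Git pre-commit hooks. It helps us format source code (C++, Python) and automatically checks some basics before each commit (e.g. each file ends with exactly one EOL, no large files are added to Git).
+
+The `pre-commit` check is part of the unit tests in Travis-CI, and PRs that fail the hooks cannot be submitted to Paddle. First install it and run it in the current directory: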
+
+```bash
+➜  pip install pre-commit
+➜  pre-commit install
+```
+
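+Paddle uses `clang-format` to format C/C++ source code. Make sure your `clang-format` version is 3.8 or later.
+
+Note: the `yapf` installed via `pip install pre-commit` differs slightly from the one installed via `conda install -c conda-forge pre-commit`; Paddle developers use `pip install pre-commit`.
+
+## Start Developing
+
+In this example, I deleted a line from README.md and created a new file.
+
+Run `git status` to see the current state, which lists the changes in the current directory. You can also run `git diff` to see exactly what was changed in each file.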
+
+```bash
+➜  git status
+On branch test
+Changes not staged for commit:
+  (use "git add <file>..." to update what will be committed)
+  (use "git checkout -- <file>..." to discard changes in working directory)
+
+	modified:   README.md
+
+Untracked files:
+  (use "git add <file>..." to include in what will be committed)
+
+	test
+
+no changes added to commit (use "git add" and/or "git commit -a")
+```
+
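+## Build and Test
+
+Building the PaddlePaddle source code and generating the documentation require a variety of development tools. For convenience, our standard development workflow packs all these tools into a Docker image, called the *development image*, usually named `paddle:latest-dev` or `paddle:[version tag]-dev`, e.g. `paddle:0.11.0-dev`. Then every place that uses `cmake && make` (for example, in an IDE configuration) is replaced with `docker run paddle:latest-dev`.
+
+To build this development image, run the following in the root directory of the source tree: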
+
+```bash
+➜  docker build -t paddle:latest-dev .
+```
+
+You can then use this development image to build the PaddlePaddle source code. For example, to build PaddlePaddle without GPU support but with the AVX instruction set and unit tests, run:
+
+```bash
+➜  docker run -v $(pwd):/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TESTING=ON" paddle:latest-dev
+```
+
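+Besides compiling PaddlePaddle into `./build/libpaddle.so` and producing a `./build/paddle.deb` file, this process also outputs a `build/Dockerfile`. We then only need to run the following command to package the compiled PaddlePaddle into a *production image* (`paddle:prod`):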
+
+```bash
+➜  docker build -t paddle:prod -f build/Dockerfile .
+```
+
+To run all the unit tests, use the following command:
+
+```bash
+➜  docker run -it -v $(pwd):/paddle paddle:latest-dev bash -c "cd /paddle/build && ctest"
+```
+
+For more information about building and testing, see [Installing and Running with Docker](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/v2/build_and_install/docker_install_cn.rst).
+
+## Commit
+
+Next, we discard the change to README.md and commit the newly added test file.
+
+```bash
+➜  git checkout -- README.md
+➜  git status
+On branch test
+Untracked files:
+  (use "git add <file>..." to include in what will be committed)
+
+	test
+
+nothing added to commit but untracked files present (use "git add" to track)
+➜  git add test
+```
+
+Every Git commit needs a commit message that tells others what the commit changed. This is done with `git commit`.
+
+```bash
+➜  git commit
+CRLF end-lines remover...............................(no files to check)Skipped
+yapf.................................................(no files to check)Skipped
+Check for added large files..............................................Passed
+Check for merge conflicts................................................Passed
+Check for broken symlinks................................................Passed
+Detect Private Key...................................(no files to check)Skipped
+Fix End of Files.....................................(no files to check)Skipped
+clang-formater.......................................(no files to check)Skipped
+[my-cool-stuff c703c041] add test file
+ 1 file changed, 0 insertions(+), 0 deletions(-)
+ create mode 100644 233
+```
+
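+## Keep Your Local Repository Up to Date
+
+Before opening a Pull Request, you need to sync the latest code from the original repository (<https://github.com/PaddlePaddle/Paddle>).
+
+First, run `git remote` to check the names of the current remote repositories.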
+
+```bash
+➜  git remote
+origin
+➜  git remote -v
+origin	https://github.com/USERNAME/Paddle (fetch)
+origin	https://github.com/USERNAME/Paddle (push)
+```
+
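+Here origin is the name of the remote repository we cloned, i.e. the Paddle repository under your own username. Next, we add a remote for the original Paddle repository and name it upstream.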
+
+```bash
+➜  git remote add upstream https://github.com/PaddlePaddle/Paddle
+➜  git remote
+origin
+upstream
+```
+
+Fetch the latest code from upstream and update the current branch.
+
+```bash
+➜  git fetch upstream
+➜  git pull upstream develop
+```
+
+## Push to the Remote Repository
+
+Push your local changes to GitHub, i.e. https://github.com/USERNAME/Paddle.
+
+```bash
+# Push to the my-cool-stuff branch of the remote repository origin
+➜  git push origin my-cool-stuff
+```
+
+## Open an Issue and Complete the Pull Request
+
+Open an Issue describing the problem and note its number.
+
+Switch to the branch you created, then click `New pull request`.
+
+<img width="295" alt="screen shot 2017-04-26 at 9 09 28 pm" src="https://cloud.githubusercontent.com/assets/11692045/25436054/a6d98c66-2ac4-11e7-9cb1-18dd13150230.png">
+
+Select the target branch:
+
+<img width="750" alt="screen shot 2017-04-26 at 9 11 52 pm" src="https://cloud.githubusercontent.com/assets/11692045/25436139/f83b1e6c-2ac4-11e7-8c0e-add499023c46.png">
+
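+In the PR description, writing `resolve #Issue number` will automatically close the corresponding Issue once the PR is merged. For details, see <https://help.github.com/articles/closing-issues-via-commit-messages/>.
+
+Then wait for review. If changes are needed, follow the steps above to update the corresponding branch in origin.
+
+## Delete the Remote Branch
+
+After the PR is merged into the main repository, you can delete the remote branch on the PR page.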
+
+<img width="775" alt="screen shot 2017-04-26 at 9 18 24 pm" src="https://cloud.githubusercontent.com/assets/11692045/25436457/e4cdd472-2ac5-11e7-9272-badc76c4a23e.png">
+
+You can also delete the remote branch with `git push origin :branch-name`, for example:
+
+```bash
+➜  git push origin :my-cool-stuff
+```
+
+## Delete the Local Branch
+
+Finally, delete the local branch.
+
+```bash
+# Switch to the develop branch
+➜  git checkout develop
+
+# Delete the my-cool-stuff branch
+➜  git branch -D my-cool-stuff
+```
+
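+This completes the code contribution process.
+
+## <span id = '提交代码的一些约定'> Conventions for Submitting Code </span>
+
+To help reviewers focus on the code itself, please follow these conventions every time you submit code:
+
+1. Make sure the unit tests in Travis-CI pass. If they fail, the submitted code has problems, and reviewers generally will not review it.
+2. Before submitting a Pull Request:
+   - Pay attention to the number of commits:
+     - Reason: if you modify only one file but submit a dozen commits, each with only small changes, reviewers will have a hard time. They have to go through every commit to see what was changed, and changes in different commits may even overwrite each other.
+     - Suggestion: keep the number of commits as small as possible. You can use `git commit --amend` to add changes to the previous commit. For multiple commits already pushed to the remote repository, see [squash commits after push](http://stackoverflow.com/questions/5667884/how-to-squash-commits-in-git-after-they-have-been-pushed).
+   - Pay attention to each commit's message: it should reflect the content of the commit and not be written carelessly.
+3. If the Pull Request resolves an Issue, add `fix #issue_number` to the **first** comment box of the Pull Request, so that the Issue is closed automatically when the Pull Request is merged. Keywords include: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved; choose the appropriate one. For details, see [Closing issues via commit messages](https://help.github.com/articles/closing-issues-via-commit-messages).
+
+In addition, when responding to reviewers' comments, please follow these conventions:
+
+1. Reply to every comment from a reviewer (this is basic courtesy in the open source community: when others help you, you should say thanks):
+   - If you agree with a comment and have made the change, a simple `Done` is enough;
+   - If you disagree with a comment, please give your reasons.
+2. If there are many comments:
+   - Please give an overall summary of the changes.
+   - Please reply using [start a review](https://help.github.com/articles/reviewing-proposed-changes-in-a-pull-request/) rather than replying to each comment directly, because every direct reply sends an email and can flood reviewers' inboxes.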
--- a/doc/fluid/advanced_usage/development/cpu_profiling_cn.md
+++ b/doc/fluid/advanced_usage/development/cpu_profiling_cn.md
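+# CPU Performance Tuning
+
+This tutorial introduces how to use Python's cProfile package, the Python library yep, and Google perftools for profiling and performance tuning.
+
+Profiling means finding performance bottlenecks. The bottlenecks in a system may be far from where the programmer imagined them during development. Tuning means eliminating those bottlenecks. Performance optimization is usually a repeated cycle of profiling and tuning.
+
+PaddlePaddle users generally write deep learning programs by calling the Python API. Most Python API calls go into libpaddle.so, which is written in C++. Therefore, profiling and tuning PaddlePaddle consists of two parts:
+
+* Profiling Python code
+* Profiling mixed Python and C++ code
+
+
+## Profiling Python Code
+
+### Generating a profile
+
+The Python standard library provides a profiling package, [cProfile](https://docs.python.org/2/library/profile.html). The command to generate a Python profile is: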
+
+```bash
+python -m cProfile -o profile.out main.py
+```
+
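+Here `main.py` is the program to analyze, and `-o` specifies an output file that stores the profiling results. If no file is specified, `cProfile` prints the results to standard output.
+
+### Viewing the profile
+
+`cProfile` writes `profile.out` after main.py finishes running. We can use [`cprofilev`](https://github.com/ymichael/cprofilev) to view the profiling results. `cprofilev` is a third-party Python library that starts an HTTP server and presents the profiling results as a web page: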
+
+```bash
+cprofilev -a 0.0.0.0 -p 3214 -f profile.out main.py
+```
+
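+Here `-a` specifies the IP address that the HTTP server binds to; `0.0.0.0` allows access from other machines. `-p` specifies the port of the HTTP server, `-f` specifies the profiling result file, and `main.py` specifies the profiled source file.
+
+Open the corresponding URL in a web browser to see the profiling results: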
+
+```
+   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
+        1    0.284    0.284   29.514   29.514 main.py:1(<module>)
+     4696    0.128    0.000   15.748    0.003 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/executor.py:20(run)
+     4696   12.040    0.003   12.040    0.003 {built-in method run}
+        1    0.144    0.144    6.534    6.534 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/__init__.py:14(<module>)
+```
+
+The meaning of each column is:
+
+<table>
+<thead>
+<tr>
+<th>Column</th>
+<th>Meaning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td> ncalls</td>
+<td> Number of times the function was called</td>
+</tr>
+<tr>
+<td>tottime</td>
+<td> Total time spent in the function itself, excluding time spent in the functions it calls</td>
+</tr>
+<tr>
+<td> percall </td>
+<td> Average tottime per call</td>
+</tr>
+<tr>
+<td> cumtime</td>
+<td> Total time spent in the function, including time spent in the functions it calls</td>
+</tr>
+<tr>
+<td> percall</td>
+<td> Average cumtime per call</td>
+</tr>
+<tr>
+<td> filename:lineno(function) </td>
+<td> File name, line number, function name </td>
+</tr>
+</tbody>
+</table>
+
+
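+### Finding performance bottlenecks
+
+Usually `tottime` and `cumtime` are the key metrics for finding bottlenecks, because they show how much time a function really takes.
+
+Sorting the profiling results by tottime gives the following: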
+
+```text
+     4696   12.040    0.003   12.040    0.003 {built-in method run}
+   300005    0.874    0.000    1.681    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/dataset/mnist.py:38(reader)
+   107991    0.676    0.000    1.519    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:219(__init__)
+     4697    0.626    0.000    2.291    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:428(sync_with_cpp)
+        1    0.618    0.618    0.618    0.618 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/__init__.py:1(<module>)
+```
+
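+We can see that the most time-consuming function is the C++ `run` function, which has to be tuned together with the mixed `Python` and `C++` profiling described in the second part of this tutorial. The `sync_with_cpp` function also takes a long time in total, and each call is relatively expensive too. So we can click on `sync_with_cpp` to see its details and call relationships.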
+
+```text
+Called By:
+
+   Ordered by: internal time
+   List reduced from 4497 to 2 due to restriction <'sync_with_cpp'>
+
+Function                                                                                                 was called by...
+                                                                                                             ncalls  tottime  cumtime
+/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:428(sync_with_cpp)  <-    4697    0.626    2.291  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:562(sync_with_cpp)
+/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:562(sync_with_cpp)  <-    4696    0.019    2.316  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:487(clone)
+                                                                                                                  1    0.000    0.001  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:534(append_backward)
+
+
+Called:
+
+   Ordered by: internal time
+   List reduced from 4497 to 2 due to restriction <'sync_with_cpp'>
+```
+
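+Usually, by looking at the call relationships between hot functions and the corresponding lines of code, we can find where the problem lies. After making a performance fix, profile again to check whether the fix actually improves the program's performance.
+
+
+
+## Profiling Mixed Python and C++ Code
+
+### Generating a profile
+
+There are many profiling tools for C++, including `gprof`, `valgrind`, and `google-perftools`. However, profiling a shared library used from Python is much more complicated than profiling a native binary directly. Fortunately, the third-party Python library `yep` provides a convenient way to interact with `google-perftools`, so here we use `yep` to profile mixed Python and C++ code.
+
+Before using `yep`, you need to install `google-perftools` and the `yep` package. On Ubuntu, run: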
+
+```bash
+apt update
+apt install libgoogle-perftools-dev
+pip install yep
+```
+
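+After installation, run
+
+```bash
+python -m yep -v main.py
+```
+
+to generate the profile. The generated profile file is `main.py.prof`.
+
+The `-v` flag displays the analysis results on the command line after the profile is generated, so we can take a quick look there. Unlike Python, C++ may be compiled without debug information, and multithreading at run time may produce confusing, unreadable profiling results. To get more readable results, take the following measures:
+
+1. Compile with `-g` to generate debug information. With cmake, set CMAKE_BUILD_TYPE to `RelWithDebInfo`.
+2. Always enable optimization when compiling. The performance of a plain `Debug` build differs greatly from that of `-O2` or `-O3`, so performance tests in `Debug` mode are meaningless.
+3. When profiling, start with a single thread, then enable multithreading, and then move to multiple machines; single-threaded debugging is easier. You can set the environment variable `OMP_NUM_THREADS=1` to turn off OpenMP parallelism.
+
+### Viewing the profile
+
+After profiling finishes, a profile result file is generated. We can use [`pprof`](https://github.com/google/pprof) to display the results. Note that we use the `pprof` rewritten in `Go`, because it provides a web UI and presents the results better.
+
+`pprof` is installed in the same way as any other `Go` program: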
+
+```bash
+go get github.com/google/pprof
+```
+
+We can then start an HTTP server with the following command:
+
+```bash
+pprof -http=0.0.0.0:3213 `which python`  ./main.py.prof
+```
+
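+In this command, `-http` starts an HTTP server. `which python` expands to the full path of the current Python binary, which tells pprof where the Python executable is. `./main.py.prof` is the input profile.
+
+Open the corresponding URL to view the profiling results, as shown below: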
+
+![result](./pprof_1.png)
+
+
+### 寻找性能瓶颈
+
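+As with Python code, finding bottlenecks in mixed Python and C++ code also relies on `tottime` and `cumtime`. The call graph displayed by `pprof` can also help us find performance problems.
+
+For example, in the figure below,
+
+![kernel_perf](./pprof_2.png)
+
+in one training run, multiplication and its gradient computation take about 2%-4% of the computation time, while `MomentumOp` takes about 17%. Clearly, `MomentumOp` has a performance problem.
+
+`pprof` marks the critical performance path in red. Checking the critical path first and then the other parts makes the optimization more orderly.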
--- a/doc/fluid/advanced_usage/development/gpu_profiling_cn.rst
+++ b/doc/fluid/advanced_usage/development/gpu_profiling_cn.rst
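+======================
+GPU Performance Tuning
+======================
+
+..  contents::
+
+This tutorial walks you step by step through using the built-in timer, **nvprof**, or **nvvp** for profiling and performance tuning.
+
+- What is profiling?
+- Why is profiling needed?
+- How to profile?
+- Introduction to profiling tools
+- Detailed tutorial
+- Profiling tips
+
+What is profiling?
+==================
+In software engineering, profiling is a form of dynamic program analysis. It can measure a program's space (memory) or time complexity,
+the usage of particular instructions, or the frequency and duration of function calls. The profiling information is usually used to aid program optimization.
+
+Simply put, a profiler quantifies an application's performance. If you want to understand a program's behavior well, a profiler is an indispensable tool. Simple profiling can tell you how long an operation takes; deeper profiling can even explain why it takes so long.
+
+Why is profiling needed?
+============================
+Training a deep neural network usually takes a very long time, so performance has gradually become one of the most important metrics in deep learning.
+The first task in optimizing performance is to understand which steps slow down the whole.
+If a part doesn't take much time, there is no need to rush to optimize it!
+
+How to profile?
+========================
+To achieve optimal performance, you can follow these five steps:
+
+- Profile the code
+- Find the slow parts
+- Find out why they are slow
+- Change them to a faster version
+- Profile the code again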
+
+Usually, a processor has two key performance limits: floating-point throughput and
+memory throughput. A GPU additionally needs a high degree of parallelism to reach its full potential.
+This is also why GPUs can be so fast.
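
One standard way to combine these two limits, not taken from this tutorial, is the roofline model: a kernel's attainable throughput is the smaller of the peak compute rate and the memory bandwidth multiplied by its arithmetic intensity (floating-point operations per byte moved). A minimal C++ sketch, with hypothetical function names:

.. code:: cpp

    #include <algorithm>

    // Roofline model: attainable GFLOP/s is capped either by peak compute or
    // by memory bandwidth (GB/s) times arithmetic intensity (FLOPs per byte).
    double AttainableGflops(double peak_gflops, double bandwidth_gbps,
                            double flops_per_byte) {
      return std::min(peak_gflops, bandwidth_gbps * flops_per_byte);
    }

    // A kernel is memory-bound when the bandwidth ceiling is the lower one.
    bool IsMemoryBound(double peak_gflops, double bandwidth_gbps,
                       double flops_per_byte) {
      return bandwidth_gbps * flops_per_byte < peak_gflops;
    }

With illustrative (not measured) numbers of 6000 GFLOP/s peak and 300 GB/s bandwidth, a kernel doing 2 FLOPs per byte can reach at most 600 GFLOP/s and is memory-bound, while at 100 FLOPs per byte it reaches the 6000 GFLOP/s compute ceiling.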
+
+Introduction to profiling tools
+===============================
+For general GPU profiling, many tools from NVIDIA and third parties are already available.
+
+**nvprof** is NVIDIA's profiler, and **nvvp** is NVIDIA's visual profiler with a GUI.
+In this tutorial, we mainly introduce nvprof and nvvp.
+
+:code:`test_GpuProfiler` in the :code:`paddle/legacy/math/tests` directory is used to demonstrate the profilers above.
+
+.. literalinclude:: ../../../../paddle/legacy/math/tests/test_GpuProfiler.cpp
+   :language: c++
+   :lines: 137-151
+   :linenos:
+
+The code snippet above contains two methods; you can use either or both of them to profile the code sections you are interested in.
+
+1. :code:`REGISTER_TIMER_INFO` is a built-in timer wrapper that can measure the time consumed by CPU functions or CUDA kernels.
+
+2. :code:`REGISTER_GPU_PROFILER` is a general-purpose wrapper object around :code:`cudaProfilerStart` and :code:`cudaProfilerStop`; its implementation avoids crashes when the CPU-only version of PaddlePaddle invokes them.
+
+You will find more details in the following sections.
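
A common way to build such wrappers, assumed here rather than taken from PaddlePaddle's source, is the scoped (RAII) pattern: something starts when an object is created and stops when it goes out of scope. The C++ sketch below shows that pattern with hypothetical classes ``ScopedTimer`` and ``ScopedGpuProfiler``; ``USE_CUDA_SKETCH`` is a made-up macro standing in for however a real build enables CUDA. Without that macro, the profiler guard compiles to a no-op, which is one way a CPU-only build can avoid calling the CUDA profiler API.

.. code:: cpp

    #include <chrono>
    #include <iostream>
    #include <string>
    #include <utility>

    #ifdef USE_CUDA_SKETCH  // hypothetical macro for a CUDA-enabled build
    #include <cuda_profiler_api.h>
    #endif

    // Measures wall-clock time from construction to destruction.
    class ScopedTimer {
     public:
      explicit ScopedTimer(std::string name)
          : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}
      ~ScopedTimer() { std::cout << name_ << ": " << ElapsedMs() << " ms\n"; }

      double ElapsedMs() const {
        std::chrono::duration<double, std::milli> d =
            std::chrono::steady_clock::now() - start_;
        return d.count();
      }

     private:
      std::string name_;
      std::chrono::steady_clock::time_point start_;
    };

    // Brackets a region with cudaProfilerStart/cudaProfilerStop when CUDA is
    // available; in a CPU-only build it does nothing.
    class ScopedGpuProfiler {
     public:
      ScopedGpuProfiler() {
    #ifdef USE_CUDA_SKETCH
        cudaProfilerStart();
    #endif
      }
      ~ScopedGpuProfiler() {
    #ifdef USE_CUDA_SKETCH
        cudaProfilerStop();
    #endif
      }
    };

Declaring both objects at the top of a block, e.g. ``ScopedGpuProfiler p; ScopedTimer t("testBilinearFwdBwd");``, profiles and times exactly that block.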
+
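+Detailed tutorial
+=================
+
+Built-in timer
+--------------
+
+To enable PaddlePaddle's built-in timer, first add :code:`REGISTER_TIMER_INFO` to the relevant code section.
+Then use the :code:`printStatus` or :code:`printAllStatus` function to print the information to the console.
+Here is a simple example:
+
+1. Add :code:`REGISTER_TIMER_INFO` and :code:`printAllStatus` (see the highlighted lines).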
+
+    .. literalinclude:: ../../../../paddle/legacy/math/tests/test_GpuProfiler.cpp
+        :language: c++
+        :lines: 137-151
+        :emphasize-lines: 8-12,14
+        :linenos:
+
+2. Turn on **WITH_TIMER** in the cmake configuration and rebuild PaddlePaddle.
+
+    .. code-block:: bash
+
+        cmake .. -DWITH_TIMER=ON
+        make
+
+3. Run your code and observe the results (see the highlighted lines).
+
+    .. code-block:: bash
+        :emphasize-lines: 1,11-14
+
+        > ./paddle/legacy/math/tests/test_GpuProfiler
+        I1117 11:13:42.313065 2522362816 Util.cpp:155] commandline: ./paddle/legacy/math/tests/test_GpuProfiler
+        I1117 11:13:42.845065 2522362816 Util.cpp:130] Calling runInitFunctions
+        I1117 11:13:42.845208 2522362816 Util.cpp:143] Call runInitFunctions done.
+        [==========] Running 1 test from 1 test case.
+        [----------] Global test environment set-up.
+        [----------] 1 test from Profiler
+        [ RUN      ] Profiler.BilinearFwdBwd
+        I1117 11:13:42.845310 2522362816 test_GpuProfiler.cpp:114] Enable GPU Profiler Stat: [testBilinearFwdBwd] "numSamples = 10, channels = 16, imgSizeX = 64, imgSizeY = 64"
+        I1117 11:13:42.850154 2522362816 ThreadLocal.cpp:37] thread use undeterministic rand seed:20659751
+        I1117 11:13:42.981501 2522362816 Stat.cpp:130] ======= StatSet: [GlobalStatInfo] status ======
+        I1117 11:13:42.981539 2522362816 Stat.cpp:133] Stat=testBilinearFwdBwd     total=136.141    avg=136.141    max=136.141    min=136.141   count=1
+        I1117 11:13:42.981572 2522362816 Stat.cpp:141] ======= BarrierStatSet status ======
+        I1117 11:13:42.981575 2522362816 Stat.cpp:154] --------------------------------------------------
+        [       OK ] Profiler.BilinearFwdBwd (136 ms)
+        [----------] 1 test from Profiler (136 ms total)
+
+        [----------] Global test environment tear-down
+        [==========] 1 test from 1 test case ran. (136 ms total)
+        [  PASSED  ] 1 test.
+
+nvprof
+----------------
+
+To use the command-line profiler **nvprof**, follow these steps:
+
+1. Add :code:`REGISTER_GPU_PROFILER` to the code (see the highlighted lines).
+
+    .. literalinclude:: ../../../../paddle/legacy/math/tests/test_GpuProfiler.cpp
+        :language: c++
+        :lines: 137-151
+        :emphasize-lines: 6-7
+        :linenos:
+
+2. Turn on **WITH_PROFILER** in cmake and rebuild PaddlePaddle.
+
+    .. code-block:: bash
+
+        cmake .. -DWITH_PROFILER=ON
+        make
+
+3. Profile the executable with **nvprof**.
+
+    .. code-block:: bash
+
+        nvprof  ./paddle/legacy/math/tests/test_GpuProfiler
+
+You will then get profiling results like the following:
+
+.. code-block:: bash
+
+    ==78544== Profiling application: ./paddle/legacy/math/tests/test_GpuProfiler
+    ==78544== Profiling result:
+    Time(%)     Time     Calls       Avg       Min       Max  Name
+    27.60%  9.6305ms         5  1.9261ms  3.4560us  6.4035ms  [CUDA memcpy HtoD]
+    26.07%  9.0957ms         1  9.0957ms  9.0957ms  9.0957ms  KeBilinearInterpBw
+    23.78%  8.2977ms         1  8.2977ms  8.2977ms  8.2977ms  KeBilinearInterpFw
+    22.55%  7.8661ms         2  3.9330ms  1.5798ms  6.2863ms  [CUDA memcpy DtoH]
+
+    ==78544== API calls:
+    Time(%)     Time     Calls       Avg       Min       Max  Name
+    46.85%  682.28ms         8  85.285ms  12.639us  682.03ms  cudaStreamCreateWithFlags
+    39.83%  580.00ms         4  145.00ms     302ns  550.27ms  cudaFree
+    9.82%   143.03ms         9  15.892ms  8.7090us  142.78ms  cudaStreamCreate
+    1.23%   17.983ms         7  2.5690ms  23.210us  6.4563ms  cudaMemcpy
+    1.23%   17.849ms         2  8.9247ms  8.4726ms  9.3768ms  cudaStreamSynchronize
+    0.66%   9.5969ms         7  1.3710ms  288.43us  2.4279ms  cudaHostAlloc
+    0.13%   1.9530ms        11  177.54us  7.6810us  591.06us  cudaMalloc
+    0.07%   1.0424ms         8  130.30us  1.6970us  453.72us  cudaGetDevice
+    0.04%   527.90us        40  13.197us     525ns  253.99us  cudaEventCreateWithFlags
+    0.03%   435.73us       348  1.2520us     124ns  42.704us  cuDeviceGetAttribute
+    0.03%   419.36us         1  419.36us  419.36us  419.36us  cudaGetDeviceCount
+    0.02%   260.75us         2  130.38us  129.32us  131.43us  cudaGetDeviceProperties
+    0.02%   222.32us         2  111.16us  106.94us  115.39us  cudaLaunch
+    0.01%   214.06us         4  53.514us  28.586us  77.655us  cuDeviceGetName
+    0.01%   115.45us         4  28.861us  9.8250us  44.526us  cuDeviceTotalMem
+    0.01%   83.988us         4  20.997us     578ns  77.760us  cudaSetDevice
+    0.00%   38.918us         1  38.918us  38.918us  38.918us  cudaEventCreate
+    0.00%   34.573us        31  1.1150us     279ns  12.784us  cudaDeviceGetAttribute
+    0.00%   17.767us         1  17.767us  17.767us  17.767us  cudaProfilerStart
+    0.00%   15.228us         2  7.6140us  3.5460us  11.682us  cudaConfigureCall
+    0.00%   14.536us         2  7.2680us  1.1490us  13.387us  cudaGetLastError
+    0.00%   8.6080us        26     331ns     173ns     783ns  cudaSetupArgument
+    0.00%   5.5470us         6     924ns     215ns  2.6780us  cuDeviceGet
+    0.00%   5.4090us         6     901ns     328ns  3.3320us  cuDeviceGetCount
+    0.00%   4.1770us         3  1.3920us  1.0630us  1.8300us  cuDriverGetVersion
+    0.00%   3.4650us         3  1.1550us  1.0810us  1.2680us  cuInit
+    0.00%      830ns         1     830ns     830ns     830ns  cudaRuntimeGetVersion
+
+
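+The nvvp Tool
+--------------
+
+If you want to use the visual profiler **nvvp**, you can either import the output of :code:`nvprof -o ...` or run your application from within the tool's GUI.
+
+**Note: nvvp also supports CPU profiling** (it has to be enabled in the nvvp GUI).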
+
+..  image:: nvvp1.png
+    :align: center
+    :scale: 33%
+
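+At the kernel level, **nvvp** can pinpoint exactly why an operation takes a long time.
+In addition, as the figure below shows, the kernel block, register, and shared memory usage views in **nvvp** give a better understanding of overall GPU utilization.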
+
+
+..  image:: nvvp2.png
+    :align: center
+    :scale: 33%
+
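+At the application level, **nvvp** can offer suggestions for locating performance bottlenecks.
+For example, the figures below show suggestions about memory data movement and compute resource utilization, which point you in a direction for performance tuning.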
+
+..  image:: nvvp3.png
+    :align: center
+    :scale: 33%
+
+..  image:: nvvp4.png
+    :align: center
+    :scale: 33%
+
+Profiling Tips
+==================
+
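+- When getting started, the output of **nvprof** and **nvvp** is a good place to begin.
+- Next, consider analyzing the timeline.
+- If you really want to dig into the details of a particular kernel, first make sure that its share of the total time is high enough to be worth the effort.
+- Where possible, try to reconcile the profiler's numbers with theoretical expectations.
+
+    1) For example, if I know a kernel takes 10 ms to move 1 GB of data, I expect the profiler to report about 100 GB/s.
+    2) If the numbers disagree, the application is very likely not behaving the way you expect.
+
+- Know your hardware: if your GPU's theoretical peak is 6 TFLOPS (6 trillion floating-point operations per second) and you are already at 5.5 TFLOPS, there is probably little left to gain here.
+
+Profiling is a key step in performance optimization. Sometimes a very simple change produces a noticeable speedup!
+Of course, results vary from case to case.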
+
+References
+===========
+Jeremy Appleyard, `GPU Profiling for Deep Learning <http://www.robots.ox.ac.uk/~seminars/seminars/Extra/2015_10_08_JeremyAppleyard.pdf>`_, 2015
--- a/doc/fluid/advanced_usage/development/host_memory_profiling_cn.md
+++ b/doc/fluid/advanced_usage/development/host_memory_profiling_cn.md
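+# Heap Memory Profiling and Optimization
+
+Any program can leak memory. A **memory leak** usually happens when a program allocates memory on the heap and never frees it, so the memory it uses keeps growing as it runs. This hurts the program's stability: it may get slower and slower or run out of memory (OOM). It can even destabilize the machine running it and bring that machine down.
+
+
+There are many tools for analyzing memory leaks; classic ones include [valgrind](http://valgrind.org/docs/manual/quick-start.html#quick-start.intro) and [gperftools](https://gperftools.github.io/gperftools/).
+
+We do not recommend valgrind for Fluid. Because Fluid uses Python to drive its C++ core, analyzing it directly with valgrind is very difficult. You have to build a dedicated debug version of Python with valgrind support, and most of the output consists of Python's own symbols and call information, which is hard to analyze. valgrind also makes the program run very slowly.
+
+This tutorial mainly covers how to use [gperftools](https://gperftools.github.io/gperftools/).
+
+gperftools mainly provides the following four features:
+
+- thread-caching malloc
+- heap-checking using tcmalloc
+- heap-profiling using tcmalloc
+- CPU profiler
+
+Paddle also provides a gperftools-based [CPU profiling tutorial](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/cpu_profiling_cn.md).
+
+For heap analysis, the main features used are thread-caching malloc and heap-profiling using tcmalloc.
+
+## Environment
+
+This tutorial uses paddlepaddle/paddle:latest-dev, the Docker development environment provided by Paddle, which runs Ubuntu 16.04.4 LTS.
+
+## Workflow
+
+- Install google-perftools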
+
+```
+apt-get install libunwind-dev 
+apt-get install google-perftools
+```
+
+- Install pprof
+
+```
+go get -u github.com/google/pprof
+```
+
+- Set up the runtime environment
+
+```
+export PPROF_PATH=/root/gopath/bin/pprof
+export PPROF_BINARY_PATH=/root/gopath/bin/pprof
+export LD_PRELOAD=/usr/lib/libtcmalloc.so.4
+```
+
+- Run the Python program with heap profiling enabled. This takes a snapshot of heap allocations at regular intervals.
+
+```
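+# HEAPPROFILE sets the directory and file-name prefix of the generated heap profile files
+# HEAP_PROFILE_ALLOCATION_INTERVAL dumps a profile each time this many bytes have been allocated (default 1GB; 209715200 = 200MB)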
+env HEAPPROFILE="./perf_log/test.log" HEAP_PROFILE_ALLOCATION_INTERVAL=209715200 python trainer.py
+```
+
+As the program runs, many files are generated in the perf_log directory, for example:
+
+```
+-rw-r--r-- 1 root root 1.0M Jun  1 15:00 test.log.0001.heap
+-rw-r--r-- 1 root root 1.0M Jun  1 15:00 test.log.0002.heap
+-rw-r--r-- 1 root root 1.0M Jun  1 15:00 test.log.0003.heap
+-rw-r--r-- 1 root root 1.0M Jun  1 15:00 test.log.0004.heap
+-rw-r--r-- 1 root root 1.0M Jun  1 15:00 test.log.0005.heap
+-rw-r--r-- 1 root root 1.0M Jun  1 15:00 test.log.0006.heap
+```
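The run step above can also be launched from a Python script, for example when you want to profile several configurations in a row. The following is a minimal sketch that reuses the paths from the shell commands above (`/usr/lib/libtcmalloc.so.4`, `./perf_log/test.log`); adjust them for your system. `heap_profile_env` and `run_with_heap_profile` are hypothetical helper names, not part of Paddle or gperftools.

```python
import os
import subprocess

MIB = 1024 * 1024


def heap_profile_env(prefix="./perf_log/test.log",
                     interval_bytes=200 * MIB,
                     tcmalloc="/usr/lib/libtcmalloc.so.4"):
    """Return a copy of the current environment with gperftools heap profiling enabled."""
    env = dict(os.environ)
    env["LD_PRELOAD"] = tcmalloc      # preload tcmalloc so it replaces the default malloc
    env["HEAPPROFILE"] = prefix       # dumps are named <prefix>.0001.heap, <prefix>.0002.heap, ...
    # Dump a snapshot every interval_bytes of allocation (200 MiB = 209715200 bytes).
    env["HEAP_PROFILE_ALLOCATION_INTERVAL"] = str(interval_bytes)
    return env


def run_with_heap_profile(cmd, **kwargs):
    """Run cmd (for example ["python", "trainer.py"]) under the heap profiler."""
    env = heap_profile_env(**kwargs)
    # HEAPPROFILE is only a file prefix, so make sure its directory exists.
    os.makedirs(os.path.dirname(env["HEAPPROFILE"]) or ".", exist_ok=True)
    return subprocess.run(cmd, env=env, check=True)
```

`run_with_heap_profile(["python", "trainer.py"])` corresponds to the `env HEAPPROFILE=... python trainer.py` command above. The 200 MB dump interval is written as `200 * MIB`.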
+
+- Analyze the heap files with pprof. There are two modes:
+	- Full mode. Analyzes the current heap and shows the call paths that currently hold allocated memory.
+
+	```
+	pprof --pdf python test.log.0012.heap
+	```
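+	The command above generates a file named profile00x.pdf that can be opened directly, for example [memory_cpu_allocator](https://github.com/jacquesqiao/Paddle/blob/bd2ea0e1f84bb6522a66d44a072598153634cade/doc/fluid/howto/optimization/memory_cpu_allocator.pdf). The figure below shows that when the CPU version of Fluid runs, CPUAllocator allocates the most memory. The other modules allocate comparatively little and are therefore omitted. This makes full mode inconvenient for finding memory leaks: a leak grows slowly, so it cannot be seen in this kind of graph.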
+	
+	![result](https://user-images.githubusercontent.com/3048612/40964027-a54033e4-68dc-11e8-836a-144910c4bb8c.png)
+	
+	- Diff mode. Compares the heap at two points in time, drops the modules whose allocations did not change, and shows only the increase.
+	```
+	pprof --pdf --base test.log.0010.heap python test.log.1045.heap
+	```
+	The result is: [`memory_leak_protobuf`](https://github.com/jacquesqiao/Paddle/blob/bd2ea0e1f84bb6522a66d44a072598153634cade/doc/fluid/howto/optimization/memory_leak_protobuf.pdf)
+	
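+	The figure shows that the ProgramDesc structure grew by more than 200 MB between the two snapshots, so it was very likely to be leaking. It did in fact turn out to be the cause of the leak.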
+	
+	![result](https://user-images.githubusercontent.com/3048612/40964057-b434d5e4-68dc-11e8-894b-8ab62bcf26c2.png)
+	![result](https://user-images.githubusercontent.com/3048612/40964063-b7dbee44-68dc-11e8-9719-da279f86477f.png)
+	
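When a run produces many snapshots, choosing the base and target files for diff mode by hand is error-prone. The following Python sketch orders the snapshots by sequence number and builds the same `pprof --pdf --base ...` command shown above. It uses the earliest snapshot as the base and the latest as the target. All helper names are hypothetical. As the example above does with `test.log.0010.heap`, you may want to skip the first few warm-up snapshots before choosing the base.

```python
import glob
import os
import re

_SEQ = re.compile(r"\.(\d+)\.heap$")


def sort_snapshots(paths):
    """Sort heap dump paths such as test.log.0012.heap by their numeric sequence number."""
    return sorted(paths, key=lambda p: int(_SEQ.search(p).group(1)))


def find_snapshots(log_dir, prefix="test.log"):
    """List <log_dir>/<prefix>.NNNN.heap files in sequence order."""
    return sort_snapshots(glob.glob(os.path.join(log_dir, prefix + ".*.heap")))


def pprof_diff_cmd(snapshots, binary="python"):
    """Build a pprof command that shows allocation growth from the first to the last snapshot."""
    if len(snapshots) < 2:
        raise ValueError("need at least two heap snapshots to diff")
    base, target = snapshots[0], snapshots[-1]
    return ["pprof", "--pdf", "--base", base, binary, target]
```

The resulting list can be passed to `subprocess.run(..., check=True)`, for example `pprof_diff_cmd(find_snapshots("./perf_log"))`.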
--- a/doc/fluid/advanced_usage/development/new_op.md
+++ b/doc/fluid/advanced_usage/development/new_op.md
+# How to Write a New Operator
+
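+ - [Concepts](#concepts)
+ - [Implement the C++ Classes](#implement-the-c-classes)
+   - [Define the ProtoMaker Class](#define-the-protomaker-class)
+   - [Define the GradProtoMaker Class](#define-the-gradprotomaker-class)
+   - [Define the Operator Class](#define-the-operator-class)
+   - [Define the OpKernel Class](#define-the-opkernel-class)
+   - [Register the Operator](#register-the-operator)
+   - [Build](#build)
+ - [Python Binding](#python-binding)
+ - [Implement Unit Tests](#implement-unit-tests)
+   - [Forward Operator Unit Tests](#forward-operator-unit-tests)
+   - [Backward Operator Unit Tests](#backward-operator-unit-tests)
+   - [Build and Run](#build-and-run)
+ - [Notes](#notes)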
+
+
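+## Concepts
+
+This section briefly introduces the base classes you will need; see the design documents for details.
+
+- `framework::OperatorBase`: the base class of Operator (Op for short).
+- `framework::OpKernel`: the base class of an Op's compute function, called a Kernel.
+- `framework::OperatorWithKernel`: inherits from OperatorBase; an Op of this kind has compute functions, i.e. it has Kernels.
+- `class OpProtoAndCheckerMaker`: describes the Op's inputs, outputs, attributes, and comments; mainly used to generate the Python API.
+
+Ops fall into two kinds depending on whether they contain a Kernel. Ops with a Kernel inherit from `OperatorWithKernel`, and Ops without one inherit from `OperatorBase`. This tutorial mainly covers how to write an Op with a Kernel. In short, an Op needs the following pieces: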
+
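+<table>
+<thead>
+<tr>
+<th>Content</th>
+<th>Where it is defined</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>OpProtoMaker definition</td>
+<td>.cc file. A Backward Op does not need an OpProtoMaker.</td>
+</tr>
+<tr>
+<td>Op definition</td>
+<td>.cc file</td>
+</tr>
+<tr>
+<td>Kernel implementation</td>
+<td>If CPU and CUDA share a Kernel, it is implemented in the .h file; otherwise the CPU implementation goes in the .cc file and the CUDA implementation in the .cu file.</td>
+</tr>
+<tr>
+<td>Op registration</td>
+<td>Op registration goes in the .cc file; CPU Kernel registration goes in the .cc file and CUDA Kernel registration in the .cu file.</td>
+</tr>
+</tbody>
+</table>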
+
+
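+All new ops go in the [paddle/fluid/operators](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/fluid/operators) directory. Their file names end in `*_op.h` (if any), `*_op.cc`, and `*_op.cu` (if any). **The build system automatically builds each op and its Python extension based on the file names.**
+
+
+The following uses the matrix multiplication operator, [MulOp](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/mul_op.cc), as an example of how to write an Operator with a Kernel.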
+
+
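+## Implement the C++ Classes
+
+
+### Define the ProtoMaker Class
+
+The formula for matrix multiplication is $Out = X * Y$, so the computation has two inputs and one output.
+
+First, define a `ProtoMaker` that describes the Op's inputs and outputs and adds a comment: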
+
+```cpp
+class MulOpMaker : public framework::OpProtoAndCheckerMaker {
+ public:
+  MulOpMaker(OpProto *proto, OpAttrChecker *op_checker)
+      : OpProtoAndCheckerMaker(proto, op_checker) {
+    AddInput("X", "(Tensor), 2D tensor of size (M x K)");
+    AddInput("Y", "(Tensor), 2D tensor of size (K x N)");
+    AddOutput("Out", "(Tensor), 2D tensor of size (M x N)");
+    AddComment(R"DOC(
+Two Element Mul Operator.
+The equation is: Out = X * Y
+)DOC");
+  }
+};
+```
+
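+[`MulOpMaker`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/mul_op.cc#L76-L127) inherits from `framework::OpProtoAndCheckerMaker`. Its constructor takes 2 parameters:
+
+   - `framework::OpProto`: stores the Op's inputs, outputs, and attributes; it is used to generate the Python API.
+   - `framework::OpAttrChecker`: checks that attribute values are valid.
+
+In the constructor, `AddInput` adds inputs, `AddOutput` adds outputs, and `AddComment` adds the Op's comment. These functions add the corresponding content to `OpProto`.
+
+The code above adds two inputs, `X` and `Y`, and one output, `Out`, to `MulOp`, and explains what each of them means. Please follow the [naming conventions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/dev/name_convention.md) when naming them.
+
+
+Take [`ScaleOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/scale_op.cc#L38-L55) as another example: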
+
+```cpp
+template <typename AttrType>
+class ScaleOpMaker : public framework::OpProtoAndCheckerMaker {
+ public:
+  ScaleOpMaker(OpProto *proto, OpAttrChecker *op_checker)
+      : OpProtoAndCheckerMaker(proto, op_checker) {
+    AddInput("X", "(Tensor) Input tensor of scale operator.");
+    AddOutput("Out", "(Tensor) Output tensor of scale operator.");
+    AddComment(R"DOC(
+Scale operator
+$$Out = scale*X$$
+)DOC");
+    AddAttr<AttrType>("scale",
+                      "(float, default 1.0) "
+                      "The scaling factor of the scale operator.")
+        .SetDefault(1.0);
+  }
+};
+```
+
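+This example contains `AddAttr<AttrType>("scale", "...").SetDefault(1.0);`, which adds the `scale` factor as an attribute and sets its default value to 1.0.
+
+### Define the GradProtoMaker Class
+
+Every Op must have a corresponding GradProtoMaker. If you do not write a custom GradProtoMaker for the forward Op, Fluid provides a default one (registered as `paddle::framework::DefaultGradOpDescMaker<true>` in the registration example below). The default passes all of the forward Op's variables, including Input, Output, and Output@Grad, to the gradient Op. Passing variables that the gradient computation does not need wastes GPU memory.
+The example below defines the GradProtoMaker for ScaleOp.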
+
+```cpp
+class ScaleGradMaker : public framework::SingleGradOpDescMaker {
+ public:
+  using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
+
+  std::unique_ptr<framework::OpDesc> Apply() const override {
+    auto *grad_op = new framework::OpDesc();
+    grad_op->SetType("scale");
+    grad_op->SetInput("X", OutputGrad("Out"));
+    grad_op->SetOutput("Out", InputGrad("X"));
+    grad_op->SetAttr("scale", GetAttr("scale"));
+    return std::unique_ptr<framework::OpDesc>(grad_op);
+  }
+};
+```
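Why can `ScaleGradMaker` simply emit another `scale` op? For $Out = scale \cdot X$, the gradient is $\partial L/\partial X = scale \cdot \partial L/\partial Out$. That is exactly the forward computation applied to `Out@GRAD`. The NumPy sketch below checks this with central finite differences. It is an illustration, not Paddle code, and its function names are hypothetical.

```python
import numpy as np


def scale_forward(x, scale):
    """Forward of the scale op: Out = scale * X."""
    return scale * x


def scale_backward(out_grad, scale):
    """X@GRAD = scale * Out@GRAD, i.e. the forward op applied to Out@GRAD."""
    return scale_forward(out_grad, scale)


def check_scale_grad(scale=2.5, eps=1e-6, seed=0):
    """Return the largest gap between the analytic and finite-difference gradients."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((3, 4))
    out_grad = rng.standard_normal((3, 4))  # upstream dL/dOut for L = sum(out_grad * Out)
    analytic = scale_backward(out_grad, scale)

    numeric = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        loss_plus = np.sum(out_grad * scale_forward(x + step, scale))
        loss_minus = np.sum(out_grad * scale_forward(x - step, scale))
        numeric[idx] = (loss_plus - loss_minus) / (2 * eps)
    return float(np.max(np.abs(analytic - numeric)))


print(check_scale_grad())  # a tiny number, on the order of floating-point round-off
```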
+
+### Define the Operator Class
+
+The following code defines MulOp:
+
+```cpp
+class MulOp : public framework::OperatorWithKernel {
+ public:
+  using framework::OperatorWithKernel::OperatorWithKernel;
+
+ protected:
+  void InferShape(const framework::InferShapeContext &ctx) const override {
+    auto dim0 = ctx.Input<Tensor>("X")->dims();
+    auto dim1 = ctx.Input<Tensor>("Y")->dims();
+    PADDLE_ENFORCE_EQ(dim0.size(), 2,
+                      "input X(%s) should be a tensor with 2 dims, a matrix",
+                      ctx.op_.Input("X"));
+    PADDLE_ENFORCE_EQ(dim1.size(), 2,
+                      "input Y(%s) should be a tensor with 2 dims, a matrix",
+                      ctx.op_.Input("Y"));
+    PADDLE_ENFORCE_EQ(
+        dim0[1], dim1[0],
+        "First matrix's width must be equal with second matrix's height.");
+    ctx.Output<Tensor>("Out")->Resize({dim0[0], dim1[1]});
+  }
+};
+```
+
+[`MulOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/mul_op.cc#L22) inherits from `OperatorWithKernel`. Its `public` member is:
+
+```cpp
+using framework::OperatorWithKernel::OperatorWithKernel;
+```
+
+This line makes `MulOp` reuse the constructor of its base class `OperatorWithKernel`. It is equivalent to writing:
+
+```cpp
+MulOp(const std::string &type, const framework::VariableNameMap &inputs,
+      const framework::VariableNameMap &outputs,
+      const framework::AttributeMap &attrs)
+  : OperatorWithKernel(type, inputs, outputs, attrs) {}
+```
+
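+You also need to override the `InferShape` interface. `InferShape` is a const function and cannot modify the Op's member variables. It takes `const framework::InferShapeContext &ctx` as its parameter, which gives access to the inputs, outputs, and attributes. It does two things:
+
+  - 1). Checks the input and fails early: verifies that the dimensions, types, and so on of the input data are valid.
+  - 2). Sets the shapes of the output Tensors.
+
+Usually the `OpProtoMaker` and `Op` class definitions are written in the `.cc` file, together with the registration functions introduced below.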
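To make the two responsibilities of `InferShape` concrete, here is a Python model of what `MulOp::InferShape` above checks and computes. It is only an illustration: the C++ version uses `PADDLE_ENFORCE_EQ` and resizes the output tensor. `infer_mul_shape` is a hypothetical name.

```python
def infer_mul_shape(x_dims, y_dims):
    """Mirror MulOp::InferShape: fail early on invalid input, then return Out's shape."""
    # 1) Check early: both inputs must be matrices...
    if len(x_dims) != 2:
        raise ValueError(f"Input(X) of mul operator should be a 2-D tensor, got {len(x_dims)}-D.")
    if len(y_dims) != 2:
        raise ValueError(f"Input(Y) of mul operator should be a 2-D tensor, got {len(y_dims)}-D.")
    # ...and their inner dimensions must agree.
    if x_dims[1] != y_dims[0]:
        raise ValueError(
            "First matrix's width must be equal to second matrix's height, "
            f"got {x_dims[1]} and {y_dims[0]}.")
    # 2) Set the output shape: (M x K) * (K x N) -> (M x N).
    return (x_dims[0], y_dims[1])


print(infer_mul_shape((32, 84), (84, 100)))  # (32, 100)
```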
+
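+### Define the OpKernel Class
+
+`MulKernel` inherits from `framework::OpKernel` and has the following two template parameters:
+
+- `typename DeviceContext`: the device type. Add this template parameter when different devices (CPU, CUDA) share the same Kernel; omit it when they do not. An example that does not share a Kernel is [`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/cross_entropy_op.h#L43).
+
+- `typename T`: the data type, such as `float` or `double`.
+
+You need to override the `Compute` interface of the `MulKernel` class.
+
+- `Compute` takes one input parameter: `const framework::ExecutionContext& context`.
+- Compared with `InferShapeContext`, `ExecutionContext` also carries the device type. It gives the same access to the inputs, outputs, and attributes.
+- The `Compute` function implements the actual computation logic of the `OpKernel`.
+
+Below is the implementation of `MulKernel`'s `Compute`: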
+
+```cpp
+template <typename DeviceContext, typename T>
+class MulKernel : public framework::OpKernel<T> {
+ public:
+  void Compute(const framework::ExecutionContext& context) const override {
+    auto* X = context.Input<Tensor>("X");
+    auto* Y = context.Input<Tensor>("Y");
+    auto* Z = context.Output<Tensor>("Out");
+    // Allocate the output on the device this kernel runs on.
+    Z->mutable_data<T>(context.GetPlace());
+    auto& device_context = context.template device_context<DeviceContext>();
+    // Z = 1 * X * Y + 0 * Z
+    math::matmul<DeviceContext, T>(*X, false, *Y, false, 1, Z, 0, device_context);
+  }
+};
+```
+
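+Note: **different devices (CPU, CUDA) share one Op definition. Whether they also share the same `OpKernel` depends on whether the functions called in `Compute` support all of those devices.**
+
+The CPU and CUDA implementations of `MulOp` share the same `Kernel`. For an example where the `OpKernel` is not shared, see [`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/cross_entropy_op.h#L43).
+
+We usually implement the `Compute` interface with the Eigen unsupported Tensor module. This makes `OpKernel` computations simpler to write and lets CPU and CUDA code be reused. For how to use the Eigen library in PaddlePaddle, see the [usage documentation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/dev/use_eigen_cn.md).
+
+This completes the forward Op implementation. Next, register the op and its kernel in the `.cc` file.
+The backward Op class and backward OpKernel are defined in much the same way as the forward ones, so they are not repeated here. **Note, however, that a backward Op has no `ProtoMaker`.**
+
+### Register the Operator
+
+- Register the forward and backward Op classes and the CPU Kernels in the `.cc` file.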
+
+    ```cpp
+    namespace ops = paddle::operators;
+    REGISTER_OPERATOR(mul, ops::MulOp, ops::MulOpMaker,
+                  paddle::framework::DefaultGradOpDescMaker<true>)
+    REGISTER_OPERATOR(mul_grad, ops::MulGradOp)
+    REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel<paddle::platform::CPUDeviceContext, float>);
+    REGISTER_OP_CPU_KERNEL(mul_grad,
+                  ops::MulGradKernel<paddle::platform::CPUDeviceContext, float>);
+    ```
+
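+   In the code above:
+
+    - `REGISTER_OPERATOR`: registers the `ops::MulOp` class with the type name `mul` and `ops::MulOpMaker` as its `ProtoMaker`, and registers `ops::MulGradOp` with the type name `mul_grad`.
+    - `REGISTER_OP_CPU_KERNEL`: registers the `ops::MulKernel` class with its template parameters specialized to `paddle::platform::CPUDeviceContext` and `float`; `ops::MulGradKernel` is registered in the same way.
+
+
+- Register the CUDA Kernels in the `.cu` file.
+    - Note: if the CUDA Kernel is implemented with the Eigen unsupported module, add the macro definition `#define EIGEN_USE_GPU` at the beginning of the `.cu` file, as in the following example: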
+
+    ```cpp
+    // If the Eigen unsupported module is used, define this before including any header files
+    #define EIGEN_USE_GPU
+
+    namespace ops = paddle::operators;
+    REGISTER_OP_CUDA_KERNEL(mul, ops::MulKernel<paddle::platform::CUDADeviceContext, float>);
+    REGISTER_OP_CUDA_KERNEL(mul_grad,
+                           ops::MulGradKernel<paddle::platform::CUDADeviceContext, float>);
+    ```
+
+### Build
+
+Run the following command to build the op:
+
+```
+make mul_op
+```
+
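+## Python Binding
+
+The system automatically binds new ops to Python and links them into the generated library.
+
+## Implement Unit Tests
+
+Unit tests cover three things: they compare the forward Op's implementations on different devices (CPU, CUDA), compare the backward Op's implementations on different devices, and check the backward Op's gradients. The following walks through the [unit test for `MulOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/unittests/test_mul_op.py).
+
+### Forward Operator Unit Tests
+
+Op unit tests inherit from `OpTest`. The specific tests are implemented in `TestMulOp`. To test an Operator, you need to:
+
+1. Define the inputs, outputs, and relevant attributes in the `setUp` function.
+2. Generate random input data.
+3. Implement the same computation as the forward operator in the Python script, and compare its result with the output of the operator's forward computation.
+4. The backward computation is already integrated into the test framework; just call the corresponding interface.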
+
+
+  ```python
+  import unittest
+  import numpy as np
+  from op_test import OpTest
+
+
+  class TestMulOp(OpTest):
+      def setUp(self):
+          self.op_type = "mul"
+          self.inputs = {
+              'X': np.random.random((32, 84)).astype("float32"),
+              'Y': np.random.random((84, 100)).astype("float32")
+          }
+          self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
+
+      def test_check_output(self):
+          self.check_output()
+
+      def test_check_grad_normal(self):
+          self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5)
+
+      def test_check_grad_ingore_x(self):
+          self.check_grad(
+              ['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X"))
+
+      def test_check_grad_ingore_y(self):
+          self.check_grad(
+              ['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y'))
+  ```
+
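+The code above first imports the required packages. The important variables set in the `setUp` function are:
+
+- `self.op_type = "mul"`: defines the op type, which must match the type name used when the operator was registered.
+- `self.inputs`: defines the inputs as `numpy.array` values and initializes them.
+- `self.outputs`: defines the outputs. The Python script performs the same computation as the operator and returns the Python-side result.
+
+### Backward Operator Unit Tests
+
+In the backward tests:
+- `test_check_grad_normal` calls `check_grad`, which uses numerical differentiation to verify that the gradients are correct and stable.
+  - The first argument, `["X", "Y"]`: runs the gradient check on the input variables `X` and `Y`.
+  - The second argument, `"Out"`: specifies `Out`, the final output variable of the forward network.
+  - The third argument, `max_relative_error`: the maximum relative error tolerated when checking the gradients.
+- The `test_check_grad_ingore_x` and `test_check_grad_ingore_y` cases test the situation where only one input's gradient needs to be computed.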
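The idea behind `check_grad` is to compare the analytic gradient with a numerical estimate obtained by perturbing each input element. The analytic gradient is the one produced by the backward Op. The NumPy sketch below illustrates that idea for `mul`. It assumes the loss is the sum of `Out` and uses its own relative-error formula. It is therefore not a reproduction of `OpTest.check_grad`, whose exact loss and error definitions may differ. It also uses smaller shapes than `TestMulOp` so it runs quickly.

```python
import numpy as np


def numeric_grad(f, x, eps=1e-3):
    """Central-difference estimate of d sum(f(x)) / dx; x is perturbed in place, then restored."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        loss_plus = np.sum(f(x))
        x[idx] = orig - eps
        loss_minus = np.sum(f(x))
        x[idx] = orig
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad


def max_relative_error(analytic, numeric, floor=1e-3):
    """Largest |analytic - numeric| / max(|numeric|, floor) over all elements."""
    denom = np.maximum(np.abs(numeric), floor)
    return float(np.max(np.abs(analytic - numeric) / denom))


rng = np.random.default_rng(0)
x = rng.random((4, 6))
y = rng.random((6, 5))

# For Out = X * Y and loss = sum(Out): dL/dX = ones * Y^T and dL/dY = X^T * ones.
ones = np.ones((4, 5))
grad_x, grad_y = ones @ y.T, x.T @ ones

err_x = max_relative_error(grad_x, numeric_grad(lambda a: a @ y, x.copy()))
err_y = max_relative_error(grad_y, numeric_grad(lambda b: x @ b, y.copy()))
print(err_x, err_y)  # both far below the max_relative_error=0.5 used in TestMulOp
```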
+
+
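+### Build and Run
+
+New `test_*.py` unit tests under `python/paddle/fluid/tests/unittests/` are automatically added to the project build.
+
+Note that **unlike building a single Op, running the unit tests requires building the whole project**. The build must have `WITH_TESTING` turned on, i.e. `cmake paddle_dir -DWITH_TESTING=ON`. After a successful build, run the unit test with the following command: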
+
+```bash
+make test ARGS="-R test_mul_op -V"
+```
+
+Or:
+
+```bash
+ctest -R test_mul_op
+```
+
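+## Notes
+
+- The type name used when registering an Op must match the Op's name. That is, registering `REGISTER_OPERATOR(B, ...)` in `A_op.cc` is not allowed and will cause the unit tests to fail.
+- If an Op has no CUDA Kernel, do not create an empty `*_op.cu` file; this will cause the unit tests to fail.
+- If several Ops depend on some shared functions, put those functions in files that do not follow the `*_op.*` naming pattern, such as `gather.h`.
+
+### Using PADDLE_ENFORCE
+
+When implementing an Op, use macros such as PADDLE_ENFORCE and PADDLE_ENFORCE_EQ to check that the data is valid. The basic format is: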
+
+```
+PADDLE_ENFORCE(expression, error message)
+PADDLE_ENFORCE_EQ(value A, value B, error message)
+```
+
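+If the expression is true, or A equals B, the check passes. Otherwise the program is terminated and the corresponding error message is reported to the user.
+To keep messages friendly and easy to understand, developers should follow the guidelines below.
+
+#### General Principle
+
+Every check that uses `PADDLE_ENFORCE` or `PADDLE_ENFORCE_*` **must come with an explanation of appropriate detail. The error message must not be empty!**
+
+#### Message Writing Standard
+
+1. [required] What went wrong, and why?
+    - Example: `ValueError: Mismatched label shape`
+2. [optional] What input was expected, and what was actually received?
+    - Example: `Expected labels dimension=1. Received 4.`
+3. [optional] Can you suggest a fix?
+    - Example: `Suggested Fix: If your classifier expects one-hot encoding label, check your n_classes argument to the estimator and/or the shape of your label. Otherwise, check the shape of your label.`
+
+If a full message is unnecessary, or a concise description already makes the points above clear, write whatever fits the situation.
+
+##### FAQ: Common Problems
+
+1. No error message, or a message too terse to give the user useful guidance.
+
+Problem example 1: no error message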
+```
+PADDLE_ENFORCE(ctx->HasInput("X"), "");
+```
+Problem example 2: message too terse
+```
+PADDLE_ENFORCE(i != nullptr, "i must be set"); // What is i?
+```
+
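+2. Using abbreviations of developer-defined variables in the error message, which are hard to understand.
+
+Problem example:
+```
+PADDLE_ENFORCE(forward_pd != nullptr,
+                    "Fail to find eltwise_fwd_pd in device context");  // users may not know what eltwise_fwd_pd is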
+```
+
+3. Calling an illegal interface inside an Op, i.e. `Output = ShareDataWith(Input)` inside an Op.
+
+Problem example:
+```cpp
+auto *out = ctx.Output<framework::LoDTensor>("Out");
+auto *in = ctx.Input<framework::LoDTensor>("X");
+out->ShareDataWith(*in);
+```
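+If an Op contains `Output = ShareDataWith(Input)`, it amounts to a hidden edge in the operator graph connecting Input and Output. Graph analysis cannot represent this edge, so graph-based optimizations produce errors.
+
+4. Performance practice for Op implementations
+
+Eigen operations such as broadcast and chip can be several times slower than a hand-written CUDA kernel. In that case, the CPU implementation can reuse Eigen, while the GPU implementation can be a hand-written CUDA kernel.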
+
+
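+#### Special Notes on Check Messages in OP InferShape
+
+- When checking input and output variables, use the following format consistently:
+`Input(variable name) of OP name operator should not be null.`  
+
+Correct example: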
+```
+PADDLE_ENFORCE(ctx->HasInput("Input"),
+                        "Input(Input) of LSTMP operator should not be null.");
+```
+
+- Input and output checks of a backward Op must state the backward Op's name
+
+Correct example:
+```
+PADDLE_ENFORCE(ctx->HasInput("X"),
+                        "Input(X) of LoDResetGrad operator should not be null.");
+```
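Because these messages follow a fixed pattern, a small formatter can spell it out. The following Python sketch illustrates the format only; in C++ the string is simply passed to `PADDLE_ENFORCE`. `null_check_message` is a hypothetical helper. The `<Op>Grad` naming for backward ops is taken from the `LoDResetGrad` example above and is an assumption, not a stated rule.

```python
def null_check_message(kind, var_name, op_type, backward=False):
    """Format '<Kind>(<var>) of <Op> operator should not be null.' per the convention above."""
    if kind not in ("Input", "Output"):
        raise ValueError("kind must be 'Input' or 'Output'")
    # Assumption: backward ops are named with a 'Grad' suffix, as in LoDResetGrad.
    op_name = op_type + "Grad" if backward else op_type
    return f"{kind}({var_name}) of {op_name} operator should not be null."


print(null_check_message("Input", "Input", "LSTMP"))
# Input(Input) of LSTMP operator should not be null.
print(null_check_message("Input", "X", "LoDReset", backward=True))
# Input(X) of LoDResetGrad operator should not be null.
```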
--- a/source/advanced_usage/development/nvvp1.png
+++ b/source/advanced_usage/development/nvvp1.png
--- a/source/advanced_usage/development/nvvp2.png
+++ b/source/advanced_usage/development/nvvp2.png
--- a/source/advanced_usage/development/nvvp3.png
+++ b/source/advanced_usage/development/nvvp3.png
--- a/source/advanced_usage/development/nvvp4.png
+++ b/source/advanced_usage/development/nvvp4.png
--- a/source/advanced_usage/development/pprof_1.png
+++ b/source/advanced_usage/development/pprof_1.png
--- a/source/advanced_usage/development/pprof_2.png
+++ b/source/advanced_usage/development/pprof_2.png
--- a/source/advanced_usage/development/timeline.jpeg
+++ b/source/advanced_usage/development/timeline.jpeg
--- a/doc/fluid/advanced_usage/development/timeline_cn.md
+++ b/doc/fluid/advanced_usage/development/timeline_cn.md
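+# How to Use the timeline Tool for Performance Analysis
+
+1. In the main training loop, wrap the iterations you want to profile with `profiler.start_profiler(...)` and `profiler.stop_profiler(...)`. After the run, the code generates a profile record file in the `/tmp/profile` directory.
+
+	**Tip:**
+	Do not run too many iterations while the timeline is recording, because the number of records in the timeline is proportional to the number of iterations.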
+
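+	```python
+	import paddle.fluid as fluid
+	import paddle.fluid.profiler as profiler
+
+	# exe, feeder, train_reader and pass_num come from your existing training script.
+	for pass_id in range(pass_num):
+	    for batch_id, data in enumerate(train_reader()):
+	        # Profile batches 5 to 9 of the first pass only.
+	        if pass_id == 0 and batch_id == 5:
+	            profiler.start_profiler("All")
+	        elif pass_id == 0 and batch_id == 10:
+	            profiler.stop_profiler("total", "/tmp/profile")
+	        exe.run(fluid.default_main_program(),
+	                feed=feeder.feed(data),
+	                fetch_list=[])
+	        ...
+	```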
+
+1. Run `python Paddle/tools/timeline.py` to process `/tmp/profile`. By default this program generates a `/tmp/timeline` file. You can change the path with command-line arguments; see [timeline.py](https://github.com/PaddlePaddle/Paddle/blob/develop/tools/timeline.py).
+
+	```bash
+	python Paddle/tools/timeline.py --profile_path=/tmp/profile --timeline_path=timeline
+	```
+
+1. Open Chrome, visit <chrome://tracing/>, and use the `load` button to load the generated `timeline` file.
+
+	![chrome tracing](./tracing.jpeg)
+
+1. The result looks like the figure below; you can zoom in to see the details of the timeline.
+
+	![chrome timeline](./timeline.jpeg)
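Besides browsing the timeline in Chrome, you may want a quick text summary of where the time went. The sketch below makes several assumptions. It assumes the `timeline` file is JSON in the Chrome Trace Event format, which is what `chrome://tracing` loads. It also assumes the file contains complete events (`"ph": "X"`) whose `dur` field is in microseconds. Check your generated file before relying on this. `summarize_trace` and `summarize_timeline_file` are hypothetical names.

```python
import json
from collections import defaultdict


def summarize_trace(trace, top=10):
    """Total duration per event name, largest first, for a Chrome trace dict or event list."""
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for event in events:
        if event.get("ph") == "X":  # complete events carry a duration
            totals[event["name"]] += event.get("dur", 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


def summarize_timeline_file(path, top=10):
    """Load a timeline JSON file and summarize it with summarize_trace."""
    with open(path) as f:
        return summarize_trace(json.load(f), top)
```

For example, `for name, us in summarize_timeline_file("timeline"): print(f"{us / 1000:10.3f} ms  {name}")` prints the most expensive event names first.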
--- a/source/advanced_usage/development/tracing.jpeg
+++ b/source/advanced_usage/development/tracing.jpeg
--- a/doc/fluid/advanced_usage/development/write_docs.rst
+++ b/doc/fluid/advanced_usage/development/write_docs.rst
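+#################################
+How to Contribute Documentation
+#################################
+
+PaddlePaddle's documentation has a Chinese part and an English part. All of it is generated by ``sphinx``, driven by ``cmake``. The PaddlePaddle.org tool can run this build for us and provides a better preview.
+
+How to Build the Documentation
+================================
+
+There are two ways to build PaddlePaddle's documentation: with the PaddlePaddle.org tool and without it. Each has its advantages: the former makes previewing easy, and the latter makes debugging easy for developers. Each of the two can in turn be run with or without Docker.
+
+We recommend building the documentation with the PaddlePaddle.org tool.
+
+Using the PaddlePaddle.org Tool
+---------------------------------
+This is the currently recommended method. Besides building the documentation automatically, it lets you preview the documentation directly in a web page. The other methods described later also allow previewing, but their page style differs from the official website. Only the PaddlePaddle.org tool produces a preview that matches the style of the official documentation.
+
+The PaddlePaddle.org tool can be used with Docker, which must be installed on your system first. For Docker installation, see the `Docker website <https://docs.docker.com/>`_. Once Docker is installed, start the tool with the following commands: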
+
+..  code-block:: bash
+
+    mkdir paddlepaddle # Create paddlepaddle working directory
+    cd paddlepaddle
+
+    # Clone the content repositories
+    git clone https://github.com/PaddlePaddle/Paddle.git
+    git clone https://github.com/PaddlePaddle/book.git
+    git clone https://github.com/PaddlePaddle/models.git
+    git clone https://github.com/PaddlePaddle/Mobile.git
+
+    # Please specify the working directory through -v
+    docker run -it -p 8000:8000 -v `pwd`:/var/content paddlepaddle/paddlepaddle.org:latest
+
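+Note: PaddlePaddle.org runs its commands on the content repositories specified with -v (volume).
+Then open http://localhost:8000 in a web browser to generate the documentation you need.
+The built files are stored in the working directory <paddlepaddle working directory>/.ppo_workspace/content.
+
+If you don't want to use Docker, you can also start the tool's server directly by running the Django framework. Use the following commands: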
+
+..  code-block:: bash
+
+    mkdir paddlepaddle # Create paddlepaddle working directory
+    cd paddlepaddle
+
+    # Clone the content repositories and PaddlePaddle.org
+    git clone https://github.com/PaddlePaddle/Paddle.git
+    git clone https://github.com/PaddlePaddle/book.git
+    git clone https://github.com/PaddlePaddle/models.git
+    git clone https://github.com/PaddlePaddle/Mobile.git
+    git clone https://github.com/PaddlePaddle/PaddlePaddle.org.git
+
+    # Please specify the PaddlePaddle working directory. In the current setting, it should be pwd
+    export CONTENT_DIR=<path_to_paddlepaddle_working_directory>
+    export ENV=''
+    cd PaddlePaddle.org/portal/
+    pip install -r requirements.txt
+    python manage.py runserver
+
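+The tool server reads the environment variable CONTENT_DIR to locate the repositories. Set CONTENT_DIR to your PaddlePaddle working directory.
+Then open http://localhost:8000 in a web browser to generate the documentation you need.
+The built files are stored in the working directory <paddlepaddle working directory>/.ppo_workspace/content.
+
+For more details about the PaddlePaddle.org tool, `click here <https://github.com/PaddlePaddle/PaddlePaddle.org/blob/develop/README.cn.md>`_.
+
+Without the PaddlePaddle.org Tool
+-----------------------------------
+
+To build PaddlePaddle's documentation with Docker, first install Docker on your system; see the `Docker website <https://docs.docker.com/>`_. The method is similar to `Build PaddlePaddle from Source <http://paddlepaddle.org/docs/develop/documentation/zh/build_and_install/build_from_source_cn.html>`_. You build a Docker image from the source code that can compile the PaddlePaddle documentation and run it. Inside the container, you build the documentation with the script in the source tree. The steps are: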
+
+.. code-block:: bash
+
+   git clone https://github.com/PaddlePaddle/Paddle.git
+   cd Paddle
+
+   # Build, from the source code, a Docker image that can compile the PaddlePaddle documentation
+   docker build -t paddle:dev .
+   docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=OFF" -e "WITH_DOC=ON" paddle:dev /bin/bash
+
+   # Inside the Docker container, build the PaddlePaddle documentation with the build.sh script
+   bash -x /paddle/paddle/scripts/docker/build.sh
+
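+Note: the commands above map the current directory (the source root) to the :code:`/paddle` directory in the container.
+
+The build produces two directories, ``doc/v2`` and ``doc/fluid``. Each of them contains three subdirectories: ``cn/html/``, ``en/html``, and ``api/en/html``. Go into each of these subdirectories and run: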
+
+.. code-block:: bash
+
+   python -m SimpleHTTPServer 8088
+
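+Open http://localhost:8088 in a browser to see the generated Chinese and English documentation pages and the English API pages for both the ``v2`` and ``fluid`` versions.
+
+If you don't want to use Docker, you can also build the PaddlePaddle documentation directly with the following commands: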
+
+.. code-block:: bash
+
+   git clone https://github.com/PaddlePaddle/Paddle.git
+   cd Paddle
+   mkdir -p build
+   cd build
+   cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKL=OFF -DWITH_DOC=ON
+
+   # To build only the user documentation, run:
+   make -j $processors paddle_docs
+
+   # To build only the API documentation, run:
+   make -j $processors paddle_apis
+
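+Here $processors is the number of parallel build processes; set it to the number of CPU cores on your machine.
+
+The build again produces ``doc/v2`` and ``doc/fluid``. A documentation build creates the ``cn/html/`` and ``en/html`` subdirectories in each of them, and an API build creates an ``api/en/html`` directory in each. Go into each of these subdirectories and run: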
+
+.. code-block:: bash
+
+   python -m SimpleHTTPServer 8088
+
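+Open http://localhost:8088 in a browser to see the generated Chinese and English documentation pages and the English API pages for both the ``v2`` and ``fluid`` versions. The figure below shows an example of the generated English home page of the ``v2`` documentation. The example uses sphinx's default theme, so its page style differs from the official website, but this does not affect debugging.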
+
+..  image:: src/doc_en.png
+    :align: center
+    :scale: 60 %
+
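+How to Write Documentation
+============================
+
+PaddlePaddle's documentation is generated automatically with `sphinx`_; refer to the sphinx tutorial when writing it.
+
+How to Update www.paddlepaddle.org
+====================================
+
+Submit documentation updates to GitHub as PRs; for how to submit, see `How to Contribute Documentation <http://www.paddlepaddle.org/docs/develop/documentation/zh/dev/write_docs_cn.html>`_.
+Documentation on PaddlePaddle's develop branch is currently updated automatically; you can view the latest `Chinese documentation <http://www.paddlepaddle.org/docs/develop/documentation/zh/getstarted/index_cn.html>`_ and
+`English documentation <http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/index_en.html>`_.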
+
+
+..  _cmake: https://cmake.org/
+..  _sphinx: http://www.sphinx-doc.org/en/1.4.8/
--- a/source/advanced_usage/index.rst
+++ b/source/advanced_usage/index.rst
--- a/source/advanced_usage/pics/anakin_fm_ch.png
+++ b/source/advanced_usage/pics/anakin_fm_ch.png
--- a/doc/fluid/api/CMakeLists.txt
+++ b/doc/fluid/api/CMakeLists.txt
+# configured documentation tools and intermediate build results
+set(BINARY_BUILD_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/_build")
+
+# Sphinx cache with pickled ReST documents
+set(SPHINX_CACHE_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/_doctrees")
+
+# HTML output directory
+set(SPHINX_HTML_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/html")
+
+set(IMPORT_PADDLE_STRING "import paddle")
+set(IMPORT_PADDLEV2_STRING "import paddle.v2")
+
+configure_file(
+    "${CMAKE_CURRENT_SOURCE_DIR}/../../templates/conf.py.en.in"
+    "${BINARY_BUILD_DIR_EN}/conf.py"
+    @ONLY)
+
+sphinx_add_target(paddle_fluid_apis
+                  html
+                  ${BINARY_BUILD_DIR_EN}
+                  ${SPHINX_CACHE_DIR_EN}
+                  ${CMAKE_CURRENT_SOURCE_DIR}
+                  ${SPHINX_HTML_DIR_EN})
+
+add_dependencies(paddle_fluid_apis  gen_proto_py framework_py_proto copy_paddle_pybind paddle_python)
--- a/doc/fluid/api/average.rst
+++ b/doc/fluid/api/average.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+=============
+fluid.average
+=============
+
+.. _api_fluid_average_WeightedAverage:
+
+WeightedAverage
+---------------
+
+..  autoclass:: paddle.fluid.average.WeightedAverage
+    :members:
+    :noindex:
+
--- a/doc/fluid/api/backward.rst
+++ b/doc/fluid/api/backward.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+==============
+fluid.backward
+==============
+
+.. _api_fluid_backward_append_backward:
+
+append_backward
+---------------
+
+..  autofunction:: paddle.fluid.backward.append_backward
+    :noindex:
+
+.. _api_fluid_backward_calc_gradient:
+
+calc_gradient
+-------------
+
+..  autofunction:: paddle.fluid.backward.calc_gradient
+    :noindex:
+
--- a/doc/fluid/api/clip.rst
+++ b/doc/fluid/api/clip.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+==========
+fluid.clip
+==========
+
+.. _api_fluid_clip_ErrorClipByValue:
+
+ErrorClipByValue
+----------------
+
+..  autoclass:: paddle.fluid.clip.ErrorClipByValue
+    :members:
+    :noindex:
+
+.. _api_fluid_clip_GradientClipByValue:
+
+GradientClipByValue
+-------------------
+
+..  autoclass:: paddle.fluid.clip.GradientClipByValue
+    :members:
+    :noindex:
+
+.. _api_fluid_clip_GradientClipByNorm:
+
+GradientClipByNorm
+------------------
+
+..  autoclass:: paddle.fluid.clip.GradientClipByNorm
+    :members:
+    :noindex:
+
+.. _api_fluid_clip_GradientClipByGlobalNorm:
+
+GradientClipByGlobalNorm
+------------------------
+
+..  autoclass:: paddle.fluid.clip.GradientClipByGlobalNorm
+    :members:
+    :noindex:
+
--- a/doc/fluid/api/data/data_reader.rst
+++ b/doc/fluid/api/data/data_reader.rst
+=====================
+Data Reader Interface
+=====================
+
+
+DataTypes
+=========
+
+..  autofunction:: paddle.v2.data_type.dense_array
+    :noindex:
+
+..  autofunction:: paddle.v2.data_type.integer_value
+    :noindex:
+
+..  autofunction:: paddle.v2.data_type.integer_value_sequence
+    :noindex:
+
+..  autofunction:: paddle.v2.data_type.integer_value_sub_sequence
+    :noindex:
+
+..  autofunction:: paddle.v2.data_type.sparse_binary_vector
+    :noindex:
+
+..  autofunction:: paddle.v2.data_type.sparse_binary_vector_sequence
+    :noindex:
+
+..  autofunction:: paddle.v2.data_type.sparse_binary_vector_sub_sequence
+    :noindex:
+
+..  autofunction:: paddle.v2.data_type.sparse_float_vector
+    :noindex:
+
+..  autofunction:: paddle.v2.data_type.sparse_float_vector_sequence
+    :noindex:
+
+..  autofunction:: paddle.v2.data_type.sparse_float_vector_sub_sequence
+    :noindex:
+
+..  autofunction:: paddle.v2.data_type.sparse_non_value_slot
+    :noindex:
+
+..  autofunction:: paddle.v2.data_type.sparse_value_slot
+    :noindex:
+
+..  autoclass:: paddle.v2.data_type.InputType
+    :members:
+    :noindex:
+
+DataFeeder
+==========
+
+..  automodule:: paddle.v2.data_feeder
+    :members:
+    :noindex:
+
+Reader
+======
+
+..  automodule:: paddle.reader
+    :members:
+    :noindex:
+
+..  automodule:: paddle.reader.creator
+    :members:
+    :noindex:
+
+minibatch
+=========
+
+..  automodule:: paddle.v2.minibatch
+    :members:
+    :noindex:
--- a/doc/fluid/api/data/dataset.rst
+++ b/doc/fluid/api/data/dataset.rst
+Dataset
+=======
+
+..  automodule:: paddle.dataset
+    :members:
+    :noindex:
+
+mnist
++++++
+
+..  automodule:: paddle.dataset.mnist
+    :members:
+    :noindex:
+
+cifar
++++++
+
+..  automodule:: paddle.dataset.cifar
+    :members:
+    :noindex:
+
+conll05
++++++++
+
+..  automodule:: paddle.dataset.conll05
+    :members: get_dict,get_embedding,test
+    :noindex:
+
+imdb
+++++
+
+..  automodule:: paddle.dataset.imdb
+    :members:
+    :noindex:
+
+imikolov
+++++++++
+
+..  automodule:: paddle.dataset.imikolov
+    :members:
+    :noindex:
+
+movielens
++++++++++
+
+..  automodule:: paddle.dataset.movielens
+    :members:
+    :noindex:
+
+..  autoclass:: paddle.dataset.movielens.MovieInfo
+    :noindex:
+
+..  autoclass:: paddle.dataset.movielens.UserInfo
+    :noindex:
+
+sentiment
++++++++++
+
+..  automodule:: paddle.dataset.sentiment
+    :members:
+    :noindex:
+
+uci_housing
++++++++++++
+
+..  automodule:: paddle.dataset.uci_housing
+    :members:
+    :noindex:
+
+wmt14
++++++
+
+..  automodule:: paddle.dataset.wmt14
+    :members:
+    :noindex:
+
+wmt16
++++++
+
+..  automodule:: paddle.dataset.wmt16
+    :members:
+    :noindex:
--- a/doc/fluid/api/data/image.rst
+++ b/doc/fluid/api/data/image.rst
+Image Interface
+===============
+
+..  automodule:: paddle.v2.image
+    :members:
--- a/doc/fluid/api/data_feeder.rst
+++ b/doc/fluid/api/data_feeder.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+=================
+fluid.data_feeder
+=================
+
+.. _api_fluid_data_feeder_DataFeeder:
+
+DataFeeder
+----------
+
+..  autoclass:: paddle.fluid.data_feeder.DataFeeder
+    :members:
+    :noindex:
+
--- a/doc/fluid/api/executor.rst
+++ b/doc/fluid/api/executor.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+==============
+fluid.executor
+==============
+
+.. _api_fluid_executor_Executor:
+
+Executor
+--------
+
+..  autoclass:: paddle.fluid.executor.Executor
+    :members:
+    :noindex:
+
+.. _api_fluid_executor_global_scope:
+
+global_scope
+------------
+
+..  autofunction:: paddle.fluid.executor.global_scope
+    :noindex:
+
+.. _api_fluid_executor_scope_guard:
+
+scope_guard
+-----------
+
+..  autofunction:: paddle.fluid.executor.scope_guard
+    :noindex:
+
+.. _api_fluid_executor__switch_scope:
+
+_switch_scope
+-------------
+
+..  autofunction:: paddle.fluid.executor._switch_scope
+    :noindex:
+
--- a/doc/fluid/api/fluid.rst
+++ b/doc/fluid/api/fluid.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+=====
+fluid
+=====
+
+.. _api_fluid_Block:
+
+Block
+-----
+
+..  autoclass:: paddle.fluid.Block
+    :members:
+    :noindex:
+
+.. _api_fluid_Variable:
+
+Variable
+--------
+
+..  autoclass:: paddle.fluid.Variable
+    :members:
+    :noindex:
+
+.. _api_fluid_Program:
+
+Program
+-------
+
+..  autoclass:: paddle.fluid.Program
+    :members:
+    :noindex:
+
+.. _api_fluid_Operator:
+
+Operator
+--------
+
+..  autoclass:: paddle.fluid.Operator
+    :members:
+    :noindex:
+
+.. _api_fluid_default_startup_program:
+
+default_startup_program
+-----------------------
+
+..  autofunction:: paddle.fluid.default_startup_program
+    :noindex:
+
+.. _api_fluid_default_main_program:
+
+default_main_program
+--------------------
+
+..  autofunction:: paddle.fluid.default_main_program
+    :noindex:
+
+.. _api_fluid_program_guard:
+
+program_guard
+-------------
+
+..  autofunction:: paddle.fluid.program_guard
+    :noindex:
+
+.. _api_fluid_get_var:
+
+get_var
+-------
+
+..  autofunction:: paddle.fluid.get_var
+    :noindex:
+
+.. _api_fluid_Executor:
+
+Executor
+--------
+
+..  autoclass:: paddle.fluid.Executor
+    :members:
+    :noindex:
+
+.. _api_fluid_global_scope:
+
+global_scope
+------------
+
+..  autofunction:: paddle.fluid.global_scope
+    :noindex:
+
+.. _api_fluid_scope_guard:
+
+scope_guard
+-----------
+
+..  autofunction:: paddle.fluid.scope_guard
+    :noindex:
+
+.. _api_fluid__switch_scope:
+
+_switch_scope
+-------------
+
+..  autofunction:: paddle.fluid._switch_scope
+    :noindex:
+
+
+.. _api_fluid_make_channel:
+
+make_channel
+------------
+
+..  autofunction:: paddle.fluid.make_channel
+    :noindex:
+
+.. _api_fluid_channel_send:
+
+channel_send
+------------
+
+..  autofunction:: paddle.fluid.channel_send
+    :noindex:
+
+.. _api_fluid_channel_recv:
+
+channel_recv
+------------
+
+..  autofunction:: paddle.fluid.channel_recv
+    :noindex:
+
+.. _api_fluid_channel_close:
+
+channel_close
+-------------
+
+..  autofunction:: paddle.fluid.channel_close
+    :noindex:
+
+.. _api_fluid_Select:
+
+Select
+------
+
+..  autoclass:: paddle.fluid.Select
+    :members:
+    :noindex:
+
+.. _api_fluid_Trainer:
+
+Trainer
+-------
+
+..  autoclass:: paddle.fluid.Trainer
+    :members:
+    :noindex:
+
+.. _api_fluid_BeginEpochEvent:
+
+BeginEpochEvent
+---------------
+
+..  autoclass:: paddle.fluid.BeginEpochEvent
+    :members:
+    :noindex:
+
+.. _api_fluid_EndEpochEvent:
+
+EndEpochEvent
+-------------
+
+..  autoclass:: paddle.fluid.EndEpochEvent
+    :members:
+    :noindex:
+
+.. _api_fluid_BeginStepEvent:
+
+BeginStepEvent
+--------------
+
+..  autoclass:: paddle.fluid.BeginStepEvent
+    :members:
+    :noindex:
+
+.. _api_fluid_EndStepEvent:
+
+EndStepEvent
+------------
+
+..  autoclass:: paddle.fluid.EndStepEvent
+    :members:
+    :noindex:
+
+.. _api_fluid_CheckpointConfig:
+
+CheckpointConfig
+----------------
+
+..  autoclass:: paddle.fluid.CheckpointConfig
+    :members:
+    :noindex:
+
+.. _api_fluid_Inferencer:
+
+Inferencer
+----------
+
+..  autoclass:: paddle.fluid.Inferencer
+    :members:
+    :noindex:
+
+.. _api_fluid_DistributeTranspiler:
+
+DistributeTranspiler
+--------------------
+
+..  autoclass:: paddle.fluid.DistributeTranspiler
+    :members:
+    :noindex:
+
+.. _api_fluid_memory_optimize:
+
+memory_optimize
+---------------
+
+..  autofunction:: paddle.fluid.memory_optimize
+    :noindex:
+
+.. _api_fluid_release_memory:
+
+release_memory
+--------------
+
+..  autofunction:: paddle.fluid.release_memory
+    :noindex:
+
+.. _api_fluid_ParallelExecutor:
+
+ParallelExecutor
+----------------
+
+..  autoclass:: paddle.fluid.ParallelExecutor
+    :members:
+    :noindex:
+
+.. _api_fluid_ExecutionStrategy:
+
+ExecutionStrategy
+-----------------
+
+..  autoclass:: paddle.fluid.ExecutionStrategy
+    :members:
+    :noindex:
+
+.. _api_fluid_BuildStrategy:
+
+BuildStrategy
+-------------
+
+..  autoclass:: paddle.fluid.BuildStrategy
+    :members:
+    :noindex:
+
+.. _api_fluid_create_lod_tensor:
+
+create_lod_tensor
+-----------------
+
+..  autofunction:: paddle.fluid.create_lod_tensor
+    :noindex:
+
+.. _api_fluid_create_random_int_lodtensor:
+
+create_random_int_lodtensor
+---------------------------
+
+..  autofunction:: paddle.fluid.create_random_int_lodtensor
+    :noindex:
+
+.. _api_fluid_LoDTensor:
+
+LoDTensor
+---------
+
+..  autoclass:: paddle.fluid.LoDTensor
+    :members:
+    :noindex:
+
+.. _api_fluid_CPUPlace:
+
+CPUPlace
+--------
+
+..  autoclass:: paddle.fluid.CPUPlace
+    :members:
+    :noindex:
+
+.. _api_fluid_CUDAPlace:
+
+CUDAPlace
+---------
+
+..  autoclass:: paddle.fluid.CUDAPlace
+    :members:
+    :noindex:
+
+.. _api_fluid_CUDAPinnedPlace:
+
+CUDAPinnedPlace
+---------------
+
+..  autoclass:: paddle.fluid.CUDAPinnedPlace
+    :members:
+    :noindex:
+
+.. _api_fluid_Tensor:
+
+Tensor
+------
+
+..  autoclass:: paddle.fluid.Tensor
+    :members:
+    :noindex:
+
+.. _api_fluid_ParamAttr:
+
+ParamAttr
+---------
+
+..  autoclass:: paddle.fluid.ParamAttr
+    :members:
+    :noindex:
+
+.. _api_fluid_WeightNormParamAttr:
+
+WeightNormParamAttr
+-------------------
+
+..  autoclass:: paddle.fluid.WeightNormParamAttr
+    :members:
+    :noindex:
+
+.. _api_fluid_DataFeeder:
+
+DataFeeder
+----------
+
+..  autoclass:: paddle.fluid.DataFeeder
+    :members:
+    :noindex:
+
+.. _api_fluid_Scope:
+
+Scope
+-----
+
+..  autoclass:: paddle.fluid.Scope
+    :members:
+    :noindex:
+
--- a/doc/fluid/api/gen_doc.py
+++ b/doc/fluid/api/gen_doc.py
+#   Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import print_function
+import argparse
+import sys
+import types
+
+import paddle.fluid as fluid
+
+
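+def parse_arg():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--submodules',
+        nargs="*",
+        help='Submodules of MODULE to document, each under its own section')
+    parser.add_argument(
+        'module',
+        type=str,
+        help='Name of the fluid module to document, e.g. "io"; '
+        'an empty string selects the top-level fluid module')
+    return parser.parse_args()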
+
+
+class DocGenerator(object):
+    def __init__(self, module_name=None, stream=sys.stdout):
+        self.stream = stream
+        # An empty or missing module name selects the top-level `fluid` module.
+        if not module_name:
+            self.module_name = "fluid"
+            self.module = fluid
+        else:
+            if not hasattr(fluid, module_name):
+                raise ValueError("Cannot find fluid.{0}".format(module_name))
+            self.module_name = "fluid." + module_name
+            self.module = getattr(fluid, module_name)
+        self.stream.write('''..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+''')
+
+        self._print_header_(self.module_name, dot='=', is_title=True)
+
+    def print_submodule(self, submodule_name):
+        submodule = getattr(self.module, submodule_name, None)
+        if submodule is None:
+            raise ValueError("Cannot find submodule {0}".format(submodule_name))
+        self.print_section(submodule_name)
+
+        for item in submodule.__all__:
+            self.print_item(item)
+
+    def print_current_module(self):
+        for item in self.module.__all__:
+            self.print_item(item)
+
+    def print_section(self, name):
+        self._print_header_(name, dot='=', is_title=False)
+
+    def print_item(self, name):
+        item = getattr(self.module, name, None)
+        if item is None:
+            return
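+        # `type` covers new-style classes on Python 2 and all classes on
+        # Python 3; `types.TypeType` exists only on Python 2.
+        if isinstance(item, type):
+            self.print_class(name)
+        elif isinstance(item, types.FunctionType):
+            self.print_method(name)
+        # Other entries in __all__ (constants, submodules, ...) are skipped.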
+
+    def print_class(self, name):
+        self._print_ref_(name)
+        self._print_header_(name, dot='-', is_title=False)
+        self.stream.write('''..  autoclass:: paddle.{0}.{1}
+    :members:
+    :noindex:
+
+'''.format(self.module_name, name))
+
+    def print_method(self, name):
+        self._print_ref_(name)
+        self._print_header_(name, dot='-', is_title=False)
+        self.stream.write('''..  autofunction:: paddle.{0}.{1}
+    :noindex:
+
+'''.format(self.module_name, name))
+
+    def _print_header_(self, name, dot, is_title):
+        dot_line = dot * len(name)
+        if is_title:
+            self.stream.write(dot_line)
+            self.stream.write('\n')
+        self.stream.write(name)
+        self.stream.write('\n')
+        self.stream.write(dot_line)
+        self.stream.write('\n')
+        self.stream.write('\n')
+
+    def _print_ref_(self, name):
+        self.stream.write(".. _api_{0}_{1}:\n\n".format("_".join(
+            self.module_name.split(".")), name))
+
+
+def main():
+    args = parse_arg()
+    gen = DocGenerator(args.module)
+    if args.submodules is None:
+        gen.print_current_module()
+    else:
+        for submodule_name in args.submodules:
+            gen.print_submodule(submodule_name)
+
+
+if __name__ == '__main__':
+    main()
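To check what `gen_doc.py` emits without installing Paddle, the sketch below re-implements its per-item logic against a stand-in module. `build_parser`, `render`, `Foo`, `bar`, and the `demo` module are hypothetical names made up for this illustration; the RST layout it produces (reference label, `-` underline as long as the name, `autoclass` with `:members:` or `autofunction`, both with `:noindex:`) is meant to mirror `print_class`, `print_method`, `_print_header_`, and `_print_ref_` in the script above.

```python
import argparse
import io
import types


def build_parser():
    # Mirrors parse_arg() in gen_doc.py (help strings omitted).
    parser = argparse.ArgumentParser()
    parser.add_argument('--submodules', nargs="*")
    parser.add_argument('module', type=str)
    return parser


def render(module_path, module):
    """Emit the per-item RST that gen_doc.py writes for ``module.__all__``."""
    out = io.StringIO()
    ref_prefix = "_".join(module_path.split("."))  # "fluid.io" -> "fluid_io"
    for name in module.__all__:
        item = getattr(module, name, None)
        if isinstance(item, type):
            directive, options = u"autoclass", u"    :members:\n    :noindex:\n"
        elif isinstance(item, types.FunctionType):
            directive, options = u"autofunction", u"    :noindex:\n"
        else:
            continue  # constants, submodules, etc. get no entry
        out.write(u".. _api_{0}_{1}:\n\n".format(ref_prefix, name))
        # Section underline is exactly as long as the name.
        out.write(u"{0}\n{1}\n\n".format(name, u"-" * len(name)))
        out.write(u"..  {0}:: paddle.{1}.{2}\n{3}\n".format(
            directive, module_path, name, options))
    return out.getvalue()


# Hypothetical stand-in for a fluid submodule.
class Foo(object):
    pass


def bar():
    pass


demo = types.ModuleType("demo")
demo.Foo, demo.bar, demo.VERSION = Foo, bar, 1
demo.__all__ = ["Foo", "bar", "VERSION"]

if __name__ == "__main__":
    # Pass the positional module name before --submodules: with nargs="*",
    # an option placed first would also consume the module name.
    args = build_parser().parse_args(["demo"])
    print(render("fluid." + args.module, demo))
```

The `VERSION` entry produces no output, which matches how `print_item` silently skips anything in `__all__` that is neither a class nor a plain function.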
--- a/source/api_reference/gen_doc.sh
+++ b/source/api_reference/gen_doc.sh
--- a/source/api_reference/index.rst
+++ b/source/api_reference/index.rst
--- a/doc/fluid/api/initializer.rst
+++ b/doc/fluid/api/initializer.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+=================
+fluid.initializer
+=================
+
+.. _api_fluid_initializer_Constant:
+
+Constant
+--------
+
+..  autoclass:: paddle.fluid.initializer.Constant
+    :members:
+    :noindex:
+
+.. _api_fluid_initializer_Uniform:
+
+Uniform
+-------
+
+..  autoclass:: paddle.fluid.initializer.Uniform
+    :members:
+    :noindex:
+
+.. _api_fluid_initializer_Normal:
+
+Normal
+------
+
+..  autoclass:: paddle.fluid.initializer.Normal
+    :members:
+    :noindex:
+
+.. _api_fluid_initializer_Xavier:
+
+Xavier
+------
+
+..  autoclass:: paddle.fluid.initializer.Xavier
+    :members:
+    :noindex:
+
+.. _api_fluid_initializer_Bilinear:
+
+Bilinear
+--------
+
+..  autoclass:: paddle.fluid.initializer.Bilinear
+    :members:
+    :noindex:
+
+.. _api_fluid_initializer_MSRA:
+
+MSRA
+----
+
+..  autoclass:: paddle.fluid.initializer.MSRA
+    :members:
+    :noindex:
+
+.. _api_fluid_initializer_force_init_on_cpu:
+
+force_init_on_cpu
+-----------------
+
+..  autofunction:: paddle.fluid.initializer.force_init_on_cpu
+    :noindex:
+
+.. _api_fluid_initializer_init_on_cpu:
+
+init_on_cpu
+-----------
+
+..  autofunction:: paddle.fluid.initializer.init_on_cpu
+    :noindex:
+
+.. _api_fluid_initializer_ConstantInitializer:
+
+ConstantInitializer
+-------------------
+
+..  autoclass:: paddle.fluid.initializer.ConstantInitializer
+    :members:
+    :noindex:
+
+.. _api_fluid_initializer_UniformInitializer:
+
+UniformInitializer
+------------------
+
+..  autoclass:: paddle.fluid.initializer.UniformInitializer
+    :members:
+    :noindex:
+
+.. _api_fluid_initializer_NormalInitializer:
+
+NormalInitializer
+-----------------
+
+..  autoclass:: paddle.fluid.initializer.NormalInitializer
+    :members:
+    :noindex:
+
+.. _api_fluid_initializer_XavierInitializer:
+
+XavierInitializer
+-----------------
+
+..  autoclass:: paddle.fluid.initializer.XavierInitializer
+    :members:
+    :noindex:
+
+.. _api_fluid_initializer_BilinearInitializer:
+
+BilinearInitializer
+-------------------
+
+..  autoclass:: paddle.fluid.initializer.BilinearInitializer
+    :members:
+    :noindex:
+
+.. _api_fluid_initializer_MSRAInitializer:
+
+MSRAInitializer
+---------------
+
+..  autoclass:: paddle.fluid.initializer.MSRAInitializer
+    :members:
+    :noindex:
+
--- a/doc/fluid/api/io.rst
+++ b/doc/fluid/api/io.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+========
+fluid.io
+========
+
+.. _api_fluid_io_save_vars:
+
+save_vars
+---------
+
+..  autofunction:: paddle.fluid.io.save_vars
+    :noindex:
+
+.. _api_fluid_io_save_params:
+
+save_params
+-----------
+
+..  autofunction:: paddle.fluid.io.save_params
+    :noindex:
+
+.. _api_fluid_io_save_persistables:
+
+save_persistables
+-----------------
+
+..  autofunction:: paddle.fluid.io.save_persistables
+    :noindex:
+
+.. _api_fluid_io_load_vars:
+
+load_vars
+---------
+
+..  autofunction:: paddle.fluid.io.load_vars
+    :noindex:
+
+.. _api_fluid_io_load_params:
+
+load_params
+-----------
+
+..  autofunction:: paddle.fluid.io.load_params
+    :noindex:
+
+.. _api_fluid_io_load_persistables:
+
+load_persistables
+-----------------
+
+..  autofunction:: paddle.fluid.io.load_persistables
+    :noindex:
+
+.. _api_fluid_io_save_inference_model:
+
+save_inference_model
+--------------------
+
+..  autofunction:: paddle.fluid.io.save_inference_model
+    :noindex:
+
+.. _api_fluid_io_load_inference_model:
+
+load_inference_model
+--------------------
+
+..  autofunction:: paddle.fluid.io.load_inference_model
+    :noindex:
+
+.. _api_fluid_io_get_inference_program:
+
+get_inference_program
+---------------------
+
+..  autofunction:: paddle.fluid.io.get_inference_program
+    :noindex:
+
+.. _api_fluid_io_save_checkpoint:
+
+save_checkpoint
+---------------
+
+..  autofunction:: paddle.fluid.io.save_checkpoint
+    :noindex:
+
+.. _api_fluid_io_load_checkpoint:
+
+load_checkpoint
+---------------
+
+..  autofunction:: paddle.fluid.io.load_checkpoint
+    :noindex:
+
+.. _api_fluid_io_clean_checkpoint:
+
+clean_checkpoint
+----------------
+
+..  autofunction:: paddle.fluid.io.clean_checkpoint
+    :noindex:
+
+.. _api_fluid_io_load_persist_vars_without_grad:
+
+load_persist_vars_without_grad
+------------------------------
+
+..  autofunction:: paddle.fluid.io.load_persist_vars_without_grad
+    :noindex:
+
+.. _api_fluid_io_save_persist_vars_without_grad:
+
+save_persist_vars_without_grad
+------------------------------
+
+..  autofunction:: paddle.fluid.io.save_persist_vars_without_grad
+    :noindex:
+
+.. _api_fluid_io_get_latest_checkpoint_serial:
+
+get_latest_checkpoint_serial
+----------------------------
+
+..  autofunction:: paddle.fluid.io.get_latest_checkpoint_serial
+    :noindex:
+
--- a/doc/fluid/api/layers.rst
+++ b/doc/fluid/api/layers.rst
--- a/doc/fluid/api/metrics.rst
+++ b/doc/fluid/api/metrics.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+=============
+fluid.metrics
+=============
+
+.. _api_fluid_metrics_MetricBase:
+
+MetricBase
+----------
+
+..  autoclass:: paddle.fluid.metrics.MetricBase
+    :members:
+    :noindex:
+
+.. _api_fluid_metrics_CompositeMetric:
+
+CompositeMetric
+---------------
+
+..  autoclass:: paddle.fluid.metrics.CompositeMetric
+    :members:
+    :noindex:
+
+.. _api_fluid_metrics_Precision:
+
+Precision
+---------
+
+..  autoclass:: paddle.fluid.metrics.Precision
+    :members:
+    :noindex:
+
+.. _api_fluid_metrics_Recall:
+
+Recall
+------
+
+..  autoclass:: paddle.fluid.metrics.Recall
+    :members:
+    :noindex:
+
+.. _api_fluid_metrics_Accuracy:
+
+Accuracy
+--------
+
+..  autoclass:: paddle.fluid.metrics.Accuracy
+    :members:
+    :noindex:
+
+.. _api_fluid_metrics_ChunkEvaluator:
+
+ChunkEvaluator
+--------------
+
+..  autoclass:: paddle.fluid.metrics.ChunkEvaluator
+    :members:
+    :noindex:
+
+.. _api_fluid_metrics_EditDistance:
+
+EditDistance
+------------
+
+..  autoclass:: paddle.fluid.metrics.EditDistance
+    :members:
+    :noindex:
+
+.. _api_fluid_metrics_DetectionMAP:
+
+DetectionMAP
+------------
+
+..  autoclass:: paddle.fluid.metrics.DetectionMAP
+    :members:
+    :noindex:
+
+.. _api_fluid_metrics_Auc:
+
+Auc
+---
+
+..  autoclass:: paddle.fluid.metrics.Auc
+    :members:
+    :noindex:
+
--- a/doc/fluid/api/nets.rst
+++ b/doc/fluid/api/nets.rst
--- a/doc/fluid/api/optimizer.rst
+++ b/doc/fluid/api/optimizer.rst
--- a/doc/fluid/api/param_attr.rst
+++ b/doc/fluid/api/param_attr.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+================
+fluid.param_attr
+================
+
+.. _api_fluid_param_attr_ParamAttr:
+
+ParamAttr
+---------
+
+..  autoclass:: paddle.fluid.param_attr.ParamAttr
+    :members:
+    :noindex:
+
+.. _api_fluid_param_attr_WeightNormParamAttr:
+
+WeightNormParamAttr
+-------------------
+
+..  autoclass:: paddle.fluid.param_attr.WeightNormParamAttr
+    :members:
+    :noindex:
+
--- a/doc/fluid/api/profiler.rst
+++ b/doc/fluid/api/profiler.rst
--- a/doc/fluid/api/recordio_writer.rst
+++ b/doc/fluid/api/recordio_writer.rst
--- a/doc/fluid/api/regularizer.rst
+++ b/doc/fluid/api/regularizer.rst
--- a/doc/fluid/api/transpiler.rst
+++ b/doc/fluid/api/transpiler.rst
--- a/source/beginners_guide/basics/image_classification/.gitignore
+++ b/source/beginners_guide/basics/image_classification/.gitignore
--- a/source/beginners_guide/basics/image_classification/image/dog.png
+++ b/source/beginners_guide/basics/image_classification/image/dog.png
--- a/source/beginners_guide/basics/image_classification/image/dog_cat.png
+++ b/source/beginners_guide/basics/image_classification/image/dog_cat.png
--- a/source/beginners_guide/basics/image_classification/image/fea_conv0.png
+++ b/source/beginners_guide/basics/image_classification/image/fea_conv0.png
--- a/source/beginners_guide/basics/image_classification/image/flowers.png
+++ b/source/beginners_guide/basics/image_classification/image/flowers.png
--- a/source/beginners_guide/basics/image_classification/image/googlenet.jpeg
+++ b/source/beginners_guide/basics/image_classification/image/googlenet.jpeg
--- a/source/beginners_guide/basics/image_classification/image/ilsvrc.png
+++ b/source/beginners_guide/basics/image_classification/image/ilsvrc.png
--- a/source/beginners_guide/basics/image_classification/image/inception.png
+++ b/source/beginners_guide/basics/image_classification/image/inception.png
--- a/source/beginners_guide/basics/image_classification/image/lenet.png
+++ b/source/beginners_guide/basics/image_classification/image/lenet.png
--- a/source/beginners_guide/basics/image_classification/image/plot.png
+++ b/source/beginners_guide/basics/image_classification/image/plot.png
--- a/source/beginners_guide/basics/image_classification/image/resnet.png
+++ b/source/beginners_guide/basics/image_classification/image/resnet.png
--- a/source/beginners_guide/basics/image_classification/image/resnet_block.jpg
+++ b/source/beginners_guide/basics/image_classification/image/resnet_block.jpg
--- a/source/beginners_guide/basics/image_classification/image/train_and_test.png
+++ b/source/beginners_guide/basics/image_classification/image/train_and_test.png
--- a/source/beginners_guide/basics/image_classification/image/vgg16.png
+++ b/source/beginners_guide/basics/image_classification/image/vgg16.png
--- a/doc/fluid/beginners_guide/basics/image_classification/index.md
+++ b/doc/fluid/beginners_guide/basics/image_classification/index.md
+../../../../../external/book/03.image_classification/README.cn.md
\ No newline at end of file
--- a/source/beginners_guide/basics/index.rst
+++ b/source/beginners_guide/basics/index.rst
--- a/source/beginners_guide/basics/label_semantic_roles/.gitignore
+++ b/source/beginners_guide/basics/label_semantic_roles/.gitignore
--- a/source/beginners_guide/basics/label_semantic_roles/image/bidirectional_stacked_lstm.png
+++ b/source/beginners_guide/basics/label_semantic_roles/image/bidirectional_stacked_lstm.png
--- a/source/beginners_guide/basics/label_semantic_roles/image/bidirectional_stacked_lstm_en.png
+++ b/source/beginners_guide/basics/label_semantic_roles/image/bidirectional_stacked_lstm_en.png
--- a/source/beginners_guide/basics/label_semantic_roles/image/bio_example.png
+++ b/source/beginners_guide/basics/label_semantic_roles/image/bio_example.png
--- a/source/beginners_guide/basics/label_semantic_roles/image/bio_example_en.png
+++ b/source/beginners_guide/basics/label_semantic_roles/image/bio_example_en.png
--- a/source/beginners_guide/basics/label_semantic_roles/image/db_lstm_network.png
+++ b/source/beginners_guide/basics/label_semantic_roles/image/db_lstm_network.png
--- a/source/beginners_guide/basics/label_semantic_roles/image/db_lstm_network_en.png
+++ b/source/beginners_guide/basics/label_semantic_roles/image/db_lstm_network_en.png
--- a/source/beginners_guide/basics/label_semantic_roles/image/dependency_parsing.png
+++ b/source/beginners_guide/basics/label_semantic_roles/image/dependency_parsing.png
--- a/source/beginners_guide/basics/label_semantic_roles/image/dependency_parsing_en.png
+++ b/source/beginners_guide/basics/label_semantic_roles/image/dependency_parsing_en.png
--- a/source/beginners_guide/basics/label_semantic_roles/image/linear_chain_crf.png
+++ b/source/beginners_guide/basics/label_semantic_roles/image/linear_chain_crf.png
--- a/source/beginners_guide/basics/label_semantic_roles/image/stacked_lstm.png
+++ b/source/beginners_guide/basics/label_semantic_roles/image/stacked_lstm.png
--- a/source/beginners_guide/basics/label_semantic_roles/image/stacked_lstm_en.png
+++ b/source/beginners_guide/basics/label_semantic_roles/image/stacked_lstm_en.png
--- a/doc/fluid/beginners_guide/basics/label_semantic_roles/index.md
+++ b/doc/fluid/beginners_guide/basics/label_semantic_roles/index.md
+../../../../../external/book/07.label_semantic_roles/README.cn.md
\ No newline at end of file
--- a/source/beginners_guide/basics/learning_materials.md
+++ b/source/beginners_guide/basics/learning_materials.md
--- a/source/beginners_guide/basics/machine_translation/.gitignore
+++ b/source/beginners_guide/basics/machine_translation/.gitignore
--- a/source/beginners_guide/basics/machine_translation/image/bi_rnn.png
+++ b/source/beginners_guide/basics/machine_translation/image/bi_rnn.png
--- a/source/beginners_guide/basics/machine_translation/image/bi_rnn_en.png
+++ b/source/beginners_guide/basics/machine_translation/image/bi_rnn_en.png
--- a/source/beginners_guide/basics/machine_translation/image/decoder_attention.png
+++ b/source/beginners_guide/basics/machine_translation/image/decoder_attention.png
--- a/source/beginners_guide/basics/machine_translation/image/decoder_attention_en.png
+++ b/source/beginners_guide/basics/machine_translation/image/decoder_attention_en.png
--- a/source/beginners_guide/basics/machine_translation/image/encoder_attention.png
+++ b/source/beginners_guide/basics/machine_translation/image/encoder_attention.png
--- a/source/beginners_guide/basics/machine_translation/image/encoder_attention_en.png
+++ b/source/beginners_guide/basics/machine_translation/image/encoder_attention_en.png
--- a/source/beginners_guide/basics/machine_translation/image/encoder_decoder.png
+++ b/source/beginners_guide/basics/machine_translation/image/encoder_decoder.png
--- a/source/beginners_guide/basics/machine_translation/image/encoder_decoder_en.png
+++ b/source/beginners_guide/basics/machine_translation/image/encoder_decoder_en.png
--- a/source/beginners_guide/basics/machine_translation/image/gru.png
+++ b/source/beginners_guide/basics/machine_translation/image/gru.png
--- a/source/beginners_guide/basics/machine_translation/image/gru_en.png
+++ b/source/beginners_guide/basics/machine_translation/image/gru_en.png
--- a/source/beginners_guide/basics/machine_translation/image/nmt.png
+++ b/source/beginners_guide/basics/machine_translation/image/nmt.png
--- a/source/beginners_guide/basics/machine_translation/image/nmt_en.png
+++ b/source/beginners_guide/basics/machine_translation/image/nmt_en.png
--- a/doc/fluid/beginners_guide/basics/machine_translation/index.md
+++ b/doc/fluid/beginners_guide/basics/machine_translation/index.md
--- a/source/beginners_guide/basics/recommender_system/.gitignore
+++ b/source/beginners_guide/basics/recommender_system/.gitignore
--- a/source/beginners_guide/basics/recommender_system/image/Deep_candidate_generation_model_architecture.en.png
+++ b/source/beginners_guide/basics/recommender_system/image/Deep_candidate_generation_model_architecture.en.png
--- a/source/beginners_guide/basics/recommender_system/image/Deep_candidate_generation_model_architecture.png
+++ b/source/beginners_guide/basics/recommender_system/image/Deep_candidate_generation_model_architecture.png
--- a/source/beginners_guide/basics/recommender_system/image/YouTube_Overview.en.png
+++ b/source/beginners_guide/basics/recommender_system/image/YouTube_Overview.en.png
--- a/source/beginners_guide/basics/recommender_system/image/YouTube_Overview.png
+++ b/source/beginners_guide/basics/recommender_system/image/YouTube_Overview.png
--- a/source/beginners_guide/basics/recommender_system/image/output_32_0.png
+++ b/source/beginners_guide/basics/recommender_system/image/output_32_0.png
--- a/source/beginners_guide/basics/recommender_system/image/rec_regression_network.png
+++ b/source/beginners_guide/basics/recommender_system/image/rec_regression_network.png
--- a/source/beginners_guide/basics/recommender_system/image/rec_regression_network_en.png
+++ b/source/beginners_guide/basics/recommender_system/image/rec_regression_network_en.png
--- a/source/beginners_guide/basics/recommender_system/image/text_cnn.png
+++ b/source/beginners_guide/basics/recommender_system/image/text_cnn.png
--- a/source/beginners_guide/basics/recommender_system/image/text_cnn_en.png
+++ b/source/beginners_guide/basics/recommender_system/image/text_cnn_en.png
--- a/doc/fluid/beginners_guide/basics/recommender_system/index.md
+++ b/doc/fluid/beginners_guide/basics/recommender_system/index.md
--- a/source/beginners_guide/basics/understand_sentiment/.gitignore
+++ b/source/beginners_guide/basics/understand_sentiment/.gitignore
--- a/source/beginners_guide/basics/understand_sentiment/image/lstm.png
+++ b/source/beginners_guide/basics/understand_sentiment/image/lstm.png
--- a/source/beginners_guide/basics/understand_sentiment/image/lstm_en.png
+++ b/source/beginners_guide/basics/understand_sentiment/image/lstm_en.png
--- a/source/beginners_guide/basics/understand_sentiment/image/rnn.png
+++ b/source/beginners_guide/basics/understand_sentiment/image/rnn.png
--- a/source/beginners_guide/basics/understand_sentiment/image/stacked_lstm.jpg
+++ b/source/beginners_guide/basics/understand_sentiment/image/stacked_lstm.jpg
--- a/source/beginners_guide/basics/understand_sentiment/image/stacked_lstm_en.png
+++ b/source/beginners_guide/basics/understand_sentiment/image/stacked_lstm_en.png
--- a/doc/fluid/beginners_guide/basics/understand_sentiment/index.md
+++ b/doc/fluid/beginners_guide/basics/understand_sentiment/index.md
--- a/source/beginners_guide/basics/word2vec/.gitignore
+++ b/source/beginners_guide/basics/word2vec/.gitignore
--- a/source/beginners_guide/basics/word2vec/image/2d_similarity.png
+++ b/source/beginners_guide/basics/word2vec/image/2d_similarity.png
--- a/source/beginners_guide/basics/word2vec/image/cbow.png
+++ b/source/beginners_guide/basics/word2vec/image/cbow.png
--- a/source/beginners_guide/basics/word2vec/image/cbow_en.png
+++ b/source/beginners_guide/basics/word2vec/image/cbow_en.png
--- a/source/beginners_guide/basics/word2vec/image/ngram.en.png
+++ b/source/beginners_guide/basics/word2vec/image/ngram.en.png
--- a/source/beginners_guide/basics/word2vec/image/ngram.png
+++ b/source/beginners_guide/basics/word2vec/image/ngram.png
--- a/source/beginners_guide/basics/word2vec/image/nnlm.png
+++ b/source/beginners_guide/basics/word2vec/image/nnlm.png
--- a/source/beginners_guide/basics/word2vec/image/nnlm_en.png
+++ b/source/beginners_guide/basics/word2vec/image/nnlm_en.png
--- a/source/beginners_guide/basics/word2vec/image/sentence_emb.png
+++ b/source/beginners_guide/basics/word2vec/image/sentence_emb.png
--- a/source/beginners_guide/basics/word2vec/image/skipgram.png
+++ b/source/beginners_guide/basics/word2vec/image/skipgram.png
--- a/source/beginners_guide/basics/word2vec/image/skipgram_en.png
+++ b/source/beginners_guide/basics/word2vec/image/skipgram_en.png
--- a/doc/fluid/beginners_guide/basics/word2vec/index.md
+++ b/doc/fluid/beginners_guide/basics/word2vec/index.md
--- a/source/beginners_guide/index.rst
+++ b/source/beginners_guide/index.rst
--- a/doc/fluid/beginners_guide/install/install_doc.rst
+++ b/doc/fluid/beginners_guide/install/install_doc.rst
--- a/source/beginners_guide/install/paddleci.png
+++ b/source/beginners_guide/install/paddleci.png
--- a/doc/fluid/beginners_guide/quick_start/fit_a_line/README.cn.md
+++ b/doc/fluid/beginners_guide/quick_start/fit_a_line/README.cn.md
--- a/source/beginners_guide/quick_start/fit_a_line/image/predictions.png
+++ b/source/beginners_guide/quick_start/fit_a_line/image/predictions.png
--- a/source/beginners_guide/quick_start/fit_a_line/image/ranges.png
+++ b/source/beginners_guide/quick_start/fit_a_line/image/ranges.png
--- a/source/beginners_guide/quick_start/fit_a_line/image/train_and_test.png
+++ b/source/beginners_guide/quick_start/fit_a_line/image/train_and_test.png
--- a/source/beginners_guide/quick_start/index.rst
+++ b/source/beginners_guide/quick_start/index.rst
--- a/doc/fluid/beginners_guide/quick_start/recognize_digits/README.cn.md
+++ b/doc/fluid/beginners_guide/quick_start/recognize_digits/README.cn.md
--- a/source/beginners_guide/quick_start/recognize_digits/image/cnn.png
+++ b/source/beginners_guide/quick_start/recognize_digits/image/cnn.png
--- a/source/beginners_guide/quick_start/recognize_digits/image/cnn_train_log.png
+++ b/source/beginners_guide/quick_start/recognize_digits/image/cnn_train_log.png
--- a/source/beginners_guide/quick_start/recognize_digits/image/infer_3.png
+++ b/source/beginners_guide/quick_start/recognize_digits/image/infer_3.png
--- a/source/beginners_guide/quick_start/recognize_digits/image/max_pooling.png
+++ b/source/beginners_guide/quick_start/recognize_digits/image/max_pooling.png
--- a/source/beginners_guide/quick_start/recognize_digits/image/mlp.png
+++ b/source/beginners_guide/quick_start/recognize_digits/image/mlp.png
--- a/source/beginners_guide/quick_start/recognize_digits/image/mlp_train_log.png
+++ b/source/beginners_guide/quick_start/recognize_digits/image/mlp_train_log.png
--- a/source/beginners_guide/quick_start/recognize_digits/image/mnist_example_image.png
+++ b/source/beginners_guide/quick_start/recognize_digits/image/mnist_example_image.png
--- a/source/beginners_guide/quick_start/recognize_digits/image/softmax_regression.png
+++ b/source/beginners_guide/quick_start/recognize_digits/image/softmax_regression.png
--- a/source/beginners_guide/quick_start/recognize_digits/image/softmax_train_log.png
+++ b/source/beginners_guide/quick_start/recognize_digits/image/softmax_train_log.png
--- a/source/beginners_guide/quick_start/recognize_digits/image/train_and_test.png
+++ b/source/beginners_guide/quick_start/recognize_digits/image/train_and_test.png
--- a/doc/fluid/build_and_install/build_from_source_cn.rst
+++ b/doc/fluid/build_and_install/build_from_source_cn.rst
--- a/doc/fluid/build_and_install/build_from_source_en.rst
+++ b/doc/fluid/build_and_install/build_from_source_en.rst
--- a/doc/fluid/build_and_install/docker_install_cn.rst
+++ b/doc/fluid/build_and_install/docker_install_cn.rst
--- a/doc/fluid/build_and_install/docker_install_en.rst
+++ b/doc/fluid/build_and_install/docker_install_en.rst
--- a/doc/fluid/build_and_install/index_cn.rst
+++ b/doc/fluid/build_and_install/index_cn.rst
--- a/doc/fluid/build_and_install/index_en.rst
+++ b/doc/fluid/build_and_install/index_en.rst
--- a/doc/fluid/build_and_install/paddleci.png
+++ b/doc/fluid/build_and_install/paddleci.png
--- a/doc/fluid/build_and_install/pip_install_cn.rst
+++ b/doc/fluid/build_and_install/pip_install_cn.rst
--- a/doc/fluid/build_and_install/pip_install_en.rst
+++ b/doc/fluid/build_and_install/pip_install_en.rst
--- a/doc/fluid/design/algorithm/images/asgd.gif
+++ b/doc/fluid/design/algorithm/images/asgd.gif
--- a/doc/fluid/design/algorithm/images/theta_star.gif
+++ b/doc/fluid/design/algorithm/images/theta_star.gif
--- a/doc/fluid/design/algorithm/index_cn.rst
+++ b/doc/fluid/design/algorithm/index_cn.rst
--- a/doc/fluid/design/algorithm/index_en.rst
+++ b/doc/fluid/design/algorithm/index_en.rst
--- a/doc/fluid/design/algorithm/parameter_average.md
+++ b/doc/fluid/design/algorithm/parameter_average.md
--- a/doc/fluid/design/concepts/README.md
+++ b/doc/fluid/design/concepts/README.md
--- a/doc/fluid/design/concepts/block.md
+++ b/doc/fluid/design/concepts/block.md
--- a/doc/fluid/design/concepts/cpp_data_feeding.md
+++ b/doc/fluid/design/concepts/cpp_data_feeding.md
--- a/doc/fluid/design/concepts/executor.md
+++ b/doc/fluid/design/concepts/executor.md
--- a/doc/fluid/design/concepts/functions_operators_layers.md
+++ b/doc/fluid/design/concepts/functions_operators_layers.md
--- a/doc/fluid/design/concepts/images/multiple_reader.png
+++ b/doc/fluid/design/concepts/images/multiple_reader.png
--- a/doc/fluid/design/concepts/images/parallel_executor_overview.dot
+++ b/doc/fluid/design/concepts/images/parallel_executor_overview.dot
--- a/doc/fluid/design/concepts/images/parallel_executor_overview.png
+++ b/doc/fluid/design/concepts/images/parallel_executor_overview.png
--- a/doc/fluid/design/concepts/images/readers.png
+++ b/doc/fluid/design/concepts/images/readers.png
--- a/doc/fluid/design/concepts/index_cn.rst
+++ b/doc/fluid/design/concepts/index_cn.rst
--- a/doc/fluid/design/concepts/index_en.rst
+++ b/doc/fluid/design/concepts/index_en.rst
--- a/doc/fluid/design/concepts/lod_tensor.md
+++ b/doc/fluid/design/concepts/lod_tensor.md
--- a/doc/fluid/design/concepts/parallel_executor.md
+++ b/doc/fluid/design/concepts/parallel_executor.md
--- a/doc/fluid/design/concepts/program.md
+++ b/doc/fluid/design/concepts/program.md
--- a/doc/fluid/design/concepts/python_data_feeding.md
+++ b/doc/fluid/design/concepts/python_data_feeding.md
--- a/doc/fluid/design/concepts/scope.md
+++ b/doc/fluid/design/concepts/scope.md
--- a/doc/fluid/design/concepts/tensor.md
+++ b/doc/fluid/design/concepts/tensor.md
--- a/doc/fluid/design/concepts/tensor_array.md
+++ b/doc/fluid/design/concepts/tensor_array.md
--- a/doc/fluid/design/concepts/var_desc.md
+++ b/doc/fluid/design/concepts/var_desc.md
--- a/doc/fluid/design/concepts/variable.md
+++ b/doc/fluid/design/concepts/variable.md
--- a/doc/fluid/design/concurrent/channel.md
+++ b/doc/fluid/design/concurrent/channel.md
--- a/doc/fluid/design/concurrent/concurrent_programming.md
+++ b/doc/fluid/design/concurrent/concurrent_programming.md
--- a/doc/fluid/design/concurrent/csp.md
+++ b/doc/fluid/design/concurrent/csp.md
--- a/doc/fluid/design/concurrent/go_op.md
+++ b/doc/fluid/design/concurrent/go_op.md
--- a/doc/fluid/design/concurrent/images/channel_recv.png
+++ b/doc/fluid/design/concurrent/images/channel_recv.png
--- a/doc/fluid/design/concurrent/images/channel_send.png
+++ b/doc/fluid/design/concurrent/images/channel_send.png
--- a/doc/fluid/design/concurrent/images/select_op_workflow.png
+++ b/doc/fluid/design/concurrent/images/select_op_workflow.png
--- a/doc/fluid/design/concurrent/index_cn.rst
+++ b/doc/fluid/design/concurrent/index_cn.rst
--- a/doc/fluid/design/concurrent/index_en.rst
+++ b/doc/fluid/design/concurrent/index_en.rst
--- a/doc/fluid/design/concurrent/parallel_do.md
+++ b/doc/fluid/design/concurrent/parallel_do.md
--- a/doc/fluid/design/concurrent/select_op.md
+++ b/doc/fluid/design/concurrent/select_op.md
--- a/doc/fluid/design/data_type/float16.md
+++ b/doc/fluid/design/data_type/float16.md
--- a/doc/fluid/design/data_type/index_cn.rst
+++ b/doc/fluid/design/data_type/index_cn.rst
--- a/doc/fluid/design/data_type/index_en.rst
+++ b/doc/fluid/design/data_type/index_en.rst
--- a/doc/fluid/design/dist_train/README.md
+++ b/doc/fluid/design/dist_train/README.md
--- a/doc/fluid/design/dist_train/async_update.md
+++ b/doc/fluid/design/dist_train/async_update.md
--- a/doc/fluid/design/dist_train/dist_train_nccl2.md
+++ b/doc/fluid/design/dist_train/dist_train_nccl2.md
--- a/doc/fluid/design/dist_train/distributed_architecture.md
+++ b/doc/fluid/design/dist_train/distributed_architecture.md
--- a/doc/fluid/design/dist_train/distributed_lookup_table_design.md
+++ b/doc/fluid/design/dist_train/distributed_lookup_table_design.md
--- a/doc/fluid/design/dist_train/distributed_traing_review.md
+++ b/doc/fluid/design/dist_train/distributed_traing_review.md
--- a/doc/fluid/design/dist_train/index_cn.rst
+++ b/doc/fluid/design/dist_train/index_cn.rst
--- a/doc/fluid/design/dist_train/index_en.rst
+++ b/doc/fluid/design/dist_train/index_en.rst
--- a/doc/fluid/design/dist_train/mpi_enabled_design.md
+++ b/doc/fluid/design/dist_train/mpi_enabled_design.md
--- a/doc/fluid/design/dist_train/multi_cpu.md
+++ b/doc/fluid/design/dist_train/multi_cpu.md
--- a/doc/fluid/design/dist_train/parameter_server.md
+++ b/doc/fluid/design/dist_train/parameter_server.md
--- a/doc/fluid/design/dist_train/src/async_distributed_training.png
+++ b/doc/fluid/design/dist_train/src/async_distributed_training.png
--- a/doc/fluid/design/dist_train/src/async_pserver.graffle
+++ b/doc/fluid/design/dist_train/src/async_pserver.graffle
--- a/doc/fluid/design/dist_train/src/async_pserver.png
+++ b/doc/fluid/design/dist_train/src/async_pserver.png
--- a/doc/fluid/design/dist_train/src/async_update.graffle
+++ b/doc/fluid/design/dist_train/src/async_update.graffle
--- a/doc/fluid/design/dist_train/src/async_update.png
+++ b/doc/fluid/design/dist_train/src/async_update.png
--- a/doc/fluid/design/dist_train/src/compiler.graffle
+++ b/doc/fluid/design/dist_train/src/compiler.graffle
--- a/doc/fluid/design/dist_train/src/compiler.png
+++ b/doc/fluid/design/dist_train/src/compiler.png
--- a/doc/fluid/design/dist_train/src/dist-graph.graffle
+++ b/doc/fluid/design/dist_train/src/dist-graph.graffle
--- a/doc/fluid/design/dist_train/src/dist-graph.png
+++ b/doc/fluid/design/dist_train/src/dist-graph.png
--- a/doc/fluid/design/dist_train/src/distributed_architecture.graffle
+++ b/doc/fluid/design/dist_train/src/distributed_architecture.graffle
--- a/doc/fluid/design/dist_train/src/distributed_architecture.png
+++ b/doc/fluid/design/dist_train/src/distributed_architecture.png
--- a/doc/fluid/design/dist_train/src/distributed_lookup_table.graffle
+++ b/doc/fluid/design/dist_train/src/distributed_lookup_table.graffle
--- a/doc/fluid/design/dist_train/src/distributed_lookup_table.jpeg
+++ b/doc/fluid/design/dist_train/src/distributed_lookup_table.jpeg
--- a/doc/fluid/design/dist_train/src/distributed_training.graffle
+++ b/doc/fluid/design/dist_train/src/distributed_training.graffle
--- a/doc/fluid/design/dist_train/src/fluid_lookup_remote_table.graffle
+++ b/doc/fluid/design/dist_train/src/fluid_lookup_remote_table.graffle
--- a/doc/fluid/design/dist_train/src/fluid_lookup_remote_table.png
+++ b/doc/fluid/design/dist_train/src/fluid_lookup_remote_table.png
--- a/doc/fluid/design/dist_train/src/local-graph.graffle
+++ b/doc/fluid/design/dist_train/src/local-graph.graffle
--- a/doc/fluid/design/dist_train/src/local-graph.png
+++ b/doc/fluid/design/dist_train/src/local-graph.png
--- a/doc/fluid/design/dist_train/src/local_architecture.graffle
+++ b/doc/fluid/design/dist_train/src/local_architecture.graffle
--- a/doc/fluid/design/dist_train/src/local_architecture.png
+++ b/doc/fluid/design/dist_train/src/local_architecture.png
--- a/doc/fluid/design/dist_train/src/lookup_table.png
+++ b/doc/fluid/design/dist_train/src/lookup_table.png
--- a/doc/fluid/design/dist_train/src/lookup_table_training.png
+++ b/doc/fluid/design/dist_train/src/lookup_table_training.png
--- a/doc/fluid/design/dist_train/src/mpi_module.png
+++ b/doc/fluid/design/dist_train/src/mpi_module.png
--- a/doc/fluid/design/dist_train/src/multi-threads.graffle
+++ b/doc/fluid/design/dist_train/src/multi-threads.graffle
--- a/doc/fluid/design/dist_train/src/multi-threads/multi-threads@3x.png
+++ b/doc/fluid/design/dist_train/src/multi-threads/multi-threads@3x.png
--- a/doc/fluid/design/dist_train/src/multi-threads/single-thread@3x.png
+++ b/doc/fluid/design/dist_train/src/multi-threads/single-thread@3x.png
--- a/doc/fluid/design/dist_train/src/ncc2_design.graffle
+++ b/doc/fluid/design/dist_train/src/ncc2_design.graffle
--- a/doc/fluid/design/dist_train/src/ncc2_design.png
+++ b/doc/fluid/design/dist_train/src/ncc2_design.png
--- a/doc/fluid/design/dist_train/src/paddle-compile.graffle
+++ b/doc/fluid/design/dist_train/src/paddle-compile.graffle
--- a/doc/fluid/design/dist_train/src/paddle-compile.png
+++ b/doc/fluid/design/dist_train/src/paddle-compile.png
--- a/doc/fluid/design/dist_train/src/remote_executor.graffle
+++ b/doc/fluid/design/dist_train/src/remote_executor.graffle
--- a/doc/fluid/design/dist_train/src/remote_executor.png
+++ b/doc/fluid/design/dist_train/src/remote_executor.png
--- a/doc/fluid/design/dist_train/src/sparse_update.graffle
+++ b/doc/fluid/design/dist_train/src/sparse_update.graffle
--- a/doc/fluid/design/dist_train/src/sparse_update.png
+++ b/doc/fluid/design/dist_train/src/sparse_update.png
--- a/doc/fluid/design/dist_train/src/sync_distributed_training.png
+++ b/doc/fluid/design/dist_train/src/sync_distributed_training.png
--- a/doc/fluid/design/dynamic_rnn/2_level_rnn.dot
+++ b/doc/fluid/design/dynamic_rnn/2_level_rnn.dot
--- a/doc/fluid/design/dynamic_rnn/2_level_rnn.png
+++ b/doc/fluid/design/dynamic_rnn/2_level_rnn.png
--- a/doc/fluid/design/dynamic_rnn/index_cn.rst
+++ b/doc/fluid/design/dynamic_rnn/index_cn.rst
--- a/doc/fluid/design/dynamic_rnn/index_en.rst
+++ b/doc/fluid/design/dynamic_rnn/index_en.rst
--- a/doc/fluid/design/dynamic_rnn/rnn.dot
+++ b/doc/fluid/design/dynamic_rnn/rnn.dot
--- a/doc/fluid/design/dynamic_rnn/rnn.jpg
+++ b/doc/fluid/design/dynamic_rnn/rnn.jpg
--- a/doc/fluid/design/dynamic_rnn/rnn.md
+++ b/doc/fluid/design/dynamic_rnn/rnn.md
--- a/doc/fluid/design/dynamic_rnn/rnn.png
+++ b/doc/fluid/design/dynamic_rnn/rnn.png
--- a/doc/fluid/design/dynamic_rnn/rnn_2level_data.dot
+++ b/doc/fluid/design/dynamic_rnn/rnn_2level_data.dot
--- a/doc/fluid/design/dynamic_rnn/rnn_2level_data.png
+++ b/doc/fluid/design/dynamic_rnn/rnn_2level_data.png
--- a/doc/fluid/design/dynamic_rnn/rnn_design.md
+++ b/doc/fluid/design/dynamic_rnn/rnn_design.md
--- a/doc/fluid/design/dynamic_rnn/rnn_design_en.md
+++ b/doc/fluid/design/dynamic_rnn/rnn_design_en.md
--- a/doc/fluid/design/execution/if_else_op.md
+++ b/doc/fluid/design/execution/if_else_op.md
--- a/doc/fluid/design/execution/index_cn.rst
+++ b/doc/fluid/design/execution/index_cn.rst
--- a/doc/fluid/design/execution/index_en.rst
+++ b/doc/fluid/design/execution/index_en.rst
--- a/doc/fluid/design/execution/switch.md
+++ b/doc/fluid/design/execution/switch.md
--- a/doc/fluid/design/index_cn.rst
+++ b/doc/fluid/design/index_cn.rst
--- a/doc/fluid/design/index_en.rst
+++ b/doc/fluid/design/index_en.rst
--- a/doc/fluid/design/interface/index_cn.rst
+++ b/doc/fluid/design/interface/index_cn.rst
--- a/doc/fluid/design/interface/index_en.rst
+++ b/doc/fluid/design/interface/index_en.rst
--- a/doc/fluid/design/ir/overview.md
+++ b/doc/fluid/design/ir/overview.md
--- a/doc/fluid/design/memory/README.md
+++ b/doc/fluid/design/memory/README.md
--- a/doc/fluid/design/memory/images/control_flow_graph.png
+++ b/doc/fluid/design/memory/images/control_flow_graph.png
--- a/doc/fluid/design/memory/images/dataflow_equations.png
+++ b/doc/fluid/design/memory/images/dataflow_equations.png
--- a/doc/fluid/design/memory/images/deep_learning.png
+++ b/doc/fluid/design/memory/images/deep_learning.png
--- a/doc/fluid/design/memory/index_cn.rst
+++ b/doc/fluid/design/memory/index_cn.rst
--- a/doc/fluid/design/memory/index_en.rst
+++ b/doc/fluid/design/memory/index_en.rst
--- a/doc/fluid/design/memory/memory_optimization.md
+++ b/doc/fluid/design/memory/memory_optimization.md
--- a/doc/fluid/design/modules/backward.md
+++ b/doc/fluid/design/modules/backward.md
--- a/doc/fluid/design/modules/batch_norm_op.md
+++ b/doc/fluid/design/modules/batch_norm_op.md
--- a/doc/fluid/design/modules/evaluator.md
+++ b/doc/fluid/design/modules/evaluator.md
--- a/doc/fluid/design/modules/images/batch_norm_fork.dot
+++ b/doc/fluid/design/modules/images/batch_norm_fork.dot
--- a/doc/fluid/design/modules/images/batch_norm_fork.png
+++ b/doc/fluid/design/modules/images/batch_norm_fork.png
--- a/doc/fluid/design/modules/images/batch_norm_op_kernel.png
+++ b/doc/fluid/design/modules/images/batch_norm_op_kernel.png
--- a/doc/fluid/design/modules/images/feed_forward.png
+++ b/doc/fluid/design/modules/images/feed_forward.png
--- a/doc/fluid/design/modules/images/feed_forward_regularized.png
+++ b/doc/fluid/design/modules/images/feed_forward_regularized.png
--- a/doc/fluid/design/modules/images/l1_regularization.png
+++ b/doc/fluid/design/modules/images/l1_regularization.png
--- a/doc/fluid/design/modules/images/l2_regularization.png
+++ b/doc/fluid/design/modules/images/l2_regularization.png
--- a/doc/fluid/design/modules/images/loss_equation.png
+++ b/doc/fluid/design/modules/images/loss_equation.png
--- a/doc/fluid/design/modules/index_cn.rst
+++ b/doc/fluid/design/modules/index_cn.rst
--- a/doc/fluid/design/modules/index_en.rst
+++ b/doc/fluid/design/modules/index_en.rst
--- a/doc/fluid/design/modules/infer_var_type.md
+++ b/doc/fluid/design/modules/infer_var_type.md
--- a/doc/fluid/design/modules/net_op_design.md
+++ b/doc/fluid/design/modules/net_op_design.md
--- a/doc/fluid/design/modules/optimizer.md
+++ b/doc/fluid/design/modules/optimizer.md
--- a/doc/fluid/design/modules/prune.md
+++ b/doc/fluid/design/modules/prune.md
--- a/doc/fluid/design/modules/python_api.md
+++ b/doc/fluid/design/modules/python_api.md
--- a/doc/fluid/design/modules/register_grad_op.md
+++ b/doc/fluid/design/modules/register_grad_op.md
--- a/doc/fluid/design/modules/regularization.md
+++ b/doc/fluid/design/modules/regularization.md
--- a/doc/fluid/design/modules/selected_rows.md
+++ b/doc/fluid/design/modules/selected_rows.md
--- a/doc/fluid/design/motivation/api.md
+++ b/doc/fluid/design/motivation/api.md
--- a/doc/fluid/design/motivation/fluid-compiler.graffle
+++ b/doc/fluid/design/motivation/fluid-compiler.graffle
--- a/doc/fluid/design/motivation/fluid-compiler.png
+++ b/doc/fluid/design/motivation/fluid-compiler.png
--- a/doc/fluid/design/motivation/fluid.md
+++ b/doc/fluid/design/motivation/fluid.md
--- a/doc/fluid/design/motivation/fluid_compiler.md
+++ b/doc/fluid/design/motivation/fluid_compiler.md
--- a/doc/fluid/design/motivation/index_cn.rst
+++ b/doc/fluid/design/motivation/index_cn.rst
--- a/doc/fluid/design/motivation/index_en.rst
+++ b/doc/fluid/design/motivation/index_en.rst
--- a/doc/fluid/design/motivation/refactorization.md
+++ b/doc/fluid/design/motivation/refactorization.md
--- a/doc/fluid/design/multi_devices/index_cn.rst
+++ b/doc/fluid/design/multi_devices/index_cn.rst
--- a/doc/fluid/design/multi_devices/index_en.rst
+++ b/doc/fluid/design/multi_devices/index_en.rst
--- a/doc/fluid/design/multi_devices/kernel_hint_design.md
+++ b/doc/fluid/design/multi_devices/kernel_hint_design.md
--- a/doc/fluid/design/multi_devices/kernel_selection.md
+++ b/doc/fluid/design/multi_devices/kernel_selection.md
--- a/doc/fluid/design/multi_devices/operator_kernel_type.md
+++ b/doc/fluid/design/multi_devices/operator_kernel_type.md
--- a/doc/fluid/design/network/deep_speech_2.md
+++ b/doc/fluid/design/network/deep_speech_2.md
--- a/doc/fluid/design/network/images/LOD-and-shape-changes-during-decoding.jpg
+++ b/doc/fluid/design/network/images/LOD-and-shape-changes-during-decoding.jpg
--- a/doc/fluid/design/network/images/beam_search.png
+++ b/doc/fluid/design/network/images/beam_search.png
--- a/doc/fluid/design/network/images/ds2_network.png
+++ b/doc/fluid/design/network/images/ds2_network.png
--- a/doc/fluid/design/network/index_cn.rst
+++ b/doc/fluid/design/network/index_cn.rst
--- a/doc/fluid/design/network/index_en.rst
+++ b/doc/fluid/design/network/index_en.rst
--- a/doc/fluid/design/network/sequence_decoder.md
+++ b/doc/fluid/design/network/sequence_decoder.md
--- a/doc/fluid/design/onnx/images/project_structure.png
+++ b/doc/fluid/design/onnx/images/project_structure.png
--- a/doc/fluid/design/onnx/onnx_convertor.md
+++ b/doc/fluid/design/onnx/onnx_convertor.md
--- a/doc/fluid/design/others/auto_gradient_check.md
+++ b/doc/fluid/design/others/auto_gradient_check.md
--- a/doc/fluid/design/others/dcgan.png
+++ b/doc/fluid/design/others/dcgan.png
--- a/doc/fluid/design/others/gan_api.md
+++ b/doc/fluid/design/others/gan_api.md
--- a/doc/fluid/design/others/graph.md
+++ b/doc/fluid/design/others/graph.md
--- a/doc/fluid/design/others/graph_survey.md
+++ b/doc/fluid/design/others/graph_survey.md
--- a/doc/fluid/design/others/images/graph_construction_example.bash
+++ b/doc/fluid/design/others/images/graph_construction_example.bash
--- a/doc/fluid/design/others/images/graph_construction_example.dot
+++ b/doc/fluid/design/others/images/graph_construction_example.dot
--- a/doc/fluid/design/others/images/graph_construction_example_all.png
+++ b/doc/fluid/design/others/images/graph_construction_example_all.png
--- a/doc/fluid/design/others/images/graph_construction_example_forward_backward.png
+++ b/doc/fluid/design/others/images/graph_construction_example_forward_backward.png
--- a/doc/fluid/design/others/images/graph_construction_example_forward_only.png
+++ b/doc/fluid/design/others/images/graph_construction_example_forward_only.png
--- a/doc/fluid/design/others/parameters_in_cpp.md
+++ b/doc/fluid/design/others/parameters_in_cpp.md
--- a/doc/fluid/design/others/simple_op_design.md
+++ b/doc/fluid/design/others/simple_op_design.md
--- a/doc/fluid/design/others/test.dot
+++ b/doc/fluid/design/others/test.dot
--- a/doc/fluid/design/others/test.dot.png
+++ b/doc/fluid/design/others/test.dot.png
--- a/doc/fluid/design/quantization/fixed_point_quantization.md
+++ b/doc/fluid/design/quantization/fixed_point_quantization.md
--- a/doc/fluid/design/quantization/quantization_backward_and_optimization.png
+++ b/doc/fluid/design/quantization/quantization_backward_and_optimization.png
--- a/doc/fluid/design/quantization/quantization_equivalent_forward.png
+++ b/doc/fluid/design/quantization/quantization_equivalent_forward.png
--- a/doc/fluid/design/quantization/quantization_forward.png
+++ b/doc/fluid/design/quantization/quantization_forward.png
--- a/doc/fluid/dev/api_doc_std_cn.md
+++ b/doc/fluid/dev/api_doc_std_cn.md
--- a/doc/fluid/dev/api_doc_std_en.md
+++ b/doc/fluid/dev/api_doc_std_en.md
--- a/doc/fluid/dev/ci_build_whl.png
+++ b/doc/fluid/dev/ci_build_whl.png
--- a/doc/fluid/dev/contribute_to_paddle_cn.md
+++ b/doc/fluid/dev/contribute_to_paddle_cn.md
--- a/doc/fluid/dev/contribute_to_paddle_en.md
+++ b/doc/fluid/dev/contribute_to_paddle_en.md
--- a/doc/fluid/dev/index_cn.rst
+++ b/doc/fluid/dev/index_cn.rst
--- a/doc/fluid/dev/index_en.rst
+++ b/doc/fluid/dev/index_en.rst
--- a/doc/fluid/dev/name_convention.md
+++ b/doc/fluid/dev/name_convention.md
--- a/doc/fluid/dev/new_op_cn.md
+++ b/doc/fluid/dev/new_op_cn.md
--- a/doc/fluid/dev/new_op_en.md
+++ b/doc/fluid/dev/new_op_en.md
--- a/doc/fluid/dev/new_op_kernel.md
+++ b/doc/fluid/dev/new_op_kernel.md
--- a/doc/fluid/dev/op_markdown_format.md
+++ b/doc/fluid/dev/op_markdown_format.md
--- a/doc/fluid/dev/releasing_process_cn.md
+++ b/doc/fluid/dev/releasing_process_cn.md
--- a/doc/fluid/dev/releasing_process_en.md
+++ b/doc/fluid/dev/releasing_process_en.md
--- a/doc/fluid/dev/src/fc.py
+++ b/doc/fluid/dev/src/fc.py
--- a/doc/fluid/dev/support_new_device.md
+++ b/doc/fluid/dev/support_new_device.md
--- a/doc/fluid/dev/use_eigen_cn.md
+++ b/doc/fluid/dev/use_eigen_cn.md
--- a/doc/fluid/dev/use_eigen_en.md
+++ b/doc/fluid/dev/use_eigen_en.md
--- a/doc/fluid/dev/write_docs_cn.rst
+++ b/doc/fluid/dev/write_docs_cn.rst
--- a/doc/fluid/dev/write_docs_en.rst
+++ b/doc/fluid/dev/write_docs_en.rst
--- a/source/faq/faq.rst
+++ b/source/faq/faq.rst
--- a/source/faq/index_cn.rst
+++ b/source/faq/index_cn.rst
--- a/doc/fluid/faq/index_en.rst
+++ b/doc/fluid/faq/index_en.rst
--- a/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md
+++ b/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md
--- a/doc/fluid/getstarted/concepts/index_cn.rst
+++ b/doc/fluid/getstarted/concepts/index_cn.rst
--- a/doc/fluid/getstarted/concepts/index_en.rst
+++ b/doc/fluid/getstarted/concepts/index_en.rst
--- a/doc/fluid/getstarted/concepts/reader/README.md
+++ b/doc/fluid/getstarted/concepts/reader/README.md
--- a/doc/fluid/getstarted/concepts/save_model/model_format.md
+++ b/doc/fluid/getstarted/concepts/save_model/model_format.md
--- a/doc/fluid/getstarted/index_cn.rst
+++ b/doc/fluid/getstarted/index_cn.rst
--- a/doc/fluid/getstarted/index_en.rst
+++ b/doc/fluid/getstarted/index_en.rst
--- a/doc/fluid/getstarted/quickstart_cn.rst
+++ b/doc/fluid/getstarted/quickstart_cn.rst
--- a/doc/fluid/getstarted/quickstart_en.rst
+++ b/doc/fluid/getstarted/quickstart_en.rst
--- a/doc/fluid/howto/cluster/fluid_cluster_train_cn.md
+++ b/doc/fluid/howto/cluster/fluid_cluster_train_cn.md
--- a/doc/fluid/howto/cluster/fluid_cluster_train_en.md
+++ b/doc/fluid/howto/cluster/fluid_cluster_train_en.md
--- a/doc/fluid/howto/cluster/fluid_recordio.md
+++ b/doc/fluid/howto/cluster/fluid_recordio.md
--- a/doc/fluid/howto/cluster/nccl2_rdma_training.md
+++ b/doc/fluid/howto/cluster/nccl2_rdma_training.md
--- a/doc/fluid/howto/index_cn.rst
+++ b/doc/fluid/howto/index_cn.rst
--- a/doc/fluid/howto/index_en.rst
+++ b/doc/fluid/howto/index_en.rst
--- a/doc/fluid/howto/inference/build_and_install_lib_cn.rst
+++ b/doc/fluid/howto/inference/build_and_install_lib_cn.rst
--- a/doc/fluid/howto/inference/index_cn.rst
+++ b/doc/fluid/howto/inference/index_cn.rst
--- a/doc/fluid/howto/inference/inference_support_in_fluid_cn.md
+++ b/doc/fluid/howto/inference/inference_support_in_fluid_cn.md
--- a/doc/fluid/howto/optimization/benchmark/index_cn.rst
+++ b/doc/fluid/howto/optimization/benchmark/index_cn.rst
--- a/doc/fluid/howto/optimization/benchmark/index_en.rst
+++ b/doc/fluid/howto/optimization/benchmark/index_en.rst
--- a/doc/fluid/howto/optimization/cpu_profiling_cn.md
+++ b/doc/fluid/howto/optimization/cpu_profiling_cn.md
--- a/doc/fluid/howto/optimization/cpu_profiling_en.md
+++ b/doc/fluid/howto/optimization/cpu_profiling_en.md
--- a/doc/fluid/howto/optimization/host_memory_profiling_cn.md
+++ b/doc/fluid/howto/optimization/host_memory_profiling_cn.md
--- a/doc/fluid/howto/optimization/index_cn.rst
+++ b/doc/fluid/howto/optimization/index_cn.rst
--- a/doc/fluid/howto/optimization/index_en.rst
+++ b/doc/fluid/howto/optimization/index_en.rst
--- a/doc/fluid/howto/optimization/pprof_1.png
+++ b/doc/fluid/howto/optimization/pprof_1.png
--- a/doc/fluid/howto/optimization/pprof_2.png
+++ b/doc/fluid/howto/optimization/pprof_2.png
--- a/doc/fluid/howto/optimization/timeline.jpeg
+++ b/doc/fluid/howto/optimization/timeline.jpeg
--- a/doc/fluid/howto/optimization/timeline_cn.md
+++ b/doc/fluid/howto/optimization/timeline_cn.md
--- a/doc/fluid/howto/optimization/timeline_en.md
+++ b/doc/fluid/howto/optimization/timeline_en.md
--- a/doc/fluid/howto/optimization/tracing.jpeg
+++ b/doc/fluid/howto/optimization/tracing.jpeg
--- a/doc/fluid/howto/performance/error_clip.md
+++ b/doc/fluid/howto/performance/error_clip.md
--- a/doc/fluid/howto/performance/images/profiler.png
+++ b/doc/fluid/howto/performance/images/profiler.png
--- a/doc/fluid/howto/performance/profiler.md
+++ b/doc/fluid/howto/performance/profiler.md
--- a/doc/fluid/howto/third_party/images/multigpu_allreduce.graffle
+++ b/doc/fluid/howto/third_party/images/multigpu_allreduce.graffle
--- a/doc/fluid/howto/third_party/images/multigpu_allreduce.png
+++ b/doc/fluid/howto/third_party/images/multigpu_allreduce.png
--- a/doc/fluid/howto/third_party/images/multigpu_before_convert.graffle
+++ b/doc/fluid/howto/third_party/images/multigpu_before_convert.graffle
--- a/doc/fluid/howto/third_party/images/multigpu_before_convert.png
+++ b/doc/fluid/howto/third_party/images/multigpu_before_convert.png
--- a/doc/fluid/howto/third_party/mkldnn_fluid.md
+++ b/doc/fluid/howto/third_party/mkldnn_fluid.md
--- a/doc/fluid/howto/third_party/paddle_nccl.md
+++ b/doc/fluid/howto/third_party/paddle_nccl.md
--- a/doc/fluid/images/1.png
+++ b/doc/fluid/images/1.png
--- a/doc/fluid/images/2.png
+++ b/doc/fluid/images/2.png
--- a/doc/fluid/images/2_level_rnn.dot
+++ b/doc/fluid/images/2_level_rnn.dot
--- a/doc/fluid/images/2_level_rnn.png
+++ b/doc/fluid/images/2_level_rnn.png
--- a/doc/fluid/images/3.png
+++ b/doc/fluid/images/3.png
--- a/doc/fluid/images/4.png
+++ b/doc/fluid/images/4.png
--- a/doc/fluid/images/LOD-and-shape-changes-during-decoding.jpg
+++ b/doc/fluid/images/LOD-and-shape-changes-during-decoding.jpg
--- a/doc/fluid/images/LoDTensor.png
+++ b/doc/fluid/images/LoDTensor.png
--- a/doc/fluid/images/asgd.gif
+++ b/doc/fluid/images/asgd.gif
--- a/doc/fluid/images/batch_norm_fork.dot
+++ b/doc/fluid/images/batch_norm_fork.dot
--- a/doc/fluid/images/batch_norm_fork.png
+++ b/doc/fluid/images/batch_norm_fork.png
--- a/doc/fluid/images/batch_norm_op_kernel.png
+++ b/doc/fluid/images/batch_norm_op_kernel.png
--- a/doc/fluid/images/beam_search.png
+++ b/doc/fluid/images/beam_search.png
--- a/doc/fluid/images/ci_build_whl.png
+++ b/doc/fluid/images/ci_build_whl.png
--- a/doc/fluid/images/compile_run_time.png
+++ b/doc/fluid/images/compile_run_time.png
--- a/doc/fluid/images/compiler.graffle
+++ b/doc/fluid/images/compiler.graffle
--- a/doc/fluid/images/compiler.png
+++ b/doc/fluid/images/compiler.png
--- a/doc/fluid/images/control_flow_graph.png
+++ b/doc/fluid/images/control_flow_graph.png
--- a/doc/fluid/images/dataflow_equations.png
+++ b/doc/fluid/images/dataflow_equations.png
--- a/doc/fluid/images/dcgan.png
+++ b/doc/fluid/images/dcgan.png
--- a/doc/fluid/images/deep_learning.png
+++ b/doc/fluid/images/deep_learning.png
--- a/doc/fluid/images/dist-graph.graffle
+++ b/doc/fluid/images/dist-graph.graffle
--- a/doc/fluid/images/dist-graph.png
+++ b/doc/fluid/images/dist-graph.png
--- a/doc/fluid/images/distributed_architecture.graffle
+++ b/doc/fluid/images/distributed_architecture.graffle
--- a/doc/fluid/images/distributed_architecture.png
+++ b/doc/fluid/images/distributed_architecture.png
--- a/doc/fluid/images/ds2_network.png
+++ b/doc/fluid/images/ds2_network.png
--- a/doc/fluid/images/executor.png
+++ b/doc/fluid/images/executor.png
--- a/doc/fluid/images/feed_forward.png
+++ b/doc/fluid/images/feed_forward.png
--- a/doc/fluid/images/feed_forward_regularized.png
+++ b/doc/fluid/images/feed_forward_regularized.png
--- a/doc/fluid/images/fluid-compiler.graffle
+++ b/doc/fluid/images/fluid-compiler.graffle
--- a/doc/fluid/images/fluid-compiler.png
+++ b/doc/fluid/images/fluid-compiler.png
--- a/doc/fluid/images/fluid_examples.png
+++ b/doc/fluid/images/fluid_examples.png
--- a/doc/fluid/images/fluid_module_1.png
+++ b/doc/fluid/images/fluid_module_1.png
--- a/doc/fluid/images/fluid_module_2.png
+++ b/doc/fluid/images/fluid_module_2.png
--- a/doc/fluid/images/graph_construction_example.bash
+++ b/doc/fluid/images/graph_construction_example.bash
--- a/doc/fluid/images/graph_construction_example.dot
+++ b/doc/fluid/images/graph_construction_example.dot
--- a/doc/fluid/images/graph_construction_example_all.png
+++ b/doc/fluid/images/graph_construction_example_all.png
--- a/doc/fluid/images/graph_construction_example_forward_backward.png
+++ b/doc/fluid/images/graph_construction_example_forward_backward.png
--- a/doc/fluid/images/graph_construction_example_forward_only.png
+++ b/doc/fluid/images/graph_construction_example_forward_only.png
--- a/doc/fluid/images/l1_regularization.png
+++ b/doc/fluid/images/l1_regularization.png
--- a/doc/fluid/images/l2_regularization.png
+++ b/doc/fluid/images/l2_regularization.png
--- a/doc/fluid/images/layer.png
+++ b/doc/fluid/images/layer.png
--- a/doc/fluid/images/local-graph.graffle
+++ b/doc/fluid/images/local-graph.graffle
--- a/doc/fluid/images/local-graph.png
+++ b/doc/fluid/images/local-graph.png
--- a/doc/fluid/images/local_architecture.graffle
+++ b/doc/fluid/images/local_architecture.graffle
--- a/doc/fluid/images/local_architecture.png
+++ b/doc/fluid/images/local_architecture.png
--- a/doc/fluid/images/lookup_table.png
+++ b/doc/fluid/images/lookup_table.png
--- a/doc/fluid/images/lookup_table_training.png
+++ b/doc/fluid/images/lookup_table_training.png
--- a/doc/fluid/images/loss_equation.png
+++ b/doc/fluid/images/loss_equation.png
--- a/doc/fluid/images/multi-threads.graffle
+++ b/doc/fluid/images/multi-threads.graffle
--- a/doc/fluid/images/multi-threads@3x.png
+++ b/doc/fluid/images/multi-threads@3x.png
--- a/doc/fluid/images/multigpu_allreduce.graffle
+++ b/doc/fluid/images/multigpu_allreduce.graffle
--- a/doc/fluid/images/multigpu_allreduce.png
+++ b/doc/fluid/images/multigpu_allreduce.png
--- a/doc/fluid/images/multigpu_before_convert.graffle
+++ b/doc/fluid/images/multigpu_before_convert.graffle
--- a/doc/fluid/images/multigpu_before_convert.png
+++ b/doc/fluid/images/multigpu_before_convert.png
--- a/doc/fluid/images/multiple_reader.png
+++ b/doc/fluid/images/multiple_reader.png
--- a/doc/fluid/images/op.dot
+++ b/doc/fluid/images/op.dot
--- a/doc/fluid/images/op_op_with_kern_class_diagram.dot
+++ b/doc/fluid/images/op_op_with_kern_class_diagram.dot
--- a/doc/fluid/images/op_with_kernel.dot
+++ b/doc/fluid/images/op_with_kernel.dot
--- a/doc/fluid/images/operator1.png
+++ b/doc/fluid/images/operator1.png
--- a/doc/fluid/images/operator2.png
+++ b/doc/fluid/images/operator2.png
--- a/doc/fluid/images/paddle-compile.graffle
+++ b/doc/fluid/images/paddle-compile.graffle
--- a/doc/fluid/images/paddle-compile.png
+++ b/doc/fluid/images/paddle-compile.png
--- a/doc/fluid/images/place.png
+++ b/doc/fluid/images/place.png
--- a/doc/fluid/images/pprof_1.png
+++ b/doc/fluid/images/pprof_1.png
--- a/doc/fluid/images/pprof_2.png
+++ b/doc/fluid/images/pprof_2.png
--- a/doc/fluid/images/print_fluid_program.png
+++ b/doc/fluid/images/print_fluid_program.png
--- a/doc/fluid/images/profiler.png
+++ b/doc/fluid/images/profiler.png
--- a/doc/fluid/images/program_desc1.png
+++ b/doc/fluid/images/program_desc1.png
--- a/doc/fluid/images/program_desc2.png
+++ b/doc/fluid/images/program_desc2.png
--- a/doc/fluid/images/raw_input.png
+++ b/doc/fluid/images/raw_input.png
--- a/doc/fluid/images/readers.png
+++ b/doc/fluid/images/readers.png
--- a/doc/fluid/images/remote_executor.graffle
+++ b/doc/fluid/images/remote_executor.graffle
--- a/doc/fluid/images/remote_executor.png
+++ b/doc/fluid/images/remote_executor.png
--- a/doc/fluid/images/rnn.dot
+++ b/doc/fluid/images/rnn.dot
--- a/doc/fluid/images/rnn.jpg
+++ b/doc/fluid/images/rnn.jpg
--- a/doc/fluid/images/rnn.png
+++ b/doc/fluid/images/rnn.png
--- a/doc/fluid/images/rnn_2level_data.dot
+++ b/doc/fluid/images/rnn_2level_data.dot
--- a/doc/fluid/images/rnn_2level_data.png
+++ b/doc/fluid/images/rnn_2level_data.png
--- a/doc/fluid/images/scope_variable_tensor.png
+++ b/doc/fluid/images/scope_variable_tensor.png
--- a/doc/fluid/images/single-thread@3x.png
+++ b/doc/fluid/images/single-thread@3x.png
--- a/doc/fluid/images/sorted_input.png
+++ b/doc/fluid/images/sorted_input.png
--- a/doc/fluid/images/sparse_update.graffle
+++ b/doc/fluid/images/sparse_update.graffle
--- a/doc/fluid/images/sparse_update.png
+++ b/doc/fluid/images/sparse_update.png
--- a/doc/fluid/images/test.dot
+++ b/doc/fluid/images/test.dot
--- a/doc/fluid/images/test.dot.png
+++ b/doc/fluid/images/test.dot.png
--- a/doc/fluid/images/theta_star.gif
+++ b/doc/fluid/images/theta_star.gif
--- a/doc/fluid/images/timeline.jpeg
+++ b/doc/fluid/images/timeline.jpeg
--- a/doc/fluid/images/tracing.jpeg
+++ b/doc/fluid/images/tracing.jpeg
--- a/doc/fluid/images/transpiler.png
+++ b/doc/fluid/images/transpiler.png
--- a/doc/fluid/images/user_interface.png
+++ b/doc/fluid/images/user_interface.png
--- a/doc/fluid/index_cn.rst
+++ b/doc/fluid/index_cn.rst
--- a/doc/fluid/index_en.rst
+++ b/doc/fluid/index_en.rst
--- a/doc/fluid/read_source.md
+++ b/doc/fluid/read_source.md
--- a/source/user_guides/howto/basic_concept/fluid_basic_concept.rst
+++ b/source/user_guides/howto/basic_concept/fluid_basic_concept.rst
--- a/source/user_guides/howto/basic_concept/fluid_local_train.jpeg
+++ b/source/user_guides/howto/basic_concept/fluid_local_train.jpeg
--- a/source/user_guides/howto/basic_concept/fluid_mnist.png
+++ b/source/user_guides/howto/basic_concept/fluid_mnist.png
--- a/doc/fluid/user_guides/howto/configure_simple_model/index.rst
+++ b/doc/fluid/user_guides/howto/configure_simple_model/index.rst
--- a/source/user_guides/howto/debug/index.rst
+++ b/source/user_guides/howto/debug/index.rst
--- a/source/user_guides/howto/debug/visualdl.md
+++ b/source/user_guides/howto/debug/visualdl.md
--- a/source/user_guides/howto/evaluation/index.rst
+++ b/source/user_guides/howto/evaluation/index.rst
--- a/source/user_guides/howto/evaluation/metrics.rst
+++ b/source/user_guides/howto/evaluation/metrics.rst
--- a/source/appendix/foo.rst
+++ b/source/appendix/foo.rst
--- a/doc/fluid/user_guides/howto/prepare_data/feeding_data.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/feeding_data.rst
--- a/doc/fluid/user_guides/howto/prepare_data/index.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/index.rst
--- a/source/user_guides/howto/prepare_data/reader.md
+++ b/source/user_guides/howto/prepare_data/reader.md
--- a/doc/fluid/user_guides/howto/prepare_data/use_recordio_reader.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/use_recordio_reader.rst
--- a/source/user_guides/howto/training/checkpoint_doc_cn.md
+++ b/source/user_guides/howto/training/checkpoint_doc_cn.md
--- a/source/user_guides/howto/training/checkpoint_doc_en.md
+++ b/source/user_guides/howto/training/checkpoint_doc_en.md
--- a/source/user_guides/howto/training/cluster_howto.rst
+++ b/source/user_guides/howto/training/cluster_howto.rst
--- a/source/user_guides/howto/training/cluster_quick_start.rst
+++ b/source/user_guides/howto/training/cluster_quick_start.rst
--- a/source/user_guides/howto/training/index.rst
+++ b/source/user_guides/howto/training/index.rst
--- a/source/user_guides/howto/training/multi_node.rst
+++ b/source/user_guides/howto/training/multi_node.rst
--- a/source/user_guides/howto/training/save_load_variables.rst
+++ b/source/user_guides/howto/training/save_load_variables.rst
--- a/source/user_guides/howto/training/single_node.rst
+++ b/source/user_guides/howto/training/single_node.rst
--- a/source/user_guides/howto/training/src/dist_train_nccl2.graffle
+++ b/source/user_guides/howto/training/src/dist_train_nccl2.graffle
--- a/source/user_guides/howto/training/src/dist_train_nccl2.png
+++ b/source/user_guides/howto/training/src/dist_train_nccl2.png
--- a/source/user_guides/howto/training/src/dist_train_pserver.graffle
+++ b/source/user_guides/howto/training/src/dist_train_pserver.graffle
--- a/source/user_guides/howto/training/src/dist_train_pserver.png
+++ b/source/user_guides/howto/training/src/dist_train_pserver.png
--- a/source/user_guides/howto/training/src/parallelism.png
+++ b/source/user_guides/howto/training/src/parallelism.png
--- a/source/user_guides/howto/training/test_while_training.rst
+++ b/source/user_guides/howto/training/test_while_training.rst
--- a/source/user_guides/index.rst
+++ b/source/user_guides/index.rst
--- a/doc/fluid/user_guides/models/index.rst
+++ b/doc/fluid/user_guides/models/index.rst
--- a/doc/mobile/CMakeLists.txt
+++ b/doc/mobile/CMakeLists.txt
--- a/doc/mobile/cross_compiling_for_android_cn.md
+++ b/doc/mobile/cross_compiling_for_android_cn.md
--- a/doc/mobile/cross_compiling_for_android_en.md
+++ b/doc/mobile/cross_compiling_for_android_en.md
--- a/doc/mobile/cross_compiling_for_ios_cn.md
+++ b/doc/mobile/cross_compiling_for_ios_cn.md
--- a/doc/mobile/cross_compiling_for_ios_en.md
+++ b/doc/mobile/cross_compiling_for_ios_en.md
--- a/doc/mobile/cross_compiling_for_raspberry_cn.md
+++ b/doc/mobile/cross_compiling_for_raspberry_cn.md
--- a/doc/mobile/cross_compiling_for_raspberry_en.md
+++ b/doc/mobile/cross_compiling_for_raspberry_en.md
--- a/doc/mobile/index_cn.rst
+++ b/doc/mobile/index_cn.rst
--- a/doc/mobile/index_en.rst
+++ b/doc/mobile/index_en.rst
--- a/doc/survey/cluster_bootstrapping_tools.md
+++ b/doc/survey/cluster_bootstrapping_tools.md
--- a/doc/survey/dynamic_graph.md
+++ b/doc/survey/dynamic_graph.md
--- a/doc/survey/op_fusion_design.md
+++ b/doc/survey/op_fusion_design.md
--- a/doc/templates/conf.py.cn.in
+++ b/doc/templates/conf.py.cn.in
--- a/doc/templates/conf.py.en.in
+++ b/doc/templates/conf.py.en.in
--- a/doc/templates/layout.html
+++ b/doc/templates/layout.html
--- a/doc/v2/CMakeLists.txt
+++ b/doc/v2/CMakeLists.txt
--- a/doc/v2/api/CMakeLists.txt
+++ b/doc/v2/api/CMakeLists.txt
--- a/doc/v2/api/config/activation.rst
+++ b/doc/v2/api/config/activation.rst
--- a/doc/v2/api/config/attr.rst
+++ b/doc/v2/api/config/attr.rst
--- a/doc/v2/api/config/evaluators.rst
+++ b/doc/v2/api/config/evaluators.rst
--- a/doc/v2/api/config/layer.rst
+++ b/doc/v2/api/config/layer.rst
--- a/doc/v2/api/config/networks.rst
+++ b/doc/v2/api/config/networks.rst
--- a/doc/v2/api/config/optimizer.rst
+++ b/doc/v2/api/config/optimizer.rst
--- a/doc/v2/api/config/pooling.rst
+++ b/doc/v2/api/config/pooling.rst
--- a/doc/v2/api/data.rst
+++ b/doc/v2/api/data.rst
--- a/doc/v2/api/data/data_reader.rst
+++ b/doc/v2/api/data/data_reader.rst
--- a/doc/v2/api/data/dataset.rst
+++ b/doc/v2/api/data/dataset.rst
--- a/doc/v2/api/data/image.rst
+++ b/doc/v2/api/data/image.rst
--- a/doc/v2/api/index_en.rst
+++ b/doc/v2/api/index_en.rst
--- a/doc/v2/api/model_configs.rst
+++ b/doc/v2/api/model_configs.rst
--- a/doc/v2/api/overview.rst
+++ b/doc/v2/api/overview.rst
--- a/doc/v2/api/run_logic.rst
+++ b/doc/v2/api/run_logic.rst
--- a/doc/v2/build_and_install/build_from_source_cn.rst
+++ b/doc/v2/build_and_install/build_from_source_cn.rst
--- a/doc/v2/build_and_install/build_from_source_en.rst
+++ b/doc/v2/build_and_install/build_from_source_en.rst
--- a/doc/v2/build_and_install/docker_install_cn.rst
+++ b/doc/v2/build_and_install/docker_install_cn.rst
--- a/doc/v2/build_and_install/docker_install_en.rst
+++ b/doc/v2/build_and_install/docker_install_en.rst
--- a/doc/v2/build_and_install/index_cn.rst
+++ b/doc/v2/build_and_install/index_cn.rst
--- a/doc/v2/build_and_install/index_en.rst
+++ b/doc/v2/build_and_install/index_en.rst
--- a/doc/v2/build_and_install/paddleci.png
+++ b/doc/v2/build_and_install/paddleci.png
--- a/doc/v2/build_and_install/pip_install_cn.rst
+++ b/doc/v2/build_and_install/pip_install_cn.rst
--- a/doc/v2/build_and_install/pip_install_en.rst
+++ b/doc/v2/build_and_install/pip_install_en.rst
--- a/doc/v2/design/cluster_train/README.md
+++ b/doc/v2/design/cluster_train/README.md
--- a/doc/v2/design/cluster_train/checkpointing.md
+++ b/doc/v2/design/cluster_train/checkpointing.md
--- a/doc/v2/design/cluster_train/data_dispatch.md
+++ b/doc/v2/design/cluster_train/data_dispatch.md
--- a/doc/v2/design/cluster_train/large_model_dist_train.md
+++ b/doc/v2/design/cluster_train/large_model_dist_train.md
--- a/doc/v2/design/cluster_train/master_server.md
+++ b/doc/v2/design/cluster_train/master_server.md
--- a/doc/v2/design/cluster_train/pserver_client.md
+++ b/doc/v2/design/cluster_train/pserver_client.md
--- a/doc/v2/design/cluster_train/remote_parameter_updater.md
+++ b/doc/v2/design/cluster_train/remote_parameter_updater.md
--- a/doc/v2/design/cluster_train/save_model.md
+++ b/doc/v2/design/cluster_train/save_model.md
--- a/doc/v2/design/cluster_train/src/checkpointing.png
+++ b/doc/v2/design/cluster_train/src/checkpointing.png
--- a/doc/v2/design/cluster_train/src/data_dispatch.png
+++ b/doc/v2/design/cluster_train/src/data_dispatch.png
--- a/doc/v2/design/cluster_train/src/dataset.graffle
+++ b/doc/v2/design/cluster_train/src/dataset.graffle
--- a/doc/v2/design/cluster_train/src/dataset.png
+++ b/doc/v2/design/cluster_train/src/dataset.png
--- a/doc/v2/design/cluster_train/src/file_storage.graffle
+++ b/doc/v2/design/cluster_train/src/file_storage.graffle
--- a/doc/v2/design/cluster_train/src/file_storage.png
+++ b/doc/v2/design/cluster_train/src/file_storage.png
--- a/doc/v2/design/cluster_train/src/init_lock.graffle
+++ b/doc/v2/design/cluster_train/src/init_lock.graffle
--- a/doc/v2/design/cluster_train/src/init_lock.png
+++ b/doc/v2/design/cluster_train/src/init_lock.png
--- a/doc/v2/design/cluster_train/src/paddle-cloud-in-data-center.png
+++ b/doc/v2/design/cluster_train/src/paddle-cloud-in-data-center.png
--- a/doc/v2/design/cluster_train/src/paddle-etcd.graffle
+++ b/doc/v2/design/cluster_train/src/paddle-etcd.graffle
--- a/doc/v2/design/cluster_train/src/paddle-etcd.png
+++ b/doc/v2/design/cluster_train/src/paddle-etcd.png
--- a/doc/v2/design/cluster_train/src/paddle-model-sharding.graffle
+++ b/doc/v2/design/cluster_train/src/paddle-model-sharding.graffle
--- a/doc/v2/design/cluster_train/src/paddle-model-sharding.png
+++ b/doc/v2/design/cluster_train/src/paddle-model-sharding.png
--- a/doc/v2/design/cluster_train/src/paddle-ps-0.png
+++ b/doc/v2/design/cluster_train/src/paddle-ps-0.png
--- a/doc/v2/design/cluster_train/src/paddle-ps-1.png
+++ b/doc/v2/design/cluster_train/src/paddle-ps-1.png
--- a/doc/v2/design/cluster_train/src/paddle-ps.graffle
+++ b/doc/v2/design/cluster_train/src/paddle-ps.graffle
--- a/doc/v2/design/cluster_train/src/paddle-task-queues.graffle
+++ b/doc/v2/design/cluster_train/src/paddle-task-queues.graffle
--- a/doc/v2/design/cluster_train/src/paddle-task-queues.png
+++ b/doc/v2/design/cluster_train/src/paddle-task-queues.png
--- a/doc/v2/design/cluster_train/src/paddle-task-states.graffle
+++ b/doc/v2/design/cluster_train/src/paddle-task-states.graffle
--- a/doc/v2/design/cluster_train/src/paddle-task-states.png
+++ b/doc/v2/design/cluster_train/src/paddle-task-states.png
--- a/doc/v2/design/cluster_train/src/pserver_init.graffle
+++ b/doc/v2/design/cluster_train/src/pserver_init.graffle
--- a/doc/v2/design/cluster_train/src/pserver_init.png
+++ b/doc/v2/design/cluster_train/src/pserver_init.png
--- a/doc/v2/design/cluster_train/src/submit-job.graffle
+++ b/doc/v2/design/cluster_train/src/submit-job.graffle
--- a/doc/v2/design/cluster_train/src/submit-job.png
+++ b/doc/v2/design/cluster_train/src/submit-job.png
--- a/doc/v2/design/cluster_train/src/trainer.graffle
+++ b/doc/v2/design/cluster_train/src/trainer.graffle
--- a/doc/v2/design/cluster_train/src/trainer.png
+++ b/doc/v2/design/cluster_train/src/trainer.png
--- a/doc/v2/design/cluster_train/submit-job.md
+++ b/doc/v2/design/cluster_train/submit-job.md
--- a/doc/v2/design/interface/00.why_plain_c.md
+++ b/doc/v2/design/interface/00.why_plain_c.md
--- a/doc/v2/design/interface/01.inference_implementation.md
+++ b/doc/v2/design/interface/01.inference_implementation.md
--- a/doc/v2/design/interface/index_cn.rst
+++ b/doc/v2/design/interface/index_cn.rst
--- a/doc/v2/design/interface/index_en.rst
+++ b/doc/v2/design/interface/index_en.rst
--- a/doc/v2/design/mkl/image/engine.png
+++ b/doc/v2/design/mkl/image/engine.png
--- a/doc/v2/design/mkl/image/gradients.png
+++ b/doc/v2/design/mkl/image/gradients.png
--- a/doc/v2/design/mkl/image/layers.png
+++ b/doc/v2/design/mkl/image/layers.png
--- a/doc/v2/design/mkl/image/matrix.png
+++ b/doc/v2/design/mkl/image/matrix.png
--- a/doc/v2/design/mkl/image/overview.png
+++ b/doc/v2/design/mkl/image/overview.png
--- a/doc/v2/design/mkl/mkl_packed.md
+++ b/doc/v2/design/mkl/mkl_packed.md
--- a/doc/v2/design/mkl/mkldnn.md
+++ b/doc/v2/design/mkl/mkldnn.md
--- a/doc/v2/dev/contribute_to_paddle_cn.md
+++ b/doc/v2/dev/contribute_to_paddle_cn.md
--- a/doc/v2/dev/contribute_to_paddle_en.md
+++ b/doc/v2/dev/contribute_to_paddle_en.md
--- a/doc/v2/dev/index_cn.rst
+++ b/doc/v2/dev/index_cn.rst
--- a/doc/v2/dev/index_en.rst
+++ b/doc/v2/dev/index_en.rst
--- a/doc/v2/dev/new_layer_cn.rst
+++ b/doc/v2/dev/new_layer_cn.rst
--- a/doc/v2/dev/new_layer_en.rst
+++ b/doc/v2/dev/new_layer_en.rst
--- a/doc/v2/dev/src/FullyConnected.jpg
+++ b/doc/v2/dev/src/FullyConnected.jpg
--- a/doc/v2/dev/src/doc_en.png
+++ b/doc/v2/dev/src/doc_en.png
--- a/doc/v2/dev/write_docs_cn.rst
+++ b/doc/v2/dev/write_docs_cn.rst
--- a/doc/v2/dev/write_docs_en.rst
+++ b/doc/v2/dev/write_docs_en.rst
--- a/doc/v2/faq/build_and_install/index_cn.rst
+++ b/doc/v2/faq/build_and_install/index_cn.rst
--- a/doc/v2/faq/build_and_install/index_en.rst
+++ b/doc/v2/faq/build_and_install/index_en.rst
--- a/doc/v2/faq/cluster/index_cn.rst
+++ b/doc/v2/faq/cluster/index_cn.rst
--- a/doc/v2/faq/cluster/index_en.rst
+++ b/doc/v2/faq/cluster/index_en.rst
--- a/doc/v2/faq/index_cn.rst
+++ b/doc/v2/faq/index_cn.rst
--- a/doc/v2/faq/index_en.rst
+++ b/doc/v2/faq/index_en.rst
--- a/doc/v2/faq/local/index_cn.rst
+++ b/doc/v2/faq/local/index_cn.rst
--- a/doc/v2/faq/local/index_en.rst
+++ b/doc/v2/faq/local/index_en.rst
--- a/doc/v2/faq/local/src/reduce_min_pool_size.py
+++ b/doc/v2/faq/local/src/reduce_min_pool_size.py
--- a/doc/v2/faq/local/src/word2vec_config.py
+++ b/doc/v2/faq/local/src/word2vec_config.py
--- a/doc/v2/faq/local/src/word2vec_dataprovider.py
+++ b/doc/v2/faq/local/src/word2vec_dataprovider.py
--- a/doc/v2/faq/model/index_cn.rst
+++ b/doc/v2/faq/model/index_cn.rst
--- a/doc/v2/faq/model/index_en.rst
+++ b/doc/v2/faq/model/index_en.rst
--- a/doc/v2/faq/parameter/index_cn.rst
+++ b/doc/v2/faq/parameter/index_cn.rst
--- a/doc/v2/faq/parameter/index_en.rst
+++ b/doc/v2/faq/parameter/index_en.rst
--- a/doc/v2/getstarted/concepts/src/infer.py
+++ b/doc/v2/getstarted/concepts/src/infer.py
--- a/doc/v2/getstarted/concepts/src/train.py
+++ b/doc/v2/getstarted/concepts/src/train.py
--- a/doc/v2/getstarted/concepts/use_concepts_cn.rst
+++ b/doc/v2/getstarted/concepts/use_concepts_cn.rst
--- a/doc/v2/getstarted/concepts/use_concepts_en.rst
+++ b/doc/v2/getstarted/concepts/use_concepts_en.rst
--- a/doc/v2/getstarted/index_cn.rst
+++ b/doc/v2/getstarted/index_cn.rst
--- a/doc/v2/getstarted/index_en.rst
+++ b/doc/v2/getstarted/index_en.rst
--- a/doc/v2/getstarted/quickstart_cn.rst
+++ b/doc/v2/getstarted/quickstart_cn.rst
--- a/doc/v2/getstarted/quickstart_en.rst
+++ b/doc/v2/getstarted/quickstart_en.rst
--- a/doc/v2/howto/capi/compile_paddle_lib_cn.md
+++ b/doc/v2/howto/capi/compile_paddle_lib_cn.md
--- a/doc/v2/howto/capi/compile_paddle_lib_en.md
+++ b/doc/v2/howto/capi/compile_paddle_lib_en.md
--- a/doc/v2/howto/capi/images/csr.png
+++ b/doc/v2/howto/capi/images/csr.png
--- a/doc/v2/howto/capi/images/sequence_data.png
+++ b/doc/v2/howto/capi/images/sequence_data.png
--- a/doc/v2/howto/capi/images/workflow_of_CAPI.png
+++ b/doc/v2/howto/capi/images/workflow_of_CAPI.png
--- a/doc/v2/howto/capi/index_cn.rst
+++ b/doc/v2/howto/capi/index_cn.rst
--- a/doc/v2/howto/capi/index_en.rst
+++ b/doc/v2/howto/capi/index_en.rst
--- a/doc/v2/howto/capi/organization_of_the_inputs_cn.md
+++ b/doc/v2/howto/capi/organization_of_the_inputs_cn.md
--- a/doc/v2/howto/capi/organization_of_the_inputs_en.md
+++ b/doc/v2/howto/capi/organization_of_the_inputs_en.md
--- a/doc/v2/howto/capi/workflow_of_capi_cn.md
+++ b/doc/v2/howto/capi/workflow_of_capi_cn.md
--- a/doc/v2/howto/capi/workflow_of_capi_en.md
+++ b/doc/v2/howto/capi/workflow_of_capi_en.md
--- a/doc/v2/howto/cluster/cmd_argument_cn.md
+++ b/doc/v2/howto/cluster/cmd_argument_cn.md
--- a/doc/v2/howto/cluster/cmd_argument_en.md
+++ b/doc/v2/howto/cluster/cmd_argument_en.md
--- a/doc/v2/howto/cluster/index_cn.rst
+++ b/doc/v2/howto/cluster/index_cn.rst
--- a/doc/v2/howto/cluster/index_en.rst
+++ b/doc/v2/howto/cluster/index_en.rst
--- a/doc/v2/howto/cluster/multi_cluster/fabric_cn.md
+++ b/doc/v2/howto/cluster/multi_cluster/fabric_cn.md
--- a/doc/v2/howto/cluster/multi_cluster/fabric_en.md
+++ b/doc/v2/howto/cluster/multi_cluster/fabric_en.md
--- a/doc/v2/howto/cluster/multi_cluster/index_cn.rst
+++ b/doc/v2/howto/cluster/multi_cluster/index_cn.rst
--- a/doc/v2/howto/cluster/multi_cluster/index_en.rst
+++ b/doc/v2/howto/cluster/multi_cluster/index_en.rst
--- a/doc/v2/howto/cluster/multi_cluster/k8s_aws_cn.md
+++ b/doc/v2/howto/cluster/multi_cluster/k8s_aws_cn.md
--- a/doc/v2/howto/cluster/multi_cluster/k8s_aws_en.md
+++ b/doc/v2/howto/cluster/multi_cluster/k8s_aws_en.md
--- a/doc/v2/howto/cluster/multi_cluster/k8s_cn.md
+++ b/doc/v2/howto/cluster/multi_cluster/k8s_cn.md
--- a/doc/v2/howto/cluster/multi_cluster/k8s_distributed_cn.md
+++ b/doc/v2/howto/cluster/multi_cluster/k8s_distributed_cn.md
--- a/doc/v2/howto/cluster/multi_cluster/k8s_distributed_en.md
+++ b/doc/v2/howto/cluster/multi_cluster/k8s_distributed_en.md
--- a/doc/v2/howto/cluster/multi_cluster/k8s_en.md
+++ b/doc/v2/howto/cluster/multi_cluster/k8s_en.md
--- a/doc/v2/howto/cluster/multi_cluster/openmpi_cn.md
+++ b/doc/v2/howto/cluster/multi_cluster/openmpi_cn.md
--- a/doc/v2/howto/cluster/multi_cluster/openmpi_en.md
+++ b/doc/v2/howto/cluster/multi_cluster/openmpi_en.md
--- a/doc/v2/howto/cluster/multi_cluster/src/add_security_group.png
+++ b/doc/v2/howto/cluster/multi_cluster/src/add_security_group.png
--- a/doc/v2/howto/cluster/multi_cluster/src/create_efs.png
+++ b/doc/v2/howto/cluster/multi_cluster/src/create_efs.png
--- a/doc/v2/howto/cluster/multi_cluster/src/k8s-paddle-arch.png
+++ b/doc/v2/howto/cluster/multi_cluster/src/k8s-paddle-arch.png
--- a/doc/v2/howto/cluster/multi_cluster/src/k8s_data/Dockerfile
+++ b/doc/v2/howto/cluster/multi_cluster/src/k8s_data/Dockerfile
--- a/doc/v2/howto/cluster/multi_cluster/src/k8s_data/README.md
+++ b/doc/v2/howto/cluster/multi_cluster/src/k8s_data/README.md
--- a/doc/v2/howto/cluster/multi_cluster/src/k8s_data/get_data.sh
+++ b/doc/v2/howto/cluster/multi_cluster/src/k8s_data/get_data.sh
--- a/doc/v2/howto/cluster/multi_cluster/src/k8s_train/Dockerfile
+++ b/doc/v2/howto/cluster/multi_cluster/src/k8s_train/Dockerfile
--- a/doc/v2/howto/cluster/multi_cluster/src/k8s_train/README.md
+++ b/doc/v2/howto/cluster/multi_cluster/src/k8s_train/README.md
--- a/doc/v2/howto/cluster/multi_cluster/src/k8s_train/start.sh
+++ b/doc/v2/howto/cluster/multi_cluster/src/k8s_train/start.sh
--- a/doc/v2/howto/cluster/multi_cluster/src/k8s_train/start_paddle.py
+++ b/doc/v2/howto/cluster/multi_cluster/src/k8s_train/start_paddle.py
--- a/doc/v2/howto/cluster/multi_cluster/src/pserver_and_trainer.png
+++ b/doc/v2/howto/cluster/multi_cluster/src/pserver_and_trainer.png
--- a/doc/v2/howto/cluster/multi_cluster/src/route53_create_recordset.png
+++ b/doc/v2/howto/cluster/multi_cluster/src/route53_create_recordset.png
--- a/doc/v2/howto/cluster/multi_cluster/src/route53_create_zone.png
+++ b/doc/v2/howto/cluster/multi_cluster/src/route53_create_zone.png
--- a/doc/v2/howto/cluster/multi_cluster/src/worker_security_group.png
+++ b/doc/v2/howto/cluster/multi_cluster/src/worker_security_group.png
--- a/doc/v2/howto/cluster/preparations_cn.md
+++ b/doc/v2/howto/cluster/preparations_cn.md
--- a/doc/v2/howto/cluster/preparations_en.md
+++ b/doc/v2/howto/cluster/preparations_en.md
--- a/doc/v2/howto/cluster/src/Dockerfile
+++ b/doc/v2/howto/cluster/src/Dockerfile
--- a/doc/v2/howto/cluster/src/efs_mount.png
+++ b/doc/v2/howto/cluster/src/efs_mount.png
--- a/doc/v2/howto/cluster/src/managed_policy.png
+++ b/doc/v2/howto/cluster/src/managed_policy.png
--- a/doc/v2/howto/cluster/src/ps_cn.png
+++ b/doc/v2/howto/cluster/src/ps_cn.png
--- a/doc/v2/howto/cluster/src/ps_en.png
+++ b/doc/v2/howto/cluster/src/ps_en.png
--- a/doc/v2/howto/cluster/src/trainer.png
+++ b/doc/v2/howto/cluster/src/trainer.png
--- a/doc/v2/howto/cluster/src/trainer_cn.png
+++ b/doc/v2/howto/cluster/src/trainer_cn.png
--- a/doc/v2/howto/cluster/src/word2vec/api_train_v2.py
+++ b/doc/v2/howto/cluster/src/word2vec/api_train_v2.py
--- a/doc/v2/howto/cluster/src/word2vec/api_train_v2_cluster.py
+++ b/doc/v2/howto/cluster/src/word2vec/api_train_v2_cluster.py
--- a/doc/v2/howto/cluster/src/word2vec/prepare.py
+++ b/doc/v2/howto/cluster/src/word2vec/prepare.py
--- a/doc/v2/howto/cmd_parameter/arguments_cn.md
+++ b/doc/v2/howto/cmd_parameter/arguments_cn.md
--- a/doc/v2/howto/cmd_parameter/arguments_en.md
+++ b/doc/v2/howto/cmd_parameter/arguments_en.md
--- a/doc/v2/howto/cmd_parameter/detail_introduction_cn.md
+++ b/doc/v2/howto/cmd_parameter/detail_introduction_cn.md
--- a/doc/v2/howto/cmd_parameter/detail_introduction_en.md
+++ b/doc/v2/howto/cmd_parameter/detail_introduction_en.md
--- a/doc/v2/howto/cmd_parameter/index_cn.rst
+++ b/doc/v2/howto/cmd_parameter/index_cn.rst
--- a/doc/v2/howto/cmd_parameter/index_en.rst
+++ b/doc/v2/howto/cmd_parameter/index_en.rst
--- a/doc/v2/howto/cmd_parameter/use_case_cn.md
+++ b/doc/v2/howto/cmd_parameter/use_case_cn.md
--- a/doc/v2/howto/cmd_parameter/use_case_en.md
+++ b/doc/v2/howto/cmd_parameter/use_case_en.md
--- a/doc/v2/howto/index_cn.rst
+++ b/doc/v2/howto/index_cn.rst
--- a/doc/v2/howto/index_en.rst
+++ b/doc/v2/howto/index_en.rst
--- a/doc/v2/howto/optimization/gpu_profiling_cn.rst
+++ b/doc/v2/howto/optimization/gpu_profiling_cn.rst
--- a/doc/v2/howto/optimization/gpu_profiling_en.rst
+++ b/doc/v2/howto/optimization/gpu_profiling_en.rst
--- a/doc/v2/howto/optimization/nvvp1.png
+++ b/doc/v2/howto/optimization/nvvp1.png
--- a/doc/v2/howto/optimization/nvvp2.png
+++ b/doc/v2/howto/optimization/nvvp2.png
--- a/doc/v2/howto/optimization/nvvp3.png
+++ b/doc/v2/howto/optimization/nvvp3.png
--- a/doc/v2/howto/optimization/nvvp4.png
+++ b/doc/v2/howto/optimization/nvvp4.png
--- a/doc/v2/howto/rnn/hierarchical_layer_cn.rst
+++ b/doc/v2/howto/rnn/hierarchical_layer_cn.rst
--- a/doc/v2/howto/rnn/hierarchical_layer_en.rst
+++ b/doc/v2/howto/rnn/hierarchical_layer_en.rst
--- a/doc/v2/howto/rnn/hrnn_rnn_api_compare_cn.rst
+++ b/doc/v2/howto/rnn/hrnn_rnn_api_compare_cn.rst
--- a/doc/v2/howto/rnn/hrnn_rnn_api_compare_en.rst
+++ b/doc/v2/howto/rnn/hrnn_rnn_api_compare_en.rst
--- a/doc/v2/howto/rnn/index_cn.rst
+++ b/doc/v2/howto/rnn/index_cn.rst
--- a/doc/v2/howto/rnn/index_en.rst
+++ b/doc/v2/howto/rnn/index_en.rst
--- a/doc/v2/howto/rnn/recurrent_group_cn.md
+++ b/doc/v2/howto/rnn/recurrent_group_cn.md
--- a/doc/v2/howto/rnn/recurrent_group_en.md
+++ b/doc/v2/howto/rnn/recurrent_group_en.md
--- a/doc/v2/howto/rnn/rnn_config_cn.rst
+++ b/doc/v2/howto/rnn/rnn_config_cn.rst
--- a/doc/v2/howto/rnn/rnn_config_en.rst
+++ b/doc/v2/howto/rnn/rnn_config_en.rst
--- a/doc/v2/howto/rnn/src/bi_lstm.jpg
+++ b/doc/v2/howto/rnn/src/bi_lstm.jpg
--- a/doc/v2/howto/rnn/src/encoder-decoder-attention-model.png
+++ b/doc/v2/howto/rnn/src/encoder-decoder-attention-model.png
--- a/doc/v2/howto/rnn/src/glossary_rnn.dot
+++ b/doc/v2/howto/rnn/src/glossary_rnn.dot
--- a/doc/v2/howto/rnn/src/glossary_rnn_with_memory.dot
+++ b/doc/v2/howto/rnn/src/glossary_rnn_with_memory.dot
--- a/doc/v2/howto/rnn/src/simple_full_hierarchical_recurrent.dot
+++ b/doc/v2/howto/rnn/src/simple_full_hierarchical_recurrent.dot
--- a/doc/v2/howto/rnn/src/simple_full_recurrent.dot
+++ b/doc/v2/howto/rnn/src/simple_full_recurrent.dot
--- a/doc/v2/images/FullyConnected.jpg
+++ b/doc/v2/images/FullyConnected.jpg
--- a/doc/v2/images/add_security_group.png
+++ b/doc/v2/images/add_security_group.png
--- a/doc/v2/images/bi_lstm.jpg
+++ b/doc/v2/images/bi_lstm.jpg
--- a/doc/v2/images/checkpointing.png
+++ b/doc/v2/images/checkpointing.png
--- a/doc/v2/images/create_efs.png
+++ b/doc/v2/images/create_efs.png
--- a/doc/v2/images/csr.png
+++ b/doc/v2/images/csr.png
--- a/doc/v2/images/data_dispatch.png
+++ b/doc/v2/images/data_dispatch.png
--- a/doc/v2/images/dataset.graffle
+++ b/doc/v2/images/dataset.graffle
--- a/doc/v2/images/dataset.png
+++ b/doc/v2/images/dataset.png
--- a/doc/v2/images/doc_en.png
+++ b/doc/v2/images/doc_en.png
--- a/doc/v2/images/efs_mount.png
+++ b/doc/v2/images/efs_mount.png
--- a/doc/v2/images/encoder-decoder-attention-model.png
+++ b/doc/v2/images/encoder-decoder-attention-model.png
--- a/doc/v2/images/engine.png
+++ b/doc/v2/images/engine.png
--- a/doc/v2/images/file_storage.graffle
+++ b/doc/v2/images/file_storage.graffle
--- a/doc/v2/images/file_storage.png
+++ b/doc/v2/images/file_storage.png
--- a/doc/v2/images/glossary_rnn.dot
+++ b/doc/v2/images/glossary_rnn.dot
--- a/doc/v2/images/glossary_rnn_with_memory.dot
+++ b/doc/v2/images/glossary_rnn_with_memory.dot
--- a/doc/v2/images/gradients.png
+++ b/doc/v2/images/gradients.png
--- a/doc/v2/images/init_lock.graffle
+++ b/doc/v2/images/init_lock.graffle
--- a/doc/v2/images/init_lock.png
+++ b/doc/v2/images/init_lock.png
--- a/doc/v2/images/k8s-paddle-arch.png
+++ b/doc/v2/images/k8s-paddle-arch.png
--- a/doc/v2/images/layers.png
+++ b/doc/v2/images/layers.png
--- a/doc/v2/images/managed_policy.png
+++ b/doc/v2/images/managed_policy.png
--- a/doc/v2/images/matrix.png
+++ b/doc/v2/images/matrix.png
--- a/doc/v2/images/nvvp1.png
+++ b/doc/v2/images/nvvp1.png
--- a/doc/v2/images/nvvp2.png
+++ b/doc/v2/images/nvvp2.png
--- a/doc/v2/images/nvvp3.png
+++ b/doc/v2/images/nvvp3.png
--- a/doc/v2/images/nvvp4.png
+++ b/doc/v2/images/nvvp4.png
--- a/doc/v2/images/overview.png
+++ b/doc/v2/images/overview.png
--- a/doc/v2/images/paddle-cloud-in-data-center.png
+++ b/doc/v2/images/paddle-cloud-in-data-center.png
--- a/doc/v2/images/paddle-etcd.graffle
+++ b/doc/v2/images/paddle-etcd.graffle
--- a/doc/v2/images/paddle-etcd.png
+++ b/doc/v2/images/paddle-etcd.png
--- a/doc/v2/images/paddle-model-sharding.graffle
+++ b/doc/v2/images/paddle-model-sharding.graffle
--- a/doc/v2/images/paddle-model-sharding.png
+++ b/doc/v2/images/paddle-model-sharding.png
--- a/doc/v2/images/paddle-ps-0.png
+++ b/doc/v2/images/paddle-ps-0.png
--- a/doc/v2/images/paddle-ps-1.png
+++ b/doc/v2/images/paddle-ps-1.png
--- a/doc/v2/images/paddle-ps.graffle
+++ b/doc/v2/images/paddle-ps.graffle
--- a/doc/v2/images/paddle-task-queues.graffle
+++ b/doc/v2/images/paddle-task-queues.graffle
--- a/doc/v2/images/paddle-task-queues.png
+++ b/doc/v2/images/paddle-task-queues.png
--- a/doc/v2/images/paddle-task-states.graffle
+++ b/doc/v2/images/paddle-task-states.graffle
--- a/doc/v2/images/paddle-task-states.png
+++ b/doc/v2/images/paddle-task-states.png
--- a/doc/v2/images/ps_cn.png
+++ b/doc/v2/images/ps_cn.png
--- a/doc/v2/images/ps_en.png
+++ b/doc/v2/images/ps_en.png
--- a/doc/v2/images/pserver_and_trainer.png
+++ b/doc/v2/images/pserver_and_trainer.png
--- a/doc/v2/images/pserver_init.graffle
+++ b/doc/v2/images/pserver_init.graffle
--- a/doc/v2/images/pserver_init.png
+++ b/doc/v2/images/pserver_init.png
--- a/doc/v2/images/route53_create_recordset.png
+++ b/doc/v2/images/route53_create_recordset.png
--- a/doc/v2/images/route53_create_zone.png
+++ b/doc/v2/images/route53_create_zone.png
--- a/doc/v2/images/sequence_data.png
+++ b/doc/v2/images/sequence_data.png
--- a/doc/v2/images/simple_full_hierarchical_recurrent.dot
+++ b/doc/v2/images/simple_full_hierarchical_recurrent.dot
--- a/doc/v2/images/simple_full_recurrent.dot
+++ b/doc/v2/images/simple_full_recurrent.dot
--- a/doc/v2/images/submit-job.graffle
+++ b/doc/v2/images/submit-job.graffle
--- a/doc/v2/images/submit-job.png
+++ b/doc/v2/images/submit-job.png
--- a/doc/v2/images/trainer.graffle
+++ b/doc/v2/images/trainer.graffle
--- a/doc/v2/images/trainer.png
+++ b/doc/v2/images/trainer.png
--- a/doc/v2/images/trainer_cn.png
+++ b/doc/v2/images/trainer_cn.png
--- a/doc/v2/images/worker_security_group.png
+++ b/doc/v2/images/worker_security_group.png
--- a/doc/v2/images/workflow_of_CAPI.png
+++ b/doc/v2/images/workflow_of_CAPI.png
--- a/doc/v2/index_cn.rst
+++ b/doc/v2/index_cn.rst
--- a/doc/v2/index_en.rst
+++ b/doc/v2/index_en.rst
--- a/Anakin @ beec126e
+++ b/Anakin @ beec126e
--- a/Paddle @ 6f68fe71
+++ b/Paddle @ 6f68fe71
--- a/book @ 2b81d844
+++ b/book @ 2b81d844
--- a/models @ d6024059
+++ b/models @ d6024059
--- a/paddle-mobile @ 73e2f989
+++ b/paddle-mobile @ 73e2f989
--- a/mobile @ c3aa92ac
+++ b/mobile @ c3aa92ac
--- a/paddle @ 653686c7
+++ b/paddle @ 653686c7
--- a/requirements.txt
+++ b/requirements.txt
--- a/scripts/build_doc_lib_lite.sh
+++ b/scripts/build_doc_lib_lite.sh
--- a/scripts/deploy_docs.sh
+++ b/scripts/deploy_docs.sh
--- a/source/advanced_usage/deploy/anakin_example.md
+++ b/source/advanced_usage/deploy/anakin_example.md
--- a/source/advanced_usage/deploy/anakin_tutorial.md
+++ b/source/advanced_usage/deploy/anakin_tutorial.md
--- a/source/advanced_usage/deploy/convert_paddle_to_anakin.md
+++ b/source/advanced_usage/deploy/convert_paddle_to_anakin.md
--- a/source/advanced_usage/deploy/how_to_add_anakin_op.md
+++ b/source/advanced_usage/deploy/how_to_add_anakin_op.md
--- a/source/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
+++ b/source/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
--- a/source/advanced_usage/deploy/install_anakin.md
+++ b/source/advanced_usage/deploy/install_anakin.md
--- a/source/advanced_usage/deploy/mobile_build.md
+++ b/source/advanced_usage/deploy/mobile_build.md
--- a/source/advanced_usage/deploy/mobile_dev.md
+++ b/source/advanced_usage/deploy/mobile_dev.md
--- a/source/advanced_usage/deploy/native_infer.rst
+++ b/source/advanced_usage/deploy/native_infer.rst
--- a/source/advanced_usage/deploy/native_inference_engine.rst
+++ b/source/advanced_usage/deploy/native_inference_engine.rst
--- a/source/advanced_usage/deploy/run_anakin_on_arm.md
+++ b/source/advanced_usage/deploy/run_anakin_on_arm.md
--- a/source/advanced_usage/development/contribute_to_paddle.md
+++ b/source/advanced_usage/development/contribute_to_paddle.md
--- a/source/advanced_usage/development/cpu_profiling_cn.md
+++ b/source/advanced_usage/development/cpu_profiling_cn.md
--- a/source/advanced_usage/development/gpu_profiling_cn.rst
+++ b/source/advanced_usage/development/gpu_profiling_cn.rst
--- a/source/advanced_usage/development/host_memory_profiling_cn.md
+++ b/source/advanced_usage/development/host_memory_profiling_cn.md
--- a/source/advanced_usage/development/new_op.md
+++ b/source/advanced_usage/development/new_op.md
--- a/source/advanced_usage/development/timeline_cn.md
+++ b/source/advanced_usage/development/timeline_cn.md
--- a/source/advanced_usage/development/write_docs.rst
+++ b/source/advanced_usage/development/write_docs.rst
--- a/source/api_guides/high_level/index.rst
+++ b/source/api_guides/high_level/index.rst
--- a/source/api_guides/index.rst
+++ b/source/api_guides/index.rst
--- a/source/api_guides/low_level/executor/executor.rst
+++ b/source/api_guides/low_level/executor/executor.rst
--- a/source/api_guides/low_level/executor/parallel_executor.rst
+++ b/source/api_guides/low_level/executor/parallel_executor.rst
--- a/source/api_guides/low_level/index.rst
+++ b/source/api_guides/low_level/index.rst
--- a/source/api_guides/low_level/layers/activations.rst
+++ b/source/api_guides/low_level/layers/activations.rst
--- a/source/api_guides/low_level/layers/convolution.rst
+++ b/source/api_guides/low_level/layers/convolution.rst
--- a/source/api_guides/low_level/layers/detection.rst
+++ b/source/api_guides/low_level/layers/detection.rst
--- a/source/api_guides/low_level/layers/io.rst
+++ b/source/api_guides/low_level/layers/io.rst
--- a/source/api_guides/low_level/layers/math.rst
+++ b/source/api_guides/low_level/layers/math.rst
--- a/source/api_guides/low_level/layers/metrics.rst
+++ b/source/api_guides/low_level/layers/metrics.rst
--- a/source/api_guides/low_level/layers/pooling.rst
+++ b/source/api_guides/low_level/layers/pooling.rst
--- a/source/api_guides/low_level/layers/preprocessing.rst
+++ b/source/api_guides/low_level/layers/preprocessing.rst
--- a/source/api_guides/low_level/lodtensor.rst
+++ b/source/api_guides/low_level/lodtensor.rst
--- a/source/api_guides/low_level/recordio.rst
+++ b/source/api_guides/low_level/recordio.rst
--- a/source/api_reference/average.rst
+++ b/source/api_reference/average.rst
--- a/source/api_reference/backward.rst
+++ b/source/api_reference/backward.rst
--- a/source/api_reference/clip.rst
+++ b/source/api_reference/clip.rst
--- a/source/api_reference/data
+++ b/source/api_reference/data
--- a/source/api_reference/data_feeder.rst
+++ b/source/api_reference/data_feeder.rst
--- a/source/api_reference/executor.rst
+++ b/source/api_reference/executor.rst
--- a/source/api_reference/fluid.rst
+++ b/source/api_reference/fluid.rst
--- a/source/api_reference/gen_doc.py
+++ b/source/api_reference/gen_doc.py
--- a/source/api_reference/initializer.rst
+++ b/source/api_reference/initializer.rst
--- a/source/api_reference/io.rst
+++ b/source/api_reference/io.rst
--- a/source/api_reference/layers.rst
+++ b/source/api_reference/layers.rst
--- a/source/api_reference/metrics.rst
+++ b/source/api_reference/metrics.rst
--- a/source/api_reference/nets.rst
+++ b/source/api_reference/nets.rst
--- a/source/api_reference/optimizer.rst
+++ b/source/api_reference/optimizer.rst
--- a/source/api_reference/param_attr.rst
+++ b/source/api_reference/param_attr.rst
--- a/source/api_reference/profiler.rst
+++ b/source/api_reference/profiler.rst
--- a/source/api_reference/recordio_writer.rst
+++ b/source/api_reference/recordio_writer.rst
--- a/source/api_reference/regularizer.rst
+++ b/source/api_reference/regularizer.rst
--- a/source/api_reference/transpiler.rst
+++ b/source/api_reference/transpiler.rst
--- a/source/beginners_guide/basics/image_classification/image/cifar.png
+++ b/source/beginners_guide/basics/image_classification/image/cifar.png
--- a/source/beginners_guide/basics/image_classification/image/variations.png
+++ b/source/beginners_guide/basics/image_classification/image/variations.png
--- a/source/beginners_guide/basics/image_classification/index.md
+++ b/source/beginners_guide/basics/image_classification/index.md
--- a/source/beginners_guide/basics/label_semantic_roles/index.md
+++ b/source/beginners_guide/basics/label_semantic_roles/index.md
--- a/source/beginners_guide/basics/machine_translation/index.md
+++ b/source/beginners_guide/basics/machine_translation/index.md
--- a/source/beginners_guide/basics/recommender_system/index.md
+++ b/source/beginners_guide/basics/recommender_system/index.md
--- a/source/beginners_guide/basics/understand_sentiment/index.md
+++ b/source/beginners_guide/basics/understand_sentiment/index.md
--- a/source/beginners_guide/basics/word2vec/index.md
+++ b/source/beginners_guide/basics/word2vec/index.md
--- a/source/beginners_guide/install/install_doc.rst
+++ b/source/beginners_guide/install/install_doc.rst
--- a/source/beginners_guide/quick_start/fit_a_line/README.cn.md
+++ b/source/beginners_guide/quick_start/fit_a_line/README.cn.md
--- a/source/beginners_guide/quick_start/recognize_digits/README.cn.md
+++ b/source/beginners_guide/quick_start/recognize_digits/README.cn.md
--- a/source/beginners_guide/quick_start/recognize_digits/image/conv_layer.png
+++ b/source/beginners_guide/quick_start/recognize_digits/image/conv_layer.png
--- a/source/conf.py
+++ b/source/conf.py
--- a/source/index.rst
+++ b/source/index.rst
--- a/source/locale/en/LC_MESSAGES/advanced_usage/deploy/index.po
+++ b/source/locale/en/LC_MESSAGES/advanced_usage/deploy/index.po
--- a/source/locale/en/LC_MESSAGES/advanced_usage/development/index.po
+++ b/source/locale/en/LC_MESSAGES/advanced_usage/development/index.po
--- a/source/locale/en/LC_MESSAGES/advanced_usage/index.po
+++ b/source/locale/en/LC_MESSAGES/advanced_usage/index.po
--- a/source/locale/en/LC_MESSAGES/api_guides/high_level/index.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/high_level/index.po
--- a/source/locale/en/LC_MESSAGES/api_guides/index.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/index.po
--- a/source/locale/en/LC_MESSAGES/api_guides/low_level/executor/executor.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/low_level/executor/executor.po
--- a/source/locale/en/LC_MESSAGES/api_guides/low_level/executor/parallel_executor.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/low_level/executor/parallel_executor.po
--- a/source/locale/en/LC_MESSAGES/api_guides/low_level/index.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/low_level/index.po
--- a/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/activations.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/activations.po
--- a/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/convolution.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/convolution.po
--- a/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/detection.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/detection.po
--- a/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/io.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/io.po
--- a/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/math.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/math.po
--- a/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/metrics.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/metrics.po
--- a/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/pooling.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/pooling.po
--- a/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/preprocessing.po
+++ b/source/locale/en/LC_MESSAGES/api_guides/low_level/layers/preprocessing.po
--- a/source/locale/en/LC_MESSAGES/api_reference/clip.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/clip.po
--- a/source/locale/en/LC_MESSAGES/api_reference/data/data_reader.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/data/data_reader.po
--- a/source/locale/en/LC_MESSAGES/api_reference/data/dataset.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/data/dataset.po
--- a/source/locale/en/LC_MESSAGES/api_reference/data/image.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/data/image.po
--- a/source/locale/en/LC_MESSAGES/api_reference/data_feeder.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/data_feeder.po
--- a/source/locale/en/LC_MESSAGES/api_reference/executor.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/executor.po
--- a/source/locale/en/LC_MESSAGES/api_reference/fluid.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/fluid.po
--- a/source/locale/en/LC_MESSAGES/api_reference/index.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/index.po
--- a/source/locale/en/LC_MESSAGES/api_reference/initializer.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/initializer.po
--- a/source/locale/en/LC_MESSAGES/api_reference/io.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/io.po
--- a/source/locale/en/LC_MESSAGES/api_reference/layers.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/layers.po
--- a/source/locale/en/LC_MESSAGES/api_reference/metrics.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/metrics.po
--- a/source/locale/en/LC_MESSAGES/api_reference/nets.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/nets.po
--- a/source/locale/en/LC_MESSAGES/api_reference/optimizer.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/optimizer.po
--- a/source/locale/en/LC_MESSAGES/api_reference/param_attr.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/param_attr.po
--- a/source/locale/en/LC_MESSAGES/api_reference/profiler.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/profiler.po
--- a/source/locale/en/LC_MESSAGES/api_reference/regularizer.po
+++ b/source/locale/en/LC_MESSAGES/api_reference/regularizer.po
--- a/source/locale/en/LC_MESSAGES/faq.po
+++ b/source/locale/en/LC_MESSAGES/faq.po
--- a/source/locale/en/LC_MESSAGES/index.po
+++ b/source/locale/en/LC_MESSAGES/index.po
--- a/source/locale/en/LC_MESSAGES/quick_start/fit_a_line/index.po
+++ b/source/locale/en/LC_MESSAGES/quick_start/fit_a_line/index.po
--- a/source/locale/en/LC_MESSAGES/quick_start/index.po
+++ b/source/locale/en/LC_MESSAGES/quick_start/index.po
--- a/source/locale/en/LC_MESSAGES/quick_start/install/build_from_source_cn.po
+++ b/source/locale/en/LC_MESSAGES/quick_start/install/build_from_source_cn.po
--- a/source/locale/en/LC_MESSAGES/quick_start/install/docker_install_cn.po
+++ b/source/locale/en/LC_MESSAGES/quick_start/install/docker_install_cn.po
--- a/source/locale/en/LC_MESSAGES/quick_start/install/index.po
+++ b/source/locale/en/LC_MESSAGES/quick_start/install/index.po
--- a/source/locale/en/LC_MESSAGES/quick_start/install/pip_install_cn.po
+++ b/source/locale/en/LC_MESSAGES/quick_start/install/pip_install_cn.po
--- a/source/locale/en/LC_MESSAGES/quick_start/quick_start.po
+++ b/source/locale/en/LC_MESSAGES/quick_start/quick_start.po
--- a/source/locale/en/LC_MESSAGES/quick_start/recognize_digits/index.po
+++ b/source/locale/en/LC_MESSAGES/quick_start/recognize_digits/index.po
--- a/source/locale/en/LC_MESSAGES/quick_start/theoretical_background.po
+++ b/source/locale/en/LC_MESSAGES/quick_start/theoretical_background.po
--- a/source/locale/en/LC_MESSAGES/user_guides/howto/index.po
+++ b/source/locale/en/LC_MESSAGES/user_guides/howto/index.po
--- a/source/locale/en/LC_MESSAGES/user_guides/index.po
+++ b/source/locale/en/LC_MESSAGES/user_guides/index.po
--- a/source/locale/en/LC_MESSAGES/user_guides/model_bank/index.po
+++ b/source/locale/en/LC_MESSAGES/user_guides/model_bank/index.po
--- a/source/mobile/foo.rst
+++ b/source/mobile/foo.rst
--- a/source/user_guides/howto/configure_simple_model/index.rst
+++ b/source/user_guides/howto/configure_simple_model/index.rst
--- a/source/user_guides/howto/modification/foo.rst
+++ b/source/user_guides/howto/modification/foo.rst
--- a/source/user_guides/howto/prepare_data/feeding_data.rst
+++ b/source/user_guides/howto/prepare_data/feeding_data.rst
--- a/source/user_guides/howto/prepare_data/index.rst
+++ b/source/user_guides/howto/prepare_data/index.rst
--- a/source/user_guides/howto/prepare_data/use_recordio_reader.rst
+++ b/source/user_guides/howto/prepare_data/use_recordio_reader.rst
--- a/source/user_guides/models/index.rst
+++ b/source/user_guides/models/index.rst