diff --git a/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md b/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md
index 77ac2ee61a5bc09984ec0ed2ace5fbf9865654ad..62228dbac2b726fadd4394bee87d4902d18b25a9 100644
--- a/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md
+++ b/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md
@@ -96,7 +96,7 @@ There are two modes in term of memory management in `PaddleBuf` :

In the two modes, the first is more convenient while the second strictly controls memory management to facilitate integration with `tcmalloc` and other libraries.

-### Upgrade performance based on contrib::AnalysisConfig (Prerelease)
+### Upgrade performance based on contrib::AnalysisConfig
AnalysisConfig is at the stage of pre-release and protected by `namespace contrib`, which may be adjusted in the future.

@@ -106,9 +106,11 @@ The usage of `AnalysisConfig` is similiar with that of `NativeConfig` but the fo

```c++
AnalysisConfig config;
-config.model_dir = xxx;
-config.use_gpu = false; // GPU optimization is not supported at present
-config.specify_input_name = true; // it needs to set name of input
+config.SetModel(dirname); // set the directory of the model
+config.EnableUseGpu(100, 0 /*gpu id*/); // use GPU with an initial 100 MB GPU memory pool on device 0, or
+config.DisableGpu(); // use CPU
+config.SwitchSpecifyInputNames(true); // the name of each input needs to be specified
+config.SwitchIrOptim(); // turn on the optimization switch; a sequence of optimizations will be applied during inference
```

Note that input PaddleTensor needs to be allocated. Previous examples need to be revised as follows:

@@ -147,7 +149,7 @@ For more specific examples, please refer to[LoD-Tensor Instructions](../../../us

1. If the CPU type permits, it's best to use the versions with support for AVX and MKL.
2. Reuse input and output `PaddleTensor` to avoid frequent memory allocation resulting in low performance
-3. Try to replace `NativeConfig` with `AnalysisConfig` to perform optimization for CPU inference
+3. Try to replace `NativeConfig` with `AnalysisConfig` to perform optimization for CPU or GPU inference

## Code Demo

diff --git a/doc/fluid/beginners_guide/install/install_Windows_en.md b/doc/fluid/beginners_guide/install/install_Windows_en.md
index 4002f824f995603b18d15dd3b0e195e555ed7a59..2f60fff40466869665f4ecc5389fd3f46a1c42f1 100644
--- a/doc/fluid/beginners_guide/install/install_Windows_en.md
+++ b/doc/fluid/beginners_guide/install/install_Windows_en.md
@@ -9,9 +9,8 @@ This instruction will show you how to install PaddlePaddle on Windows. The foll

**Note** :

-* The current version does not support NCCL, distributed training, AVX, warpctc and MKL related functions.
+* The current version does not support NCCL or distributed training related functions.

-* Currently, only PaddlePaddle for CPU is supported on Windows.

@@ -30,14 +29,20 @@ Version of pip or pip3 should be equal to or above 9.0.1 .

* Install PaddlePaddle

+* ***CPU version of PaddlePaddle***:
Execute `pip install paddlepaddle` or `pip3 install paddlepaddle` to download and install PaddlePaddle.

-
+* ***GPU version of PaddlePaddle***:
+Execute `pip install paddlepaddle-gpu` (python2.7) or `pip3 install paddlepaddle-gpu` (python3.x) to download and install PaddlePaddle.
+
## ***Verify installation***

After completing the installation, you can use `python` or `python3` to enter the python interpreter and then use `import paddle.fluid` to verify that the installation was successful.
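For a quick check from a terminal, a minimal sketch (assuming `python`/`python3` are on the `PATH`; the printed message is arbitrary) is:

```bash
# Import the fluid package from the command line; a clean exit means the install works.
python -c "import paddle.fluid; print('PaddlePaddle Fluid imported successfully')"
# Or, for a Python 3 installation:
python3 -c "import paddle.fluid; print('PaddlePaddle Fluid imported successfully')"
```

If the import raises no error, the installation is usable.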
## ***How to uninstall***

+* ***CPU version of PaddlePaddle***:
Use the following command to uninstall PaddlePaddle : `pip uninstall paddlepaddle` or `pip3 uninstall paddlepaddle`

+* ***GPU version of PaddlePaddle***:
+Use the following command to uninstall PaddlePaddle : `pip uninstall paddlepaddle-gpu` or `pip3 uninstall paddlepaddle-gpu`

diff --git a/doc/fluid/user_guides/howto/training/cluster_howto_en.rst b/doc/fluid/user_guides/howto/training/cluster_howto_en.rst
index 5610ee3cdab7d9102673a80e35a75ab3906f5615..2d329b6ca96e0dd8643f2cf882c7ea3eac10aab4 100644
--- a/doc/fluid/user_guides/howto/training/cluster_howto_en.rst
+++ b/doc/fluid/user_guides/howto/training/cluster_howto_en.rst
@@ -205,6 +205,25 @@ For example:

Currently, distributed training using NCCL2 only supports synchronous training. The distributed training using NCCL2 mode is more suitable for the model which is relatively large and needs \
synchronous training and GPU training. If the hardware device supports RDMA and GPU Direct, this can achieve high distributed training performance.

+Start NCCL2 Distributed Training in Multi-Process Mode
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Usually you can get better multi-GPU training performance by starting the NCCL2 distributed training job in multi-process mode. Paddle provides the :code:`paddle.distributed.launch` module to start such a job, after which each training process uses an independent GPU device.
+
+Notes on usage:
+
+* Number of nodes: set the number of nodes of the job through the environment variable :code:`PADDLE_NUM_TRAINERS`; this variable is also set in every training process.
+* Number of devices per node: set the number of GPU devices on each node with the :code:`--gpus` argument; the rank of each process is set in the environment variable :code:`PADDLE_TRAINER_ID` automatically.
+* Data sharding: multi-process mode means one process per device. Generally, each process reads its own shard of the training data, so that all processes together cover the whole data set.
+* Entry file: the entry file is the training script that is actually launched.
+* Logs: the log of each training process is saved in the :code:`./mylog` directory by default; you can change the directory with the :code:`--log_dir` argument.
+
+Startup example (replace the angle-bracket placeholders with your node and GPU counts):
+
+.. code-block:: bash
+
+   > PADDLE_NUM_TRAINERS=<num_nodes> python -m paddle.distributed.launch train.py --gpus <num_gpus_per_node> --arg1 --arg2 ...
+
Important Notes on NCCL2 Distributed Training
++++++++++++++++++++++++++++++++++++++++++++++

@@ -215,7 +234,7 @@ exit at the final iteration.

There are two common ways:

- Each node only trains fixed number of batches per pass, which is controlled by python codes. If a node has more data than this fixed amount, then these marginal data will not be trained.

-**Note** : If there are multiple network devices in the system, you need to manually specify the devices used by NCCL2.
+**Note** : If there are multiple network devices in the system, you need to manually specify the devices used by NCCL2. 
Assuming you need to use :code:`eth2` as the communication device, you need to set the following environment variables:

diff --git a/doc/fluid/user_guides/index_cn.rst b/doc/fluid/user_guides/index_cn.rst
index c64a97f166866009d506503186ac40524fe6b189..9696955c255c8bbea5123634e5773e24a1a2116f 100644
--- a/doc/fluid/user_guides/index_cn.rst
+++ b/doc/fluid/user_guides/index_cn.rst
@@ -15,7 +15,7 @@

- `训练神经网络 <../user_guides/howto/training/index_cn.html>`_:介绍如何使用 Fluid 进行单机训练、多机训练、以及保存和载入模型变量

- - `DyGraph模式 <../user_guides/howto/dygraph/DyGraph.md>`_:介绍在 Fluid 下使用DyGraph
+ - `DyGraph模式 <../user_guides/howto/dygraph/DyGraph.html>`_:介绍在 Fluid 下使用DyGraph

- `模型评估与调试 <../user_guides/howto/evaluation_and_debugging/index_cn.html>`_:介绍在 Fluid 下进行模型评估和调试的方法,包括:
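For the `eth2` note in the cluster_howto_en.rst hunk above, the network interface is normally selected through NCCL's own environment variable; a minimal sketch, assuming the standard `NCCL_SOCKET_IFNAME` variable is what that sentence refers to:

```bash
# Tell NCCL2 which network device to use for communication (eth2, per the note above).
# NCCL_SOCKET_IFNAME is NCCL's documented interface-selection variable; adjust to your device name.
export NCCL_SOCKET_IFNAME=eth2
```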