diff --git a/source/user_guides/howto/training/dist_train_howto.md b/source/user_guides/howto/training/dist_train_howto.md
index 1a0df2b8f5a950d6140db8477c6cca48d6a75772..799d315de702fe36153d8a6e3e6af9a844f73cbf 100644
--- a/source/user_guides/howto/training/dist_train_howto.md
+++ b/source/user_guides/howto/training/dist_train_howto.md
@@ -4,7 +4,7 @@
 
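 Distributed deep learning training generally uses one of two parallelization approaches, data parallelism or model parallelism, as shown in the figure below: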
 
-<img src="parallelism.png">
+<img src="src/parallelism.png">
 
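 In model parallelism, the model's layers and parameters are distributed across multiple nodes, so the forward and backward passes of a single mini-batch require multiple rounds of communication
 between nodes, and each node stores only part of the model. In data parallelism, each node holds a complete copy of the model's layers and parameters, and each node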
@@ -15,8 +15,8 @@
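 communication. RPC communication is implemented with [gRPC](https://github.com/grpc/grpc/), and Collective communication with
 [NCCL2](https://developer.nvidia.com/nccl). The table below compares RPC and Collective communication side by side: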
 
-| Feature       | Collective    | RPC   |
-| ------------- |:-------------:| -----:|
+| Feature       | Collective    | RPC    |
+| ------------- |:-------------:| :-----:|
 | Ring-Based Comm  | Yes | No |
 | Async Training   | Reduce ranks | Fast, Direct async updates |
 | Dist-Sparse-Table | No      | Yes |
@@ -24,14 +24,53 @@
 | Performance | Faster | Fast |
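+
+To make the "Ring-Based Comm" row more concrete, here is a minimal pure-Python simulation of a ring all-reduce (sum): a reduce-scatter phase followed by an all-gather phase, in which every worker only sends to its right-hand neighbour. It is a sketch of the general technique, assuming every worker holds a gradient buffer of the same length; it is not NCCL2's actual implementation, and the function name `ring_allreduce` is hypothetical.
+
+```python
+def ring_allreduce(buffers):
+    """Simulate a ring all-reduce (sum) over len(buffers) workers.
+
+    buffers[r] is the local gradient of worker r; all buffers must have the
+    same length. Returns one result per worker, each equal to the
+    element-wise sum of all inputs.
+    """
+    n = len(buffers)
+    length = len(buffers[0])
+    # Split every buffer into n contiguous chunks (some may be empty).
+    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]
+    data = [list(b) for b in buffers]
+
+    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) % n to
+    # worker (r + 1) % n, which adds it to its own copy. After n - 1 steps,
+    # worker r holds the fully reduced chunk (r + 1) % n.
+    for step in range(n - 1):
+        sends = [(r, (r - step) % n) for r in range(n)]
+        # Snapshot all payloads first so every send in a step is simultaneous.
+        payloads = [data[r][bounds[c][0]:bounds[c][1]] for r, c in sends]
+        for (r, c), payload in zip(sends, payloads):
+            lo = bounds[c][0]
+            dst = data[(r + 1) % n]
+            for k, v in enumerate(payload):
+                dst[lo + k] += v
+
+    # Phase 2, all-gather: in step s, worker r forwards chunk (r + 1 - s) % n
+    # to worker (r + 1) % n, which overwrites its copy with the reduced value.
+    for step in range(n - 1):
+        sends = [(r, (r + 1 - step) % n) for r in range(n)]
+        payloads = [data[r][bounds[c][0]:bounds[c][1]] for r, c in sends]
+        for (r, c), payload in zip(sends, payloads):
+            lo, hi = bounds[c]
+            data[(r + 1) % n][lo:hi] = payload
+
+    return data
+
+
+# Three workers, each with a 4-element gradient.
+grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+print(ring_allreduce(grads))
+# [[111, 222, 333, 444], [111, 222, 333, 444], [111, 222, 333, 444]]
+```
+
+Each worker sends 2(n-1) chunks of roughly 1/n of the buffer, so its total traffic stays close to twice the buffer size no matter how many workers join the ring.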
 
 * Architecture of RPC communication:
-  <img src="">
+  <img src="src/dist_train_pserver.png">
 * Architecture of NCCL2 communication:
-  <img src="">
+  <img src="src/dist_train_nccl2.png">
 
 
 ## Training with the parameter server approach
 
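+With the "trainer" API, a program automatically detects from environment variables whether to run in distributed mode. Set the following environment variables in your distributed environment: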
+
+| Env Variable | Comment |
+| ------------ | ------- |
+| PADDLE_TRAINING_ROLE | role of the current node, must be PSERVER or TRAINER |
+| PADDLE_PSERVER_PORT | the port that the parameter servers will bind to |
+| PADDLE_PSERVER_IPS | a comma-separated list of parameter server IPs or hostnames |
+| PADDLE_TRAINERS | number of trainers in this distributed job |
+| PADDLE_CURRENT_IP | IP address of the current node |
+| PADDLE_TRAINER_ID | zero-based ID of each trainer |
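+
+All of these variables are plain strings. As a rough illustration of how they fit together, the hypothetical helper below (not part of Paddle; the name `read_pserver_env` and its returned keys are assumptions) turns them into the kind of values the transpiler example in this section passes to `transpile()` and `get_pserver_program()`. It assumes every parameter server listens on the same PADDLE_PSERVER_PORT and that a pserver node may leave PADDLE_TRAINER_ID unset.
+
+```python
+import os
+
+VALID_ROLES = ("PSERVER", "TRAINER")
+
+
+def read_pserver_env(env=None):
+    """Read parameter-server job settings from PADDLE_* variables.
+
+    Hypothetical helper for illustration; the "trainer" API does its own
+    parsing.
+    """
+    env = os.environ if env is None else env
+    role = env["PADDLE_TRAINING_ROLE"]
+    if role not in VALID_ROLES:
+        raise ValueError("PADDLE_TRAINING_ROLE must be PSERVER or TRAINER, "
+                         "got %r" % role)
+    port = env["PADDLE_PSERVER_PORT"]
+    # Accept "ip1,ip2" as well as "ip1, ip2"; ignore empty entries.
+    hosts = [h.strip() for h in env["PADDLE_PSERVER_IPS"].split(",") if h.strip()]
+    return {
+        "role": role,
+        # Assumption: pservers may omit PADDLE_TRAINER_ID, so default to 0.
+        "trainer_id": int(env.get("PADDLE_TRAINER_ID", "0")),
+        "trainers": int(env["PADDLE_TRAINERS"]),
+        # Assumption: all pservers bind to the same port.
+        "pserver_endpoints": ",".join("%s:%s" % (h, port) for h in hosts),
+        "current_endpoint": "%s:%s" % (env["PADDLE_CURRENT_IP"], port),
+    }
+
+
+example_env = {
+    "PADDLE_TRAINING_ROLE": "PSERVER",
+    "PADDLE_PSERVER_PORT": "6174",
+    "PADDLE_PSERVER_IPS": "192.168.0.2,192.168.0.3",
+    "PADDLE_TRAINERS": "2",
+    "PADDLE_CURRENT_IP": "192.168.0.2",
+}
+print(read_pserver_env(example_env)["pserver_endpoints"])
+# 192.168.0.2:6174,192.168.0.3:6174
+```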
+
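+The lower-level "transpiler" API lets you customize distributed training, for example by launching multiple pservers and trainers
+on a single machine. The sample code below shows how to use this lower-level API: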
+
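+```python
+import paddle.fluid as fluid
+
+role = "PSERVER"  # or "TRAINER"
+trainer_id = 0    # zero-based ID of this trainer
+pserver_endpoints = "127.0.0.1:6170,127.0.0.1:6171"
+current_endpoint = "127.0.0.1:6170"  # endpoint this pserver binds to
+trainers = 4      # total number of trainers in the job
+
+# Build the network and call optimizer.minimize(loss) before transpiling.
+t = fluid.DistributeTranspiler()
+t.transpile(trainer_id, pservers=pserver_endpoints, trainers=trainers)
+exe = fluid.Executor(fluid.CPUPlace())
+if role == "PSERVER":
+    pserver_prog = t.get_pserver_program(current_endpoint)
+    pserver_startup = t.get_startup_program(current_endpoint,
+                                            pserver_prog)
+    exe.run(pserver_startup)
+    exe.run(pserver_prog)  # blocks while serving the trainers
+elif role == "TRAINER":
+    # train_loop is a placeholder for your own training loop (not defined
+    # here); it should run the transpiled trainer program.
+    train_loop(t.get_trainer_program())
+```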
 
 ## Training with NCCL2 communication
 
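+Note: NCCL2 mode currently supports only the "trainer" API. NCCL2 mode has few options and no "transpiler", so it has no lower-level API.
+NCCL2 mode also requires environment variables to be set on every node, but they differ from those of the parameter server mode: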
 
+| Env Variable | Comment |
+| ------------ | ------- |
+| PADDLE_TRAINER_IPS | comma-separated list of the IPs of all trainer nodes |
+| PADDLE_TRAINER_ID | zero-based ID of each trainer, a.k.a. "rank" |
+| PADDLE_PSERVER_PORT | port used during initialization to broadcast the NCCL ID |
+| PADDLE_CURRENT_IP | IP address of the current node |
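+
+A similar hypothetical sketch for NCCL2 mode follows (again not Paddle's code; `read_nccl2_env` and its returned keys are made up for illustration). It derives the number of trainers from PADDLE_TRAINER_IPS. As an extra sanity check that is purely an assumption of this sketch, it expects one trainer per node with PADDLE_TRAINER_IPS listed in rank order, so that entry number PADDLE_TRAINER_ID of the list equals PADDLE_CURRENT_IP.
+
+```python
+import os
+
+
+def read_nccl2_env(env=None):
+    """Read NCCL2-mode settings from PADDLE_* variables (hypothetical helper)."""
+    env = os.environ if env is None else env
+    ips = [h.strip() for h in env["PADDLE_TRAINER_IPS"].split(",") if h.strip()]
+    rank = int(env["PADDLE_TRAINER_ID"])
+    port = env["PADDLE_PSERVER_PORT"]
+    current_ip = env["PADDLE_CURRENT_IP"]
+    if not 0 <= rank < len(ips):
+        raise ValueError("PADDLE_TRAINER_ID=%d is outside [0, %d)" % (rank, len(ips)))
+    # Assumption: one trainer per node, IPs listed in rank order.
+    if ips[rank] != current_ip:
+        raise ValueError("PADDLE_CURRENT_IP %s does not match entry %d of "
+                         "PADDLE_TRAINER_IPS (%s)" % (current_ip, rank, ips[rank]))
+    return {
+        "rank": rank,
+        "num_trainers": len(ips),
+        # Endpoints on the port used to broadcast the NCCL ID at startup.
+        "trainer_endpoints": ["%s:%s" % (h, port) for h in ips],
+        "current_endpoint": "%s:%s" % (current_ip, port),
+    }
+
+
+example_env = {
+    "PADDLE_TRAINER_IPS": "192.168.0.2,192.168.0.3",
+    "PADDLE_TRAINER_ID": "1",
+    "PADDLE_PSERVER_PORT": "6174",
+    "PADDLE_CURRENT_IP": "192.168.0.3",
+}
+cfg = read_nccl2_env(example_env)
+print(cfg["rank"], cfg["num_trainers"], cfg["current_endpoint"])
+# 1 2 192.168.0.3:6174
+```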
diff --git a/source/user_guides/howto/training/src/dist_train_nccl2.graffle b/source/user_guides/howto/training/src/dist_train_nccl2.graffle
new file mode 100644
index 0000000000000000000000000000000000000000..d26db8657dbd4661af49723cfaf45c6c3bb931d4
Binary files /dev/null and b/source/user_guides/howto/training/src/dist_train_nccl2.graffle differ
diff --git a/source/user_guides/howto/training/src/dist_train_nccl2.png b/source/user_guides/howto/training/src/dist_train_nccl2.png
new file mode 100644
index 0000000000000000000000000000000000000000..6a35baa06eedb47e88d434803980573db0bf59cc
Binary files /dev/null and b/source/user_guides/howto/training/src/dist_train_nccl2.png differ
diff --git a/source/user_guides/howto/training/src/dist_train_pserver.graffle b/source/user_guides/howto/training/src/dist_train_pserver.graffle
new file mode 100644
index 0000000000000000000000000000000000000000..d65172c5ab84a1e312a679381a9fb9be65c1acba
Binary files /dev/null and b/source/user_guides/howto/training/src/dist_train_pserver.graffle differ
diff --git a/source/user_guides/howto/training/src/dist_train_pserver.png b/source/user_guides/howto/training/src/dist_train_pserver.png
new file mode 100644
index 0000000000000000000000000000000000000000..9f39894dd6bf1853957d6c7612c54b208f0f5e06
Binary files /dev/null and b/source/user_guides/howto/training/src/dist_train_pserver.png differ
diff --git a/source/user_guides/howto/training/src/parallelism.png b/source/user_guides/howto/training/src/parallelism.png
new file mode 100644
index 0000000000000000000000000000000000000000..6c078b5241559a05219447db67b5d8a35aeefd3f
Binary files /dev/null and b/source/user_guides/howto/training/src/parallelism.png differ