diff --git a/README.md b/README.md
index 4c8f0718aa2a8fd67c222364c01535ca7a3b4417..3cc716e7ed482960909cf66be7d19c9aaa790e25 100644
--- a/README.md
+++ b/README.md
@@ -1,37 +1,24 @@
+
| Argument | Optional Values | Default |
|---|---|---|
| ANDROID_ABI | armeabi-v7a, arm64-v8a | armeabi-v7a |
| ANDROID_API | >= 16 | 21 |
| IOS_PLATFORM | IOS_ARCH |
|---|---|
| OS | armv7, armv7s, arm64 |
| SIMULATOR | i386, x86_64 |
-
-By coordinating these processes, PaddlePaddle supports both synchronous stochastic gradient descent (sync SGD) and asynchronous stochastic gradient descent (async SGD) for training user-defined neural network topologies.
-
-When training with sync SGD, the parameter servers wait for all trainers to finish uploading their gradients before sending the updated parameters back; training cannot proceed until every trainer has received the updated parameters. This creates a synchronization point among trainers. When training with async SGD, each trainer uploads its gradients and downloads new parameters individually, without synchronizing with other trainers. Async SGD is faster in terms of time per pass, but the gradients are noisier since trainers are likely to hold a stale model.
-
-### Master Server Process
-
-The master server process will:
-
-- Partition a dataset into [tasks](#task) and dispatch tasks to trainers.
-- Keep track of training progress on the dataset with [task queue](#task-queue). A training job iterates over the dataset for a full pass before moving on to the next pass.
-
-
-#### Task
-
-A task is a data shard to be trained. The total number of tasks will be much bigger than the total number of trainers. The number of data instances inside a task will be much bigger than the mini-batch size.
-
-#### Task Queue
-
-The master server has three task queues to track training progress. As illustrated in the graph below, Job A and Job B both have one master server. Each master server process has three task queues.
-
-- The todo queue holds tasks to be dispatched. When a job starts, the master server fills the todo queue with all tasks.
-- The pending queue holds tasks that are currently being trained by trainers.
-- The done queue holds tasks that have already been trained.
-
-The life cycle of a single task is illustrated below:
-
-1. When a new pass of training starts, all tasks will be placed in the todo queue.
-1. When a trainer requests a new task, the master server will dispatch a task from the todo queue to it, put the task in the pending queue, and wait for completion.
-1. The trainer will work on its task, tell the master server once the task is completed, and ask for a new task. The master server will dispatch a new task to that trainer.
-1. If a task fails for any reason in the trainer, or takes longer than a specific period of time, the master server will move the task back to the todo queue and increase its timeout count by one. If the timeout count exceeds a threshold, the task is likely to crash trainers, so it will be discarded.
-1. The master server will move completed tasks to the done queue. When the todo queue is empty, the master server will start a new pass by moving all tasks in the done queue to the todo queue and resetting the timeout counter of every task to zero (see the sketch below).
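-
-A minimal Go sketch of these transitions, restating the `TaskEntry` and `TaskQueues` types that appear later in this document; the `maxTimeout` threshold and the function names are illustrative, not part of the actual implementation:
-
-```go
-package master
-
-// Minimal restatement of the queue types described later in this document.
-type Task struct{ Index int }
-
-type TaskEntry struct {
-	NumTimeout int
-	Task       Task
-}
-
-type TaskQueues struct {
-	Todo    []TaskEntry
-	Pending map[int]TaskEntry // map from task index to task entry
-	Done    []TaskEntry
-}
-
-const maxTimeout = 3 // illustrative discard threshold
-
-// onTaskFinished moves a completed task from pending to done, and
-// starts a new pass once both todo and pending are drained.
-func (q *TaskQueues) onTaskFinished(idx int) {
-	if e, ok := q.Pending[idx]; ok {
-		delete(q.Pending, idx)
-		q.Done = append(q.Done, e)
-	}
-	if len(q.Todo) == 0 && len(q.Pending) == 0 {
-		// New pass: recycle done tasks and reset all timeout counters.
-		for i := range q.Done {
-			q.Done[i].NumTimeout = 0
-		}
-		q.Todo, q.Done = q.Done, nil
-	}
-}
-
-// onTaskTimeout moves a timed-out task back to todo, or discards it
-// once it has timed out more than maxTimeout times.
-func (q *TaskQueues) onTaskTimeout(idx int) {
-	e, ok := q.Pending[idx]
-	if !ok {
-		return // the task finished before the timeout fired
-	}
-	delete(q.Pending, idx)
-	e.NumTimeout++
-	if e.NumTimeout > maxTimeout {
-		return // likely crashes trainers; discard
-	}
-	q.Todo = append(q.Todo, e)
-}
-```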
-
-### Trainer Process
-
-The trainer process will:
-
-- Request tasks from the master.
-- Work on the tasks.
-- Upload gradients to parameter servers, and update the local model by downloading new parameters from the parameter servers.
-
-### Parameter Server Process
-
-Parameter server processes hold the parameters collaboratively. The parameters are partitioned across different parameter servers.
-
-The parameter server will:
-
-- Receive gradients from the trainers, update the parameters, and give the trainers the latest parameters.
-- Periodically save its parameters to the distributed file system, overwriting the previous save.
-
-### Optimization Algorithms
-
-The communication pattern between the trainers and the parameter servers depends on the category of optimization algorithm:
-
-- Synchronous Stochastic Gradient Descent (sync-SGD)
-
-	Parameter servers will wait for all trainers to finish the n-th mini-batch calculation and send their gradients before broadcasting new parameters to every trainer. Every trainer will wait for the new parameters before starting the (n+1)-th mini-batch.
-
-- Asynchronous Stochastic Gradient Descent (async-SGD)
-
-	There will be no synchronization between different trainers, and the parameter server updates its parameters as soon as it receives new gradients:
-
-	- Each trainer uploads its accumulated gradient every n mini-batches.
-	- Every m mini-batches, the trainer downloads new parameters from parameter server.
-	- n and m do not have to be equal (see the sketch below).
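-
-	A minimal Go sketch of this loop, assuming hypothetical `computeGradients`, `uploadGradients`, and `downloadParams` helpers in place of the real training step and parameter server client calls:
-
-	```go
-	package main
-
-	import "fmt"
-
-	// Hypothetical helpers standing in for the actual training step and
-	// the parameter server client library.
-	func computeGradients() []float32 { return make([]float32, 4) }
-	func uploadGradients(g []float32) { fmt.Println("uploading gradients") }
-	func downloadParams() []float32   { return make([]float32, 4) }
-
-	func main() {
-		const n, m = 4, 8 // upload every n mini-batches, download every m
-		acc := make([]float32, 4)
-		for batch := 1; batch <= 100; batch++ {
-			for i, g := range computeGradients() {
-				acc[i] += g // accumulate gradients locally
-			}
-			if batch%n == 0 {
-				uploadGradients(acc) // async: no waiting on other trainers
-				acc = make([]float32, 4)
-			}
-			if batch%m == 0 {
-				_ = downloadParams() // refresh the local model
-			}
-		}
-	}
-	```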
-
-## Fault Tolerance
-
-The training job will pause if the master server process is dead, or if any of the parameter server processes is dead. They will be restarted by [Kubernetes](https://kubernetes.io/) and recover in a few minutes. Please refer to [fault recovery](#fault-recovery).
-
-The training job will continue to make progress if there is at least one training process running. The strategy depends on the type of optimization algorithm:
-
-- sync-SGD
-
-	TODO
-
-- async-SGD
-
-	Since async-SGD does not require synchronization between mini-batches, the system will by definition make progress as long as at least one trainer is running.
-
-## Fault Recovery
-
-PaddlePaddle uses [etcd](https://github.com/coreos/etcd) to keep track of the states of processes. Because etcd is a distributed reliable key-value store, the restarted process can recover its states from etcd. The model parameters are periodically saved into the distributed file system, so a restarted parameter server can recover its parameters from the saved file.
-
-Now we will introduce how each process recovers from a failure. The graph below shows how etcd is used:
-
-### Master Server Process
-
-When the master is started by Kubernetes, it executes the following steps at startup:
-
-1. Grabs a unique *master* lock in etcd, which prevents concurrent master instantiations.
-1. Recovers the task queues from etcd if they already exist; otherwise, creates them.
-1. Writes its IP address to */master/addr* so that trainers can discover it.
-1. Listens for trainers' task requests, dispatches a task upon each request, and updates the task queues using an etcd transaction to ensure the lock is held during the update. A sketch of steps 1 and 3 follows.
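-
-A sketch of steps 1 and 3, assuming the etcd clientv3 concurrency package (import paths follow the coreos/etcd repository referenced in this document; the endpoint and address are illustrative):
-
-```go
-package main
-
-import (
-	"context"
-	"log"
-
-	"github.com/coreos/etcd/clientv3"
-	"github.com/coreos/etcd/clientv3/concurrency"
-)
-
-func main() {
-	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
-	if err != nil {
-		log.Fatal(err)
-	}
-	defer cli.Close()
-
-	// Step 1: grab the unique master lock.
-	sess, err := concurrency.NewSession(cli)
-	if err != nil {
-		log.Fatal(err)
-	}
-	mu := concurrency.NewMutex(sess, "/master")
-	if err := mu.Lock(context.Background()); err != nil {
-		log.Fatal(err)
-	}
-
-	// Step 3: advertise the master's address for trainers to discover.
-	if _, err := cli.Put(context.Background(), "/master/addr", "10.0.0.1:8080"); err != nil {
-		log.Fatal(err)
-	}
-	// Steps 2 and 4 (recovering queues, serving task requests) would follow.
-}
-```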
-
-If the master server process dies for any reason, Kubernetes will restart it. It will be online again with all states recovered from etcd in a few minutes.
-
-### Trainer Process
-
-When the trainer is started by Kubernetes, it executes the following steps at startup:
-
-1. Watches the available parameter servers under the etcd prefix key `/ps/`, and waits until the count of parameter servers reaches the desired count */ps_desired*.
-1. Finds and watches */master/addr* to get the master's address.
-1. Requests tasks from the master to start training. A sketch of the first two steps follows.
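-
-A sketch of the first two steps with etcd clientv3; the key names follow the conventions above, and the endpoint is illustrative:
-
-```go
-package main
-
-import (
-	"context"
-	"log"
-	"strconv"
-
-	"github.com/coreos/etcd/clientv3"
-)
-
-func main() {
-	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
-	if err != nil {
-		log.Fatal(err)
-	}
-	defer cli.Close()
-	ctx := context.Background()
-
-	// Read the desired parameter server count.
-	resp, err := cli.Get(ctx, "/ps_desired")
-	if err != nil || len(resp.Kvs) == 0 {
-		log.Fatal("no /ps_desired yet")
-	}
-	desired, _ := strconv.Atoi(string(resp.Kvs[0].Value))
-
-	// Wait until enough parameter servers registered under /ps/.
-	for {
-		cnt, err := cli.Get(ctx, "/ps/", clientv3.WithPrefix(), clientv3.WithCountOnly())
-		if err != nil {
-			log.Fatal(err)
-		}
-		if int(cnt.Count) >= desired {
-			break
-		}
-		// Block until something under /ps/ changes, then re-check.
-		<-cli.Watch(ctx, "/ps/", clientv3.WithPrefix())
-	}
-
-	// Discover the master's address.
-	addr, err := cli.Get(ctx, "/master/addr")
-	if err != nil || len(addr.Kvs) == 0 {
-		log.Fatal("master not registered")
-	}
-	log.Printf("master at %s", addr.Kvs[0].Value)
-}
-```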
-
-When a trainer fails, Kubernetes will try to restart it. The recovered trainer will fetch tasks from the master and continue training.
-
-### Parameter Server Process
-
-When the parameter server is started by Kubernetes, it executes the following steps at startup:
-
-1. Reads the desired total number of parameter servers from the etcd key `/ps_desired`.
-1. Searches through the etcd keys `/ps/
-	The third parameter server joined:
-
-1. The parameter server can load parameters if there are already saved parameters in the save path (inferred from its index).
-1. Now the parameter server is ready for the trainers' requests.
-
-If the parameter server's etcd lease expires, the parameter server will kill itself.
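-
-The lease behavior can be sketched with etcd clientv3; the TTL, key, and address below are illustrative:
-
-```go
-package main
-
-import (
-	"context"
-	"log"
-	"os"
-
-	"github.com/coreos/etcd/clientv3"
-)
-
-func main() {
-	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
-	if err != nil {
-		log.Fatal(err)
-	}
-	defer cli.Close()
-	ctx := context.Background()
-
-	// Register under /ps/<index> with a TTL lease.
-	lease, err := cli.Grant(ctx, 10) // seconds
-	if err != nil {
-		log.Fatal(err)
-	}
-	if _, err := cli.Put(ctx, "/ps/0", "10.0.0.2:8090", clientv3.WithLease(lease.ID)); err != nil {
-		log.Fatal(err)
-	}
-
-	// Keep the lease alive; if it cannot be renewed the channel closes,
-	// and the parameter server kills itself.
-	ch, err := cli.KeepAlive(ctx, lease.ID)
-	if err != nil {
-		log.Fatal(err)
-	}
-	for range ch {
-		// lease renewed
-	}
-	log.Println("etcd lease expired, exiting")
-	os.Exit(1)
-}
-```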
-
-
-## Parameter Server Checkpointing
-See [here](./checkpointing.md)
-
-## Storing and Dispatching Training Data
-See [here](./data_dispatch.md)
-
-
-## Dynamic Scaling
-
-### Trainer Scaling
-
-TODO
-
-### Parameter Server Scaling
-
-Not planned for v1.
-
-## Training Dataset Format
-
-TODO
-
-## User Interface
-
-TODO
diff --git a/doc/v2/design/cluster_train/checkpointing.md b/doc/v2/design/cluster_train/checkpointing.md
deleted file mode 100644
index c87ef2c7d2636208866d05456d5d44316d0bb200..0000000000000000000000000000000000000000
--- a/doc/v2/design/cluster_train/checkpointing.md
+++ /dev/null
@@ -1,44 +0,0 @@
-## Model Parameter Checkpointing
-Checkpointing the model parameters effectively guards against single-node or simultaneous multi-node failures of the parameter servers. By periodically saving to disk a complete image of the model data held in each parameter server's memory, the training process can be restarted from an intermediate state. For an uninterruptible training job without backups, fault tolerance can be achieved by periodically saving a snapshot of each parameter server's data to a ***distributed storage service***, for example keeping the latest snapshot taken every 10 minutes and deleting earlier ones. When a single node fails, the training job can be recovered simply by restoring that node, or by migrating it to another node and starting it there.
-
-### Snapshot Saving Design
-
-Notes:
-
-* After a parameter server starts in the cluster, it automatically mounts the distributed storage directory and saves its snapshots under that directory.
-* ***Note: each parameter server saves its checkpoint independently. Having multiple parameter servers synchronously save a global checkpoint for one specific point in time is not considered for now, since even that could not guarantee eliminating randomness.***
-
-Checkpoint saving procedure:
-
-1. When the "every 10 minutes" condition is met, the parameter server acquires the `read_lock` on the parameters memory and starts a new thread to save a checkpoint. If a checkpoint-saving thread is already running, the request is ignored. Since updating the parameters requires the `write_lock` on the parameters memory, parameter updates pause and wait while the snapshot is being written.
-2. The parameter server generates a UUID and writes the snapshot data to a new file (named after the UUID) in the designated directory. After the snapshot is written, it computes the file's MD5 sum, then writes the JSON record `{"uuid": [UUID], "md5": "MD5 sum", "timestamp": xxxx}` to `/checkpoints/[pserver_id]` in etcd.
-3. Snapshot files in the directory whose names do not match the current UUID are deleted.
-4. The lock on the parameters memory is released, and the checkpoint-saving thread stops.
-
-Note that in a real environment, training may saturate the network bandwidth between the trainers and the parameter servers. If a parameter server also needs to access distributed storage over the network to save a snapshot, network congestion may occur, causing intermittent stalls. A Go sketch of steps 2 and 3 follows.
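-
-A minimal Go sketch of steps 2 and 3, assuming the snapshot bytes are already serialized and using a random hex string in place of a real UUID library:
-
-```go
-package pserver
-
-import (
-	"context"
-	"crypto/md5"
-	"crypto/rand"
-	"encoding/hex"
-	"encoding/json"
-	"fmt"
-	"os"
-	"path/filepath"
-	"time"
-
-	"github.com/coreos/etcd/clientv3"
-)
-
-// saveCheckpoint writes the snapshot to <dir>/<uuid>, records
-// {uuid, md5, timestamp} in etcd, then deletes stale snapshot files.
-func saveCheckpoint(cli *clientv3.Client, dir string, pserverID int, snapshot []byte) error {
-	// A random 16-byte hex string stands in for the UUID.
-	b := make([]byte, 16)
-	if _, err := rand.Read(b); err != nil {
-		return err
-	}
-	uuid := hex.EncodeToString(b)
-
-	if err := os.WriteFile(filepath.Join(dir, uuid), snapshot, 0644); err != nil {
-		return err
-	}
-
-	sum := md5.Sum(snapshot)
-	rec, err := json.Marshal(map[string]interface{}{
-		"uuid":      uuid,
-		"md5":       hex.EncodeToString(sum[:]),
-		"timestamp": time.Now().Unix(),
-	})
-	if err != nil {
-		return err
-	}
-	if _, err := cli.Put(context.Background(), fmt.Sprintf("/checkpoints/%d", pserverID), string(rec)); err != nil {
-		return err
-	}
-
-	// Step 3: delete snapshots that do not match the current UUID.
-	entries, err := os.ReadDir(dir)
-	if err != nil {
-		return err
-	}
-	for _, e := range entries {
-		if e.Name() != uuid {
-			os.Remove(filepath.Join(dir, e.Name()))
-		}
-	}
-	return nil
-}
-```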
-
-### Recovering from a Snapshot
-
-When a parameter server starts for the first time, or is restarted by Kubernetes at any point after a failure, it needs to roll back to the latest checkpoint:
-
-  1. Read the etcd node `/checkpoints/[pserver_id]` to obtain the file UUID of the latest checkpoint.
-  1. Load the checkpoint snapshot file named after that UUID from disk, and load the parameters stored in it.
-  1. If either of the two steps above fails, initialize the parameters using the initialization method given by the startup arguments.
-  1. Start serving requests. A sketch of the first two steps follows.
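-
-A matching Go sketch of the first two recovery steps, with the MD5 check made explicit; on any error the caller falls back to normal initialization:
-
-```go
-package pserver
-
-import (
-	"context"
-	"crypto/md5"
-	"encoding/hex"
-	"encoding/json"
-	"errors"
-	"fmt"
-	"os"
-	"path/filepath"
-
-	"github.com/coreos/etcd/clientv3"
-)
-
-type checkpointRecord struct {
-	UUID string `json:"uuid"`
-	MD5  string `json:"md5"`
-}
-
-// loadCheckpoint reads /checkpoints/<pserverID> from etcd and loads the
-// snapshot file it points to, verifying the MD5 sum.
-func loadCheckpoint(cli *clientv3.Client, dir string, pserverID int) ([]byte, error) {
-	resp, err := cli.Get(context.Background(), fmt.Sprintf("/checkpoints/%d", pserverID))
-	if err != nil {
-		return nil, err
-	}
-	if len(resp.Kvs) == 0 {
-		return nil, errors.New("no checkpoint recorded")
-	}
-	var rec checkpointRecord
-	if err := json.Unmarshal(resp.Kvs[0].Value, &rec); err != nil {
-		return nil, err
-	}
-	data, err := os.ReadFile(filepath.Join(dir, rec.UUID))
-	if err != nil {
-		return nil, err
-	}
-	sum := md5.Sum(data)
-	if hex.EncodeToString(sum[:]) != rec.MD5 {
-		return nil, errors.New("checkpoint MD5 mismatch")
-	}
-	return data, nil
-}
-```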
-
-## TODO List
-### Speculative Execution (TODO)
-In a heterogeneous cluster, a few slow trainers can drag down the speed of the whole cluster (e.g. Trainer 1 in the figure). In that case, the master is responsible for starting a new trainer (Accelerate Trainer 2) that works on the same training data block. Whichever trainer finishes the block first is kept, and the slower one is killed.
-
-### Dynamic Scaling
-For now, only dynamically scaling up the number of trainers is considered, which keeps the system complexity down.
-
-## Glossary
-* model: all the parameters obtained from deep learning training; with these parameters the neural network can make predictions on new data.
-* parameters: the parameters of a neural network, including the weights w and the biases b. A neural network model is composed of a large number of parameters.
-* shard: one of the pieces obtained by splitting a whole into multiple parts.
-* model shard: one of the pieces into which a neural network's parameters are split; each shard is stored on one of the parameter servers.
-* parameter block: multiple parameter blocks make up a model shard.
-* single point of failure: at any moment, at most one server fails at the same time. Since the probability of two machines in a cluster failing simultaneously is extremely low ((mean failure rate * mean time to repair)^2), tolerating two or more simultaneous failures is only considered for special online systems.
diff --git a/doc/v2/design/cluster_train/data_dispatch.md b/doc/v2/design/cluster_train/data_dispatch.md
deleted file mode 100644
index 1f5d22ff5e6abcb576d16cbe7391da1967a1ab8e..0000000000000000000000000000000000000000
--- a/doc/v2/design/cluster_train/data_dispatch.md
+++ /dev/null
@@ -1,160 +0,0 @@
-## Training Data Storage and Dispatching
-
-### Terminology
-
-### Workflow
-Training datasets in production environments are usually very large, and are stored on distributed storage such as Hadoop HDFS, Ceph, or AWS S3. These distributed storage services typically split the data into multiple shards stored across multiple nodes. This makes it possible to run a variety of data-related computing tasks in the cloud, including:
-
-* data preprocessing tasks
-* Paddle training tasks
-* online model inference services
-
-A dataset is a list of files in *RecordIO* format. A RecordIO file consists of chunks, and each chunk consists of some records.
-
-## Task Queue
-
-As mentioned in [distributed training design doc](./README.md), a *task* is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple *chunks* from one or multiple files. The master server maintains *task queues* to track the training progress.
-
-### Task Queue Creation
-
-1. Each trainer will make an RPC call (using Go's [rpc](https://golang.org/pkg/net/rpc/) package) to the master server, telling it the RecordIO files representing the dataset specified by the user. Since every trainer will tell the master server the same dataset, only the first RPC call will be honored.
-
-	The RPC interface is:
-	```go
-	func (m *RPCServer) ReportDataset(Paths []string, dummy *int) error {
-	}
-	```
-1. The master server will scan through each RecordIO file to generate the *chunk index* and learn how many chunks each file has. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is an in-memory data structure that enables fast access to each chunk; the index of a chunk within its file is an integer starting from 0, representing the n-th chunk within the file.
-
-	The definition of the chunk is:
-	```go
-	type Chunk struct {
-		Idx   int // index of the chunk within the file
-		Path  string
-		Index recordio.Index // chunk index
-	}
-	```
-1. Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized empty.
-
-	The definition of the task is:
-	```go
-	type Task struct {
-		Index  int
-		Chunks []Chunk
-	}
-	```
-
-	The elements in the task queues are of type `TaskEntry`, containing a timeout counter (described in [task retry logic](#task-retry-logic)) and a task:
-	```go
-	type TaskEntry struct {
-		NumTimeout int
-		Task       Task
-	}
-	```
-
-	The definition of task queues is:
-	```go
-	type TaskQueues struct {
-		Todo    []TaskEntry
-		Pending map[int]TaskEntry // map from task index to task entry
-		Done    []TaskEntry
-	}
-	```
-
-### Task Queue Persistence
-
-The task queues need to be persisted on [etcd](https://github.com/coreos/etcd) for fault recovery. Since the task queues only change once a task is completed or timed out, which is not very frequent, we can afford to synchronize with etcd every time the task queues change.
-
-We will serialize the task queues data structure with [gob encoding](https://golang.org/pkg/encoding/gob/), compress with gzip, and save into etcd synchronously under key `/task_queues`.
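-
-A sketch of this serialization path, reusing the `TaskQueues` type defined above; the function name is illustrative:
-
-```go
-package master
-
-import (
-	"bytes"
-	"compress/gzip"
-	"context"
-	"encoding/gob"
-
-	"github.com/coreos/etcd/clientv3"
-)
-
-// snapshotQueues gob-encodes the task queues, gzips the result, and
-// saves it synchronously under /task_queues.
-func snapshotQueues(cli *clientv3.Client, q *TaskQueues) error {
-	var buf bytes.Buffer
-	gz := gzip.NewWriter(&buf)
-	if err := gob.NewEncoder(gz).Encode(q); err != nil {
-		return err
-	}
-	if err := gz.Close(); err != nil { // flush the gzip stream
-		return err
-	}
-	_, err := cli.Put(context.Background(), "/task_queues", buf.String())
-	return err
-}
-```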
-
-### Task Dispatch
-
-The trainer will make an RPC call to master to get a new task when:
-
-- the trainer first started, or
-- the trainer finishes a task.
-
-The RPC interface is:
-```go
-func (m *RPCServer) GetTask(finished *Task, result *Task) error {
-}
-```
-Argument `finished` will be `nil` when the trainer is just started.
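-
-A client-side sketch of this call with Go's net/rpc. It assumes the master registered `RPCServer` with `rpc.Register` and `rpc.HandleHTTP`; the address is illustrative, and a zero-valued `Task` stands in for the nil first request because gob cannot transmit nil pointers:
-
-```go
-package main
-
-import (
-	"log"
-	"net/rpc"
-)
-
-// Task and Chunk are simplified mirrors of the definitions above.
-type Chunk struct {
-	Idx  int
-	Path string
-}
-
-type Task struct {
-	Index  int
-	Chunks []Chunk
-}
-
-func main() {
-	// The master address would be discovered via /master/addr on etcd.
-	client, err := rpc.DialHTTP("tcp", "127.0.0.1:8080")
-	if err != nil {
-		log.Fatal(err)
-	}
-	var next Task
-	// Zero-valued Task in place of the nil first request.
-	if err := client.Call("RPCServer.GetTask", Task{}, &next); err != nil {
-		log.Fatal(err)
-	}
-	log.Printf("got task %d with %d chunks", next.Index, len(next.Chunks))
-}
-```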
-
-During the RPC call the master will do the following:
-
-- Make a copy of the task queues, and update the copy reflecting the finished tasks and the new pending tasks.
-- Synchronize the copy of task queues with etcd using a transaction conditioned on holding the master lock.
-- Replace the task queues with the copy and report the new task to the trainer if the transaction succeeded, or discard the copy and report the error to the trainer if it failed. A sketch of this commit step follows.
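-
-A sketch of the conditional commit, assuming the clientv3 concurrency mutex from the master startup section and the gob/gzip payload from the persistence section:
-
-```go
-package master
-
-import (
-	"context"
-	"errors"
-
-	"github.com/coreos/etcd/clientv3"
-	"github.com/coreos/etcd/clientv3/concurrency"
-)
-
-// commitQueues atomically replaces /task_queues, but only if this
-// process still holds the master lock; otherwise the copy is discarded.
-func commitQueues(cli *clientv3.Client, mu *concurrency.Mutex, encoded []byte) error {
-	resp, err := cli.Txn(context.Background()).
-		If(mu.IsOwner()).
-		Then(clientv3.OpPut("/task_queues", string(encoded))).
-		Commit()
-	if err != nil {
-		return err
-	}
-	if !resp.Succeeded {
-		return errors.New("master lock lost; discarding queue update")
-	}
-	return nil
-}
-```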
-
-### Task Retry Logic
-
-When a task is dispatched to the trainer, the master will schedule a function for execution after the timeout duration (based on the moving average of task completion time). If the task entry is still in the pending queue, its timeout counter will increase by one, and the task will be moved back to the todo queue. If the timeout counter is above the threshold, the master will log the error and discard the task. A sketch follows.
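-
-A sketch of scheduling the timeout with `time.AfterFunc`, reusing the illustrative `onTaskTimeout` helper from the task lifecycle sketch earlier in this document:
-
-```go
-package master
-
-import (
-	"sync"
-	"time"
-)
-
-// scheduleTimeout arms a timer when a task is dispatched. If the task
-// is still pending when the timer fires, onTaskTimeout requeues it, or
-// discards it once its counter passes the threshold.
-func (q *TaskQueues) scheduleTimeout(mu *sync.Mutex, idx int, timeout time.Duration) {
-	time.AfterFunc(timeout, func() {
-		mu.Lock()
-		defer mu.Unlock()
-		q.onTaskTimeout(idx)
-	})
-}
-```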
-
-Please note that since a timed-out task could still be completed after it has been dispatched for retry, it is possible for a task to be processed multiple times. We do not try to prevent this from happening, since it is fine to train on the same task multiple times due to the stochastic nature of the stochastic gradient descent algorithm.
diff --git a/doc/v2/design/cluster_train/pserver_client.md b/doc/v2/design/cluster_train/pserver_client.md
deleted file mode 100644
index 474b8c572cd92fc87e9f7f3f2b19d12cccd158de..0000000000000000000000000000000000000000
--- a/doc/v2/design/cluster_train/pserver_client.md
+++ /dev/null
@@ -1,171 +0,0 @@
-# Design Doc: The Client Library of Parameter Server
-
-For an overview of trainer's role, please refer to [distributed training design doc](README.md). In this design doc, we will discuss the parameter server's client library, which will manage communication with parameter servers. The library will be implemented in [Go](https://golang.org/) and made available as a static or dynamic library with a C header file.
-
-## Parameter Partition
-
-Each parameter will be partitioned into parameter blocks to make the parameters evenly distributed on parameter servers. The partition is done automatically by the client library. *Sparse parameters* require slightly different treatment:
-
-### Sparse Parameter
-
-A sparse parameter is a parameter that is updated sparsely. The name is somewhat misleading: it does not have a sparse representation; it has the same representation as a dense vector.
-
-Because a sparse parameter is updated sparsely, the trainer has to partition it. And because the parameter servers will merge all sparse parameter shards into the same file when saving the parameter, a special naming convention is needed:
-
-If a sparse parameter is partitioned into n shards, they should be named as:
-
-```text
-name:sparse-0
-name:sparse-1
-...
-name:sparse-n-1
-```
-
-The library is unaware of the partition and treats each parameter independently. Only when saving parameters will the parameter servers merge the sparse parameters according to the naming convention.
-
-## Model Optimization Using Gradients
-
-There are two ways to perform model optimization using gradients:
-
-- On Client
-
-  The client does multiple steps of forward and backward update. In each step, the gradients are calculated and a new model is generated. After some steps, the client will calculate the difference between the newest model and the old model at step 0. The difference will be sent to the parameter servers. The parameter servers will just update parameters with the difference, without any optimization using gradients (such as Adam or L1 regularization).
-
-- On Parameter Server
-
-  The client will send accumulated gradients to parameter servers, the parameter server will do the optimization using gradients.
-
-## L1 and L2 Regularization
-
-PaddlePaddle allows L1 or L2 regularization to be specified per parameter, so when the trainer initializes a parameter it needs to include a parameter configuration whenever L1 or L2 regularization is necessary.
-
-## Parameter Initialization
-
-The parameters on the parameter servers need to be initialized. To provide maximum flexibility, the trainer will initialize the parameters. Only one trainer will do the initialization; the other trainers will wait for the completion of initialization and get the parameters from the parameter servers.
-
-### Trainer Selection
-
-To select the trainer for initialization, every trainer will try to grab a distributed lock; whoever owns the lock will do the initialization. As illustrated below:
-
-
-### Trainer Selection Process
-
-The trainer selection process is encapsulated in the C API function:
-```c
-int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto);
-```
-The selected trainer's call to `paddle_begin_init_params` will return 1, and the other trainers' calls to `paddle_begin_init_params` will return 0. `paddle_get_params` will block until initialization is completed. As illustrated below:
-
-## C Interface
-
-```c
-typedef enum {
-  PADDLE_ELEMENT_TYPE_INT32   = 0,
-  PADDLE_ELEMENT_TYPE_UINT32  = 1,
-  PADDLE_ELEMENT_TYPE_INT64   = 2,
-  PADDLE_ELEMENT_TYPE_UINT64  = 3,
-  PADDLE_ELEMENT_TYPE_FLOAT32 = 4,
-  PADDLE_ELEMENT_TYPE_FLOAT64 = 5,
-} paddle_element_type;
-
-typedef struct {
-  char*               name;
-  paddle_element_type element_type;
-  unsigned char*      content;
-  int                 content_len;
-} paddle_parameter, paddle_gradient;
-
-typedef int paddle_pserver_client;
-
-/**
- * @brief creates a pserver client that talks to etcd for coordination.
- */
-paddle_pserver_client paddle_new_etcd_pserver_client(char* etcd_addr);
-
-/**
- * @brief creates a pserver client given pserver addresses.
- *
- * @param pserver_addrs comma-separated pserver addresses.
- * @param selected if current pserver client is selected to initialize all parameter servers.
- */
-paddle_pserver_client paddle_new_pserver_client(char* pserver_addrs, int selected);
-void paddle_pserver_client_release(paddle_pserver_client c);
-
-/**
- * @brief paddle_begin_init_params begins to initialize parameters on
- * parameter servers.
- *
- * paddle_begin_init_params will be called from multiple trainers,
- * only one trainer will be selected to initialize the parameters on
- * parameter servers. Other trainers need to get the initialized
- * parameters from parameter servers using @paddle_get_params.
- *
- * @return 1 if the trainer is selected to initialize parameter
- * servers, otherwise 0.
- */
-int paddle_begin_init_params(paddle_pserver_client client);
-
-/**
- * @brief paddle_init_param initializes the parameter on parameter
- * servers.
- *
- * @param param the parameter to initialize.
- * @param param_config_proto the configuration for the parameter.
- * @param config_len the length of param_config_proto
- * @return 0 if successful, otherwise -1. On failure, the trainer
- * needs to restart the entire initialization process (starting from
- * @paddle_begin_init_param). Or simply exit the program and wait for
- * the cluster management system to restart the trainer.
- */
-int paddle_init_param(paddle_pserver_client client, paddle_parameter param, const unsigned char* param_config_proto, int config_len);
-
-/**
- * @brief paddle_finish_init_params tells the parameter servers that
- * the client has sent all parameters for initialization.
- *
- * @return 0 if successful, otherwise -1. On failure, the trainer
- * needs to restart the entire initialization process (starting from
- * @paddle_begin_init_param). Or simply exit the program and wait for
- * the cluster management system to restart the trainer.
- */
-int paddle_finish_init_params(paddle_pserver_client client);
-
-/**
- * @brief paddle_send_grads sends gradients to parameter servers for
- * updating parameters.
- *
- * @param grads the array of gradients to send.
- * @param len the length of the gradient array.
- * @return 0 if successful, otherwise -1.
- */
-int paddle_send_grads(paddle_pserver_client client, const paddle_gradient* grads, int len);
-
-/**
- * @brief paddle_get_params gets parameters from parameter servers.
- *
- * paddle_get_params will block until parameters are initialized on
- * the parameter servers.
- *
- * @param dst the destination array of parameter pointers to save to.
 * The parameter pointer must be pre-populated with the required parameter
 * name, and the content of the parameter must be pre-allocated with the size
 * of the required parameter on the pserver.
- * @param len the length of the names array and the paddle_parameter
- * array.
- * @return 0 if successful, otherwise -1.
- */
-int paddle_get_params(paddle_pserver_client client, paddle_parameter** dst, int len);
-
-/**
- * @brief paddle_save_model tells the parameter servers to save the
- * parameters to the given path.
- *
- * @param path the path to save parameters.
- * @return 0 if successful, otherwise -1.
- */
-int paddle_save_model(paddle_pserver_client client, const char* path);
-```
diff --git a/doc/v2/design/cluster_train/remote_parameter_updater.md b/doc/v2/design/cluster_train/remote_parameter_updater.md
deleted file mode 100644
index 6e8e5938455b869e0f3367794c41250340b37f77..0000000000000000000000000000000000000000
--- a/doc/v2/design/cluster_train/remote_parameter_updater.md
+++ /dev/null
@@ -1,21 +0,0 @@
-# Design Doc: Remote Parameter Updater for Cluster Train
-
-For an overview of distributed training, please refer to the [distributed training design doc](README.md). In this design doc, we will discuss the parameter updater, which will use the parameter server cclient (see [The Client Library of Parameter Server Design Doc](pserver_client.md)) to manage and update parameters.
-
-## Parameter Updater
-
-The parameter updater is used by the trainer to manage and update parameters. There are mainly two kinds of parameter updaters: local and remote. Since this design is for cluster training, we will only discuss the remote parameter updater here.
-
-### Remote Parameter Updater
-
-The remote parameter updater manages parameters through remote parameter servers, using the client that communicates with the pservers ([The Client Library of Parameter Server Design Doc](pserver_client.md)).
-
-In the PaddlePaddle Python V2 API, the trainer is implemented in Python. The trainer holds an instance of a parameter updater and calls its functions directly. In this design, we will also expose the API of RemoteParameterUpdater to Python with SWIG.
-
-#### Sparse Remote Parameter Updater
-
-Since we will only implement dense parameter management now, the mechanism for sparse parameters will be discussed in the next stage.
-
-### Interface Design
-
-TBD
diff --git a/doc/v2/design/cluster_train/save_model.md b/doc/v2/design/cluster_train/save_model.md
deleted file mode 100644
index b755185c81ad617b9c85c47de0f5f65d2201c658..0000000000000000000000000000000000000000
--- a/doc/v2/design/cluster_train/save_model.md
+++ /dev/null
@@ -1,111 +0,0 @@
-# Design Doc: Save Model
-
-## Overview
-
-The model is the output of the training process. There are two
-ways a user can obtain a model:
-
-- Save model triggered by user code: user code asks PaddlePaddle to
-  save a model.
-- Convert model from a checkpoint: the model is converted from the
-  pservers' periodic checkpoints. In this way, the user can cancel a
-  job at any time, and still have a relatively fresh model (we
-  checkpoint around every 5 minutes).
-
-### Trainer Saving Model vs. Pservers Saving Model
-
-Both trainers and pservers have access to the model. So the model can
-be saved from a trainer or pservers. We need to decide where the model
-is saved from.
-
-#### Dense Update vs. Sparse Update
-
-There are two types of model update methods: dense update and sparse
-update (when the model parameter is configured to be sparse).
-
-- Dense update
-
-  Every trainer has its own full copy of the model. Every model
-  update will update the entire model.
-
-- Sparse update
-
-  The training input is sparse, and the trainer does not have the
-  entire model. It will only download the sub-model related
-  to the input. When updating the model, only the sub-model related to
-  the training input is updated.
-
-
-#### Pservers Saving Model
-
-The benefit of letting pservers save the model is that they have the
-entire model all the time. However, since pservers are on different
-nodes, saving requires a merging process to merge model shards into
-one model. This requires the pservers to write models to a distributed
-filesystem, making the checkpoint shards visible to the merge program.
-
-#### Trainer Saving Model
-
-The benefit of letting one trainer save the model is that it does not
-require a distributed filesystem. And it reuses the same save-model
-logic as when training locally - except that when doing sparse update,
-the trainer needs to download the entire model during the saving process.
-
-#### Conclusion
-
-Given that the trainer saving the model does not require a distributed
-filesystem, and that it is an intuitive extension of saving the model
-when training locally, we decide to let the trainer save the model
-when doing distributed training.
-
-
-### Convert Model from Checkpoint
-
-TODO
-
-
-## Timeline
-
-We will first implement the trainer saving the model. Converting the
-latest snapshot to a model will be a TODO for the future.
-
-
-## Trainer Save Model
-
-### Trainer Election
-
-One trainer will be elected as the one to save the model. When using
-etcd, the trainer ID is a randomly generated UUID; the trainer will
-contact the master server requesting to save the model, and find out
-whether it is elected. When the master server is not used, unique
-trainer IDs will be given by the administrator, and the trainer whose
-ID is "0" is elected to save the model.
-
-### Model Save Path
-
-Each trainer will be given the directory to save the model. The
-elected trainer will save the model to
-`given-directory/trainerID`. Since the trainer ID is unique, this
-prevents concurrent saves to the same file when multiple trainers
-are elected to save the model after a split-brain problem happens.
-
-### What Happens When Model Is Saving
-
-It takes some time to save a model; we need to define what will happen
-while the model is being saved.
-
-When doing dense update, the trainer uses the local model. Pservers
-do not need to pause model updates.
-
-When doing sparse update, the trainer needs to download the entire
-model while saving. To get the most accurate model, the model update
-needs to be paused before the download starts and resumed after the
-download finishes. Otherwise, the trainer gets a model that is
-"polluted": some part of the model is old, some part of the model is
-new.
-
-It's unclear whether the "polluted" model will be inferior, due to the
-stochastic nature of deep learning, and pausing the model update would
-add more complexity to the system. Since supporting sparse update is a
-TODO item, we defer the evaluation of whether to pause the model update
-during model saving to the future.
diff --git a/doc/v2/design/cluster_train/src/checkpointing.png b/doc/v2/design/cluster_train/src/checkpointing.png
deleted file mode 100644
index c221e8474f90f37e31416cbb19c9452207a0d14c..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/checkpointing.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/data_dispatch.png b/doc/v2/design/cluster_train/src/data_dispatch.png
deleted file mode 100644
index 5bdcc24d6a6d193cb014f8c38b362451fded5e54..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/data_dispatch.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/dataset.graffle b/doc/v2/design/cluster_train/src/dataset.graffle
deleted file mode 100644
index c10a423ed16a23229a9ee33d11bfc82bb59646c8..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/dataset.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/dataset.png b/doc/v2/design/cluster_train/src/dataset.png
deleted file mode 100644
index 2fb7f1cce3b6dd21489392557826e95a9f207c34..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/dataset.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/file_storage.graffle b/doc/v2/design/cluster_train/src/file_storage.graffle
deleted file mode 100644
index 50a17e70fa255495337c529a3bf12a5c0024a5be..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/file_storage.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/file_storage.png b/doc/v2/design/cluster_train/src/file_storage.png
deleted file mode 100644
index fccb4e3e7e738224c7f1584326bd5f351ce799aa..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/file_storage.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/init_lock.graffle b/doc/v2/design/cluster_train/src/init_lock.graffle
deleted file mode 100644
index fa9149f21b1311eed48ef72ec55e556559d0fc94..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/init_lock.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/init_lock.png b/doc/v2/design/cluster_train/src/init_lock.png
deleted file mode 100644
index 92404ee6d6c0f9a7727952bae3c869ba338ecd7f..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/init_lock.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-cloud-in-data-center.png b/doc/v2/design/cluster_train/src/paddle-cloud-in-data-center.png
deleted file mode 100644
index da5d1a77562480ad1d886f5f21dbd84001d3d508..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-cloud-in-data-center.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-etcd.graffle b/doc/v2/design/cluster_train/src/paddle-etcd.graffle
deleted file mode 100644
index f973dc9b9dbf72e9bc31e2d32822916cd281f8d9..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-etcd.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-etcd.png b/doc/v2/design/cluster_train/src/paddle-etcd.png
deleted file mode 100644
index 57981ceb4b94f0f7d6dfa63f3d28c0402bf9cc31..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-etcd.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-model-sharding.graffle b/doc/v2/design/cluster_train/src/paddle-model-sharding.graffle
deleted file mode 100644
index fba30f0ca2b47f0d202a432821d95e55aac37ec8..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-model-sharding.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-model-sharding.png b/doc/v2/design/cluster_train/src/paddle-model-sharding.png
deleted file mode 100644
index 8c3f6724ef46c6527e63a4cd8cb0b50fe0167124..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-model-sharding.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-ps-0.png b/doc/v2/design/cluster_train/src/paddle-ps-0.png
deleted file mode 100644
index 47ef32806f182cab003da77f1556823b3f6d1721..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-ps-0.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-ps-1.png b/doc/v2/design/cluster_train/src/paddle-ps-1.png
deleted file mode 100644
index f3125db73096c52bac6e7c60e1675552857c0774..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-ps-1.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-ps.graffle b/doc/v2/design/cluster_train/src/paddle-ps.graffle
deleted file mode 100644
index 0e536ffdd91cd696008b4c01bad3cb53edebdc16..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-ps.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-task-queues.graffle b/doc/v2/design/cluster_train/src/paddle-task-queues.graffle
deleted file mode 100644
index 4263ed8bfd2ef0e55058828bf23f2fac3595e5fd..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-task-queues.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-task-queues.png b/doc/v2/design/cluster_train/src/paddle-task-queues.png
deleted file mode 100644
index 5f980266795776752cebd0c346b85c4a75a47780..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-task-queues.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-task-states.graffle b/doc/v2/design/cluster_train/src/paddle-task-states.graffle
deleted file mode 100644
index cf1a0b9246d9386a949d2dbb8c32fe84f72eea83..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-task-states.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-task-states.png b/doc/v2/design/cluster_train/src/paddle-task-states.png
deleted file mode 100644
index 4ae43cb66c071aee9eb90d875e2373b29af9c3e0..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-task-states.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/pserver_init.graffle b/doc/v2/design/cluster_train/src/pserver_init.graffle
deleted file mode 100644
index 5f3f1f52be8aa7f9049a8fcd6b7c93c8560c1676..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/pserver_init.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/pserver_init.png b/doc/v2/design/cluster_train/src/pserver_init.png
deleted file mode 100644
index dfe491ff98dd7db1c336093c80964a260df2cd90..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/pserver_init.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/submit-job.graffle b/doc/v2/design/cluster_train/src/submit-job.graffle
deleted file mode 100644
index 677cdfb6d9a32168bf71729eb841fa1ca0dd31d6..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/submit-job.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/submit-job.png b/doc/v2/design/cluster_train/src/submit-job.png
deleted file mode 100644
index 3046a460a7ba708079e88a560debaa215a694680..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/submit-job.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/trainer.graffle b/doc/v2/design/cluster_train/src/trainer.graffle
deleted file mode 100644
index 43415ed8cf61a5acfa34f8e56b9577f338dbf254..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/trainer.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/trainer.png b/doc/v2/design/cluster_train/src/trainer.png
deleted file mode 100644
index 6537d3d56589ca9f19a77a50a970e4b5275e6ce0..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/trainer.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/submit-job.md b/doc/v2/design/cluster_train/submit-job.md
deleted file mode 100644
index 8377d5489dc64bd2fdc5bb4f7bc737e7b489000d..0000000000000000000000000000000000000000
--- a/doc/v2/design/cluster_train/submit-job.md
+++ /dev/null
@@ -1,127 +0,0 @@
-# Submit a Distributed Training Job
-
-The user can submit a distributed training job with Python code, rather than with a command-line interface.
-
-## Runtime Environment On Kubernetes
-
-For a distributed training job, there are two Docker images, called the *runtime Docker image* and the *base Docker image*. The runtime Docker image is the Docker image that gets scheduled by Kubernetes to run during training. The base Docker image is for building the runtime Docker image.
-
-### Base Docker Image
-
-Usually, the base Docker image is the PaddlePaddle product Docker image, including the paddle binary files and the Python package. And of course, users can specify any image name hosted on any Docker registry to which they have access rights.
-
-### Runtime Docker Image
-
-The trainer package which the user uploads, together with some Python dependencies, is packaged into a runtime Docker image based on the base Docker image.
-
-- Handle Python Dependencies
-
-  You need to provide a `requirements.txt` file in your trainer-package folder. Example:
-
-  ```txt
-  pillow
-  protobuf==3.1.0
-  ```
-  See more [details](https://pip.readthedocs.io/en/1.1/requirements.html) about requirements files. An example project looks like:
-  ```bash
-    paddle_example
-      |-quick_start
-        |-trainer.py
-        |-dataset.py
-        |-requirements.txt
-  ```
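-
-As a concrete sketch of this packaging step, the runtime image could be assembled from the base image roughly as follows; the base image name, the paths, and driving the `docker` CLI through `subprocess` are illustrative assumptions rather than the actual build pipeline:
-
-```python
-import os
-import subprocess
-
-DOCKERFILE_TEMPLATE = """\
-FROM {base_image}
-COPY . /workspace
-RUN pip install -r /workspace/quick_start/requirements.txt
-"""
-
-def build_runtime_image(package_dir, base_image, tag):
-    # Write a Dockerfile into the trainer package and build the runtime image.
-    with open(os.path.join(package_dir, "Dockerfile"), "w") as f:
-        f.write(DOCKERFILE_TEMPLATE.format(base_image=base_image))
-    subprocess.check_call(["docker", "build", "-t", tag, package_dir])
-```
-
-A call like `build_runtime_image("paddle_example", "paddlepaddle/paddle:latest", "my-runtime:v1")` would then produce an image that Kubernetes can schedule.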
-
-## Submit Distributed Training Job With Python Code
-
-- `paddle.job.dist_train()` will call the Job Server API `/v1/package` to upload the trainer package and save it on CephFS, and then call `/v1/trainer/job` to submit the PaddlePaddle distributed job.
-- `/v1/trainer/job` will start a build job that prepares the runtime Docker image. When the build job finishes, the Job Server will submit the PaddlePaddle distributed job to Kubernetes.
-- *NOTE*: For the first version, we will not prepare the runtime Docker image. Instead, the package is uploaded to Paddle Cloud, and Paddle Cloud will mount the package, in a temporary folder, into the base Docker image. Custom Python dependencies will not be supported in the first version either.
-
-You can call `paddle.job.dist_train` and provide the distributed training configuration as parameters:
-```python
-paddle.job.dist_train(
-  trainer=dist_trainer(),
-  paddle_job=PaddleJob(
-    job_name = "paddle-cloud",
-    entry_point = "python %s"%__file__,
-    trainer_package = "/example/word2vec",
-    image = "yancey1989/paddle-job",
-    trainers = 10,
-    pservers = 3,
-    trainer_cpu = 1,
-    trainer_gpu = 1,
-    trainer_mem = "10G",
-    pserver_cpu = 1,
-    pserver_mem = "2G"
-  ))
-```
-
-The parameter `trainer` of `paddle.job.dist_train` is a function, and you can implement it as follows:
-```python
-def dist_trainer():
-  def trainer_creator():
-    trainer = paddle.v2.trainer.SGD(...)
-    trainer.train(...)
-  return trainer_creator
-```
-
-The pseudocode of `paddle.job.dist_train` is as follows:
-```python
-def dist_train(trainer, paddle_job):
-  # the cloud environment sets RUNNING_ON_CLOUD=YES
-  if os.getenv("RUNNING_ON_CLOUD", "NO") == "NO":
-    # submit the paddle job
-    paddle_job.submit()
-  else:
-    # start the training
-    trainer()
-```
-### PaddleJob Parameters
-parameter | type | explanation
- --- | --- | ---
-job_name | str | the unique name of the training job
-entry_point | str | entry point for starting the trainer process
-trainer_package | str | trainer package file path to which the user has access
-image|str|the [base image](#base-docker-image) for building the [runtime image](#runtime-docker-image)
-pservers|int| Parameter Server process count
-trainers|int| Trainer process count
-pserver_cpu|int| CPU count for each Parameter Server process
-pserver_mem|str| memory allocated for each Parameter Server process; a plain integer with one of these suffixes: E, P, T, G, M, K
-trainer_cpu|int| CPU count for each Trainer process
-trainer_mem|str| memory allocated for each Trainer process; a plain integer with one of these suffixes: E, P, T, G, M, K
-trainer_gpu|int| GPU count for each Trainer process; if you only want CPU, do not set this parameter
-
-### Deploy Parameter Server, Trainer and Master Process
-  - Deploy the PaddlePaddle Parameter Server processes as a Kubernetes ReplicaSet.
-  - Deploy the PaddlePaddle Trainer processes as a Kubernetes Job.
-  - Deploy the PaddlePaddle Master processes as a Kubernetes ReplicaSet.
-
-## Job Server
-
-- RESTful API
-
-  The Job Server provides a RESTful HTTP API for receiving the trainer package and displaying
-  PaddlePaddle job related information.
-  - `POST   /v1/package` receives the trainer package and saves it on CephFS
-  - `POST   /v1/trainer/job` submits a trainer job
-  - `GET    /v1/jobs/` lists all jobs
-  - `GET    /v1/jobs/
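-
-As an illustration of how a client might exercise these endpoints, here is a sketch using the `requests` library; the Job Server address, the package file name, and the JSON payload are hypothetical:
-
-```python
-import requests
-
-JOB_SERVER = "http://job-server:8080"  # hypothetical address
-
-# Upload the trainer package; the Job Server saves it on CephFS.
-with open("quick_start.tar.gz", "rb") as f:
-    requests.post(JOB_SERVER + "/v1/package", data=f)
-
-# Submit the distributed training job; fields mirror the PaddleJob parameters.
-requests.post(JOB_SERVER + "/v1/trainer/job",
-              json={"job_name": "paddle-cloud", "trainers": 10, "pservers": 3})
-
-# List all jobs.
-print(requests.get(JOB_SERVER + "/v1/jobs/").json())
-```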
-## C Interface
-
-```c
-typedef enum {
-  PADDLE_ELEMENT_TYPE_INT32   = 0,
-  PADDLE_ELEMENT_TYPE_UINT32  = 1,
-  PADDLE_ELEMENT_TYPE_INT64   = 2,
-  PADDLE_ELEMENT_TYPE_UINT64  = 3,
-  PADDLE_ELEMENT_TYPE_FLOAT32 = 4,
-  PADDLE_ELEMENT_TYPE_FLOAT64 = 5,
-} paddle_element_type;
-
-typedef struct {
-  char*               name;
-  paddle_element_type element_type;
-  unsigned char*      content;
-  int                 content_len;
-} paddle_parameter, paddle_gradient;
-
-typedef int paddle_pserver_client;
-
-/**
- * @brief creates a pserver client that talks to etcd for coordination.
- */
-paddle_pserver_client paddle_new_etcd_pserver_client(char* etcd_addr);
-
-/**
- * @brief creates a pserver client given pserver addresses.
- *
- * @param pserver_addrs comma-separated pserver addresses.
- * @param selected if current pserver client is selected to initialize all parameter servers.
- */
-paddle_pserver_client paddle_new_pserver_client(char* pserver_addrs, int selected);
-void paddle_pserver_client_release(paddle_pserver_client c);
-
-/**
- * @brief paddle_begin_init_params begins to initialize parameters on
- * parameter servers.
- *
- * paddle_begin_init_params will be called from multiple trainers;
- * only one trainer will be selected to initialize the parameters on
- * parameter servers. Other trainers need to get the initialized
- * parameters from parameter servers using @paddle_get_params.
- *
- * @return 1 if the trainer is selected to initialize parameter
- * servers, otherwise 0.
- */
-int paddle_begin_init_params(paddle_pserver_client client);
-
-/**
- * @brief paddle_init_param initializes the parameter on parameter
- * servers.
- *
- * @param param the parameter to initialize.
- * @param param_config_proto the configuration for the parameter.
- * @param config_len the length of param_config_proto
- * @return 0 if successful, otherwise -1. On failure, the trainer
- * needs to restart the entire initialization process (starting from
- * @paddle_begin_init_params), or simply exit the program and wait for
- * the cluster management system to restart the trainer.
- */
-int paddle_init_param(paddle_pserver_client client, paddle_parameter param, const unsigned char* param_config_proto, int config_len);
-
-/**
- * @brief paddle_finish_init_params tells parameter servers client has
- * sent all parameters to parameter servers as initialization.
- *
- * @return 0 if successful, otherwise -1. On failure, the trainer
- * needs to restart the entire initialization process (starting from
- * @paddle_begin_init_params), or simply exit the program and wait for
- * the cluster management system to restart the trainer.
- */
-int paddle_finish_init_params(paddle_pserver_client client);
-
-/**
- * @brief paddle_send_grads sends gradients to parameter servers for
- * updating parameters.
- *
- * @param grads the array of gradients to send.
- * @param len the length of the gradient array.
- * @return 0 if successful, otherwise -1.
- */
-int paddle_send_grads(paddle_pserver_client client, const paddle_gradient* grads, int len);
-
-/**
- * @brief paddle_get_params gets parameters from parameter servers.
- *
- * paddle_get_params will block until parameters are initialized on
- * the parameter servers.
- *
- * @param dst the destination array of parameter pointers to save to.
- * Each parameter pointer must be pre-populated with the required
- * parameter name, and its content buffer must be pre-allocated to the
- * size of the corresponding parameter on the pserver.
- * @param len the length of the paddle_parameter array.
- * @return 0 if successful, otherwise -1.
- */
-int paddle_get_params(paddle_pserver_client client, paddle_parameter** dst, int len);
-
-/**
- * @brief paddle_save_model asks the parameter servers to save the
- * model to the given path.
- *
- * @param path the path to save parameters.
- * @return 0 if successful, otherwise -1.
- */
-int paddle_save_model(paddle_pserver_client client, const char* path);
-```
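-
-For illustration, the call ordering that this interface implies on the trainer side can be sketched from Python with `ctypes`; the shared-library name, the etcd address, and the placeholder parameter buffer are assumptions, and error handling is elided:
-
-```python
-import ctypes
-
-class PaddleParameter(ctypes.Structure):
-    # Mirrors the paddle_parameter struct above.
-    _fields_ = [("name", ctypes.c_char_p),
-                ("element_type", ctypes.c_int),
-                ("content", ctypes.POINTER(ctypes.c_ubyte)),
-                ("content_len", ctypes.c_int)]
-
-lib = ctypes.CDLL("libpaddle_pserver_cclient.so")  # assumed library name
-client = lib.paddle_new_etcd_pserver_client(b"http://127.0.0.1:2379")
-
-buf = (ctypes.c_ubyte * 4)()               # placeholder parameter content
-param = PaddleParameter(b"w0", 4, buf, 4)  # 4 == PADDLE_ELEMENT_TYPE_FLOAT32
-
-if lib.paddle_begin_init_params(client):
-    # This trainer was selected: push initial values, then finish.
-    lib.paddle_init_param(client, param, None, 0)
-    lib.paddle_finish_init_params(client)
-else:
-    # Every other trainer blocks here until initialization completes.
-    dst = (ctypes.POINTER(PaddleParameter) * 1)(ctypes.pointer(param))
-    lib.paddle_get_params(client, dst, 1)
-
-lib.paddle_pserver_client_release(client)
-```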
diff --git a/doc/v2/design/cluster_train/remote_parameter_updater.md b/doc/v2/design/cluster_train/remote_parameter_updater.md
deleted file mode 100644
index 6e8e5938455b869e0f3367794c41250340b37f77..0000000000000000000000000000000000000000
--- a/doc/v2/design/cluster_train/remote_parameter_updater.md
+++ /dev/null
@@ -1,21 +0,0 @@
-# Design Doc: Remote Parameter Updater for Cluster Train
-
-For an overview of distributed training, please refer to the [distributed training design doc](README.md). In this design doc, we will discuss the parameter updater that uses the parameter server C client ([The Client Library of Parameter Server Design Doc](pserver_client.md)) to manage and update parameters.
-
-## Parameter Updater
-
-The Parameter Updater is used by the trainer to manage and update parameters. There are mainly two kinds of parameter updaters: local and remote. Since this design is for cluster training, we will only discuss the remote parameter updater here.
-
-### Remote Parameter Updater
-
-The Remote Parameter Updater manages parameters on remote parameter servers through the client that communicates with the pservers ([The Client Library of Parameter Server Design Doc](pserver_client.md)).
-
-In the PaddlePaddle Python V2 API, the trainer is implemented in Python; it holds an instance of the parameter updater and calls its functions directly. In this design, we will also expose the API of RemoteParameterUpdater to Python with SWIG.
-
-#### Sparse Remote Parameter Updater
-
-Since we will only implement dense parameter management for now, the mechanism for sparse parameters will be discussed in the next stage.
-
-### Interface Design
-
-TBD
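-
-Although the interface is TBD, the following is a minimal sketch of what the Python-side RemoteParameterUpdater could look like; every name and method here is a hypothetical illustration built on the pserver client above, not the final design:
-
-```python
-class RemoteParameterUpdater(object):
-    """Hypothetical sketch: wraps the pserver client from pserver_client.md."""
-
-    def __init__(self, client):
-        # `client` stands in for the SWIG-exposed pserver C client.
-        self.client = client
-
-    def init(self, parameters):
-        # One elected trainer pushes initial values; the others pull them.
-        if self.client.begin_init_params():
-            for param in parameters:
-                self.client.init_param(param)
-            self.client.finish_init_params()
-        else:
-            self.client.get_params(parameters)
-
-    def update(self, gradients, parameters):
-        # Send gradients, then fetch the freshly updated parameters.
-        self.client.send_grads(gradients)
-        self.client.get_params(parameters)
-```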
diff --git a/doc/v2/design/cluster_train/save_model.md b/doc/v2/design/cluster_train/save_model.md
deleted file mode 100644
index b755185c81ad617b9c85c47de0f5f65d2201c658..0000000000000000000000000000000000000000
--- a/doc/v2/design/cluster_train/save_model.md
+++ /dev/null
@@ -1,111 +0,0 @@
-# Design Doc: Save Model
-
-## Overview
-
-The model is the output of the training process. There are two
-ways for the user to obtain a model:
-
-- Save model triggered by user code: user code asks PaddlePaddle to
-  save a model.
-- Convert model from the checkpoint: the model is converted from the
-  pservers' periodic checkpoints. In this way, the user can cancel a
-  job at any time, and still have a relatively fresh model (we
-  checkpoint around every 5 minutes).
-
-### Trainer Saving Model vs. Pservers Saving Model
-
-Both trainers and pservers have access to the model. So the model can
-be saved from a trainer or pservers. We need to decide where the model
-is saved from.
-
-#### Dense Update vs. Sparse Update
-
-There are two types of model update methods: dense update and sparse
-update (when the model parameter is configured to be sparse).
-
-- Dense update
-
-  Every trainer has its own full copy of the model. Every model
-  update will update the entire model.
-
-- Sparse update
-
-  The training input is sparse, and the trainer does not have the
-  entire model. It will only download the sub-model related to the
-  input. When updating the model, only the sub-model related to the
-  training input is updated (see the sketch after this list).
-
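-A conceptual sketch of one sparse step, where `pserver.fetch_rows` and
-`pserver.send_grads` are hypothetical helpers standing in for the real
-client calls:
-
-```python
-def sparse_update_step(pserver, batch, compute_gradients):
-    # Download only the parameter rows touched by this batch's feature ids.
-    active_ids = sorted({fid for example in batch for fid in example})
-    sub_model = pserver.fetch_rows(active_ids)
-    grads = compute_gradients(sub_model, batch)
-    # Only the downloaded sub-model's rows are updated on the pservers.
-    pserver.send_grads(grads)
-```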
-
-#### Pservers Saving Model
-
-The benefit of letting pservers save the model is that they have the
-entire model at all times. However, since pservers are on different
-nodes, a merging process is required to merge the model shards into a
-single model. This requires the pservers to write model shards to a
-distributed filesystem, making the checkpoint shards visible to the
-merge program.
-
-#### Trainer Saving Model
-
-The benefit of letting one trainer save the model is that it does not
-require a distributed filesystem, and it reuses the same save-model
-logic as local training - except that when doing sparse update, the
-trainer needs to download the entire model during the saving process.
-
-#### Conclusion
-
-Given that trainer-side saving does not require a distributed
-filesystem, and is an intuitive extension of saving the model when
-training locally, we decide to let the trainer save the model when
-doing distributed training.
-
-
-### Convert Model from Checkpoint
-
-TODO
-
-
-## Timeline
-
-We will first implement the trainer saving the model. Converting the
-latest snapshot to a model is left as a TODO for the future.
-
-
-## Trainer Save Model
-
-### Trainer Election
-
-One trainer will be elected to save the model. When using etcd, the
-trainer ID is a randomly generated UUID; the trainer contacts the
-master server requesting to save the model, and finds out whether it
-is elected. When the master server is not used, unique trainer IDs are
-assigned by the administrator, and the trainer whose ID is "0" is
-elected to save the model.
-
-### Model Save Path
-
-Each trainer will be given the directory to save the model. The
-elected trainer will save the model to
-`given-directory/trainerID`. Since the trainer ID is unique, this
-prevents concurrent saves to the same file when multiple trainers are
-elected due to a split-brain problem.
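-
-A minimal sketch of the election-and-save logic described above; `master_client` and its `request_save_model` method are hypothetical stand-ins for the real master RPC, and `parameters.to_tar` assumes the V2 `Parameters` serialization:
-
-```python
-import os
-
-def maybe_save_model(trainer_id, save_dir, parameters, master_client=None):
-    if master_client is not None:
-        # With etcd, ask the master server whether this trainer is elected.
-        elected = master_client.request_save_model(trainer_id)
-    else:
-        # Without a master server, the administrator-assigned ID "0" saves.
-        elected = trainer_id == "0"
-    if not elected:
-        return
-    # A per-trainer directory keeps concurrent saves apart under split-brain.
-    path = os.path.join(save_dir, trainer_id)
-    if not os.path.exists(path):
-        os.makedirs(path)
-    with open(os.path.join(path, "model.tar"), "wb") as f:
-        parameters.to_tar(f)
-```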
-
-### What Happens When Model Is Saving
-
-It takes some time to save a model, so we need to define what happens
-while the model is being saved.
-
-When doing dense update, the trainer uses its local model, so the
-pservers do not need to pause model updates.
-
-When doing sparse update, the trainer needs to download the entire
-model while saving. To get the most accurate model, the model update
-needs to be paused before the download starts and resumed after the
-download finishes. Otherwise, the trainer gets a model that is
-"polluted": some part of the model is old, and some part is new.
-
-It's unclear whether the "polluted" model will actually be inferior,
-due to the stochastic nature of deep learning, and pausing the model
-update would add more complexity to the system. Since supporting
-sparse update is a TODO item, we defer the decision on whether to
-pause model updates during saving to the future.
diff --git a/doc/v2/design/cluster_train/src/checkpointing.png b/doc/v2/design/cluster_train/src/checkpointing.png
deleted file mode 100644
index c221e8474f90f37e31416cbb19c9452207a0d14c..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/checkpointing.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/data_dispatch.png b/doc/v2/design/cluster_train/src/data_dispatch.png
deleted file mode 100644
index 5bdcc24d6a6d193cb014f8c38b362451fded5e54..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/data_dispatch.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/dataset.graffle b/doc/v2/design/cluster_train/src/dataset.graffle
deleted file mode 100644
index c10a423ed16a23229a9ee33d11bfc82bb59646c8..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/dataset.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/dataset.png b/doc/v2/design/cluster_train/src/dataset.png
deleted file mode 100644
index 2fb7f1cce3b6dd21489392557826e95a9f207c34..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/dataset.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/file_storage.graffle b/doc/v2/design/cluster_train/src/file_storage.graffle
deleted file mode 100644
index 50a17e70fa255495337c529a3bf12a5c0024a5be..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/file_storage.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/file_storage.png b/doc/v2/design/cluster_train/src/file_storage.png
deleted file mode 100644
index fccb4e3e7e738224c7f1584326bd5f351ce799aa..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/file_storage.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/init_lock.graffle b/doc/v2/design/cluster_train/src/init_lock.graffle
deleted file mode 100644
index fa9149f21b1311eed48ef72ec55e556559d0fc94..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/init_lock.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/init_lock.png b/doc/v2/design/cluster_train/src/init_lock.png
deleted file mode 100644
index 92404ee6d6c0f9a7727952bae3c869ba338ecd7f..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/init_lock.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-cloud-in-data-center.png b/doc/v2/design/cluster_train/src/paddle-cloud-in-data-center.png
deleted file mode 100644
index da5d1a77562480ad1d886f5f21dbd84001d3d508..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-cloud-in-data-center.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-etcd.graffle b/doc/v2/design/cluster_train/src/paddle-etcd.graffle
deleted file mode 100644
index f973dc9b9dbf72e9bc31e2d32822916cd281f8d9..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-etcd.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-etcd.png b/doc/v2/design/cluster_train/src/paddle-etcd.png
deleted file mode 100644
index 57981ceb4b94f0f7d6dfa63f3d28c0402bf9cc31..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-etcd.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-model-sharding.graffle b/doc/v2/design/cluster_train/src/paddle-model-sharding.graffle
deleted file mode 100644
index fba30f0ca2b47f0d202a432821d95e55aac37ec8..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-model-sharding.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-model-sharding.png b/doc/v2/design/cluster_train/src/paddle-model-sharding.png
deleted file mode 100644
index 8c3f6724ef46c6527e63a4cd8cb0b50fe0167124..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-model-sharding.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-ps-0.png b/doc/v2/design/cluster_train/src/paddle-ps-0.png
deleted file mode 100644
index 47ef32806f182cab003da77f1556823b3f6d1721..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-ps-0.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-ps-1.png b/doc/v2/design/cluster_train/src/paddle-ps-1.png
deleted file mode 100644
index f3125db73096c52bac6e7c60e1675552857c0774..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-ps-1.png and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-ps.graffle b/doc/v2/design/cluster_train/src/paddle-ps.graffle
deleted file mode 100644
index 0e536ffdd91cd696008b4c01bad3cb53edebdc16..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-ps.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-task-queues.graffle b/doc/v2/design/cluster_train/src/paddle-task-queues.graffle
deleted file mode 100644
index 4263ed8bfd2ef0e55058828bf23f2fac3595e5fd..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-task-queues.graffle and /dev/null differ
diff --git a/doc/v2/design/cluster_train/src/paddle-task-queues.png b/doc/v2/design/cluster_train/src/paddle-task-queues.png
deleted file mode 100644
index 5f980266795776752cebd0c346b85c4a75a47780..0000000000000000000000000000000000000000
Binary files a/doc/v2/design/cluster_train/src/paddle-task-queues.png and /dev/null differ
| Name | Open Source | License | Descriptions |
|---|---|---|---|
| MKL | No | Proprietary | Accelerate math processing routines |
| MKLML | No | Proprietary | Small package of MKL, especially for Machine Learning |
| MKL-DNN | Yes | Apache 2.0 | Accelerate primitives processing routines especially for Deep Neural Networks |




-Choose the target branch:
-
-In the PR description, writing `resolve #<issue number>` will automatically close the corresponding Issue after this PR is merged; for details, see [Closing issues via commit messages](https://help.github.com/articles/closing-issues-via-commit-messages).
-
-You can also delete the remote branch with `git push origin :<branch name>`, for example:
-
-```bash
-➜  git push origin :my-cool-stuff
-```
-
-## Delete the Local Branch
-
-Finally, delete the local branch.
-
-```bash
-# switch to the develop branch
-➜  git checkout develop
-
-# delete the my-cool-stuff branch
-➜  git branch -D my-cool-stuff
-```
-
-With that, we have completed one full code-contribution cycle.
-
-## Conventions for Submitting Code
-
-So that reviewers can focus on the code itself, please follow these conventions every time you submit code:
-
-1. Make sure the unit tests in Travis-CI pass. If they do not, the submitted code has problems, and reviewers will generally not review it.
-2. Before submitting a Pull Request:
-   - Mind the number of commits:
-     - Reason: if only one file is modified but a dozen or more commits are submitted, each making a small change, this greatly burdens reviewers, who must inspect every commit one by one to learn what was changed, not to mention that commits may overwrite each other.
-     - Suggestion: keep as few commits as possible every time you submit, and amend the previous commit with `git commit --amend`. For commits already pushed to the remote repository, see [squash commits after push](http://stackoverflow.com/questions/5667884/how-to-squash-commits-in-git-after-they-have-been-pushed).
-   - Mind the name of each commit: it should reflect the content of the commit, and must not be arbitrary.
-3. If the PR resolves an Issue, add `fix #issue_number` to the **first** comment box of the Pull Request, so that the corresponding Issue is closed automatically when the PR is merged. Keywords include: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved; please choose the appropriate one. For details, see [Closing issues via commit messages](https://help.github.com/articles/closing-issues-via-commit-messages).
-
-In addition, when replying to reviewers' comments, please follow these conventions:
-
-1. Every review comment must be replied to (this is basic courtesy in the open-source community: when someone helps, you should say thanks):
-   - If you agree with a comment and have addressed it, a simple `Done` is enough;
-   - If you disagree with a comment, please explain your reasons.
-2. If there are many review comments:
-   - Please give an overall summary of your changes.
-   - Please reply with [start a review](https://help.github.com/articles/reviewing-proposed-changes-in-a-pull-request/) instead of replying to each comment directly; each direct reply sends an email, causing an email flood.
diff --git a/doc/v2/dev/contribute_to_paddle_en.md b/doc/v2/dev/contribute_to_paddle_en.md
deleted file mode 100644
index b878f37a5b8e807e5aa346e0074a741f2f8b6cc5..0000000000000000000000000000000000000000
--- a/doc/v2/dev/contribute_to_paddle_en.md
+++ /dev/null
@@ -1,162 +0,0 @@
-# Contribute Code
-
-You are welcome to contribute to project PaddlePaddle. To contribute to PaddlePaddle, you have to agree with the 
-[PaddlePaddle Contributor License Agreement](https://gist.github.com/wangkuiyi/0c22c7b1bd3bb7eb27d76f85c3a3e329).
-
-We sincerely appreciate your contribution.  This document explains our workflow and work style.
-
-## Workflow
-
-PaddlePaddle uses this [Git branching model](http://nvie.com/posts/a-successful-git-branching-model/).  The following steps guide usual contributions.
-
-1. Fork
-
-   Our development community has been growing fast; it doesn't make sense for everyone to write into the official repo. So, please file Pull Requests from your fork. To make a fork, just head over to the GitHub page and click the ["Fork" button](https://help.github.com/articles/fork-a-repo/).
-
-1. Clone
-
-   To make a copy of your fork on your local computer, please run
-
-   ```bash
-   git clone https://github.com/your-github-account/paddle
-   cd paddle
-   ```
-
-1. Create the local feature branch
-
-   For daily work like adding a new feature or fixing a bug, please create your feature branch before coding:
-
-   ```bash
-   git checkout -b my-cool-stuff
-   ```
-
-1. Commit
-
-   Before issuing your first `git commit` command, please install [`pre-commit`](http://pre-commit.com/) by running the following commands:
-
-   ```bash
-   pip install pre-commit
-   pre-commit install
-   ```
-
-   Our pre-commit configuration requires clang-format 3.8 for auto-formatting C/C++ code and yapf for Python.
-
-   Once installed, `pre-commit` checks the style of code and documentation in every commit.  We will see something like the following when you run `git commit`:
-
-   ```
-   ➜  git commit
-   CRLF end-lines remover...............................(no files to check)Skipped
-   yapf.................................................(no files to check)Skipped
-   Check for added large files..............................................Passed
-   Check for merge conflicts................................................Passed
-   Check for broken symlinks................................................Passed
-   Detect Private Key...................................(no files to check)Skipped
-   Fix End of Files.....................................(no files to check)Skipped
-   clang-formater.......................................(no files to check)Skipped
-   [my-cool-stuff c703c041] add test file
-    1 file changed, 0 insertions(+), 0 deletions(-)
-    create mode 100644 233
-   ```
-
-	NOTE: The `yapf` installed by `pip install pre-commit` and `conda install -c conda-forge pre-commit` is slightly different. Paddle developers use `pip install pre-commit`.
-
-1. Build and test
-
-   Users can build PaddlePaddle natively on Linux and Mac OS X. But to unify the building environment and to make debugging easy, the recommended way is [using Docker](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/build_en.md).
-
-1. Keep pulling
-
-   An experienced Git user pulls from the official repo often -- daily or even hourly, so they notice conflicts with others' work early, and smaller conflicts are easier to resolve.
-
-   ```bash
-   git remote add upstream https://github.com/PaddlePaddle/Paddle
-   git pull upstream develop
-   ```
-
-1. Push and file a pull request
-
-   You can "push" your local work into your forked repo:
-
-   ```bash
-   git push origin my-cool-stuff
-   ```
-
-   The push allows you to create a pull request, requesting owners of this [official repo](https://github.com/PaddlePaddle/Paddle) to pull your change into the official one.
-
-   To create a pull request, please follow [these steps](https://help.github.com/articles/creating-a-pull-request/).
-
-   If your change is for fixing an issue, please write ["Fixes
-
| Version Tips | C-API |
|---|---|
| cpu_avx_mkl | paddle.tgz |
| cpu_avx_openblas | paddle.tgz |
| cpu_noavx_openblas | paddle.tgz |
| cuda7.5_cudnn5_avx_mkl | paddle.tgz |
| cuda8.0_cudnn5_avx_mkl | paddle.tgz |
| cuda8.0_cudnn7_avx_mkl | paddle.tgz |
| cuda9.0_cudnn7_avx_mkl | paddle.tgz |

| Options | Value |
|---|---|
| WITH_C_API | ON |
| WITH_PYTHON | OFF(recommended) |
| WITH_SWIG_PY | OFF(recommended) |
| WITH_GOLANG | OFF(recommended) |
| WITH_GPU | ON/OFF |
| WITH_MKL | ON/OFF |
Figure 1. Sparse matrix storage layout

Figure 2. Sequence input illustration

| Python-side Data Type | C-API Input Data Type |
|---|---|
| paddle.data_type.integer_value | integer array; no sequence information needed |
| paddle.data_type.dense_vector | dense float matrix; no sequence information needed |
| paddle.data_type.sparse_binary_vector | sparse float matrix; non-zero values default to 1 and need not be provided; no sequence information needed |
| paddle.data_type.sparse_vector | sparse float matrix; non-zero values must be provided; no sequence information needed |
| paddle.data_type.integer_value_sequence | integer array; sequence information required |
| paddle.data_type.dense_vector_sequence | dense float matrix; sequence information required |
| paddle.data_type.sparse_binary_vector_sequence | sparse float matrix; non-zero values default to 1; sequence information required |
| paddle.data_type.sparse_vector_sequence | sparse float matrix; non-zero values must be provided; sequence information required |
| paddle.data_type.integer_value_sub_sequence | integer array; nested (double-level) sequence information required |
| paddle.data_type.dense_vector_sub_sequence | dense float matrix; nested sequence information required |
| paddle.data_type.sparse_binary_vector_sub_sequence | sparse float matrix; non-zero values default to 1; nested sequence information required |
| paddle.data_type.sparse_vector_sub_sequence | sparse float matrix; non-zero values must be provided; nested sequence information required |
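
For example, a data layer in the V2 Python API declares one of these types; a small sketch (the dictionary size of 10000 is an arbitrary illustration):

```python
import paddle.v2 as paddle

# A sequence of word ids: maps to "integer array; sequence information
# required" on the C-API side.
words = paddle.layer.data(
    name="words",
    type=paddle.data_type.integer_value_sequence(10000))
```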
-
Figure 1. C-API usage workflow
-
| | args | local train | cluster train | local test | cluster test |
|---|---|---|---|---|---|
| common | job | √ | √ | √ | √ |
| | use_gpu | √ | √ | √ | √ |
| | local | √ | √ | √ | √ |
| | config | √ | √ | √ | √ |
| | config_args | √ | √ | √ | √ |
| | num_passes | √ | √ | √ | √ |
| | trainer_count | √ | √ | √ | √ |
| | version | √ | √ | √ | √ |
| | show_layer_stat | √ | √ | √ | √ |
| train | dot_period | √ | √ | | |
| | test_period | √ | √ | | |
| | saving_period | √ | √ | | |
| | show_parameter_stats_period | √ | √ | | |
| | init_model_path | √ | √ | √ | |
| | load_missing_parameter_strategy | √ | √ | | |
| | saving_period_by_batches | √ | √ | | |
| | use_old_updater | √ | √ | | |
| | enable_grad_share | √ | √ | | |
| | grad_share_block_num | √ | √ | | |
| | log_error_clipping | √ | √ | | |
| | log_clipping | √ | √ | | |
| | save_only_one | √ | √ | | |
| | start_pass | √ | √ | | |
| train/test | save_dir | √ | √ | √ | √ |
| testing during training | test_period | √ | √ | | |
| | average_test_period | √ | √ | | |
| test | model_list | | | √ | √ |
| | test_wait | | | √ | √ |
| | test_pass | | | √ | √ |
| | predict_output_dir | | | √ | √ |
| | distribute_test | | | √ | √ |
| Auc/PnpairValidation | predict_file | | | √ | √ |
| GPU | gpu_id | √ | √ | √ | √ |
| | parallel_nn | √ | √ | √ | √ |
| | allow_only_one_model_on_one_gpu | √ | √ | √ | √ |
| | cudnn_dir | √ | √ | √ | √ |
| | cuda_dir | √ | √ | √ | √ |
| | cudnn_conv_workspace_limit_in_mb | √ | √ | √ | √ |
| RNN | beam_size | | | √ | √ |
| | rnn_use_batch | √ | √ | √ | √ |
| | prev_batch_state | √ | √ | | |
| | diy_beam_search_prob_so | | | √ | √ |
| PServer | start_pserver | | √ | | √ |
| | pservers | | √ | | √ |
| | port | | √ | | √ |
| | port_num | | √ | | √ |
| | ports_num_for_sparse | | √ | | √ |
| | nics | | √ | | √ |
| | rdma_tcp | | √ | | √ |
| | small_messages | | √ | | |
| | loadsave_parameters_in_pserver | | √ | | √ |
| | log_period_server | | √ | | |
| | pserver_num_threads | | √ | | |
| | sock_send_buf_size | | √ | | |
| | sock_recv_buf_size | | √ | | |
| | num_gradient_servers | | √ | | |
| | parameter_block_size | | √ | | |
| | parameter_block_size_for_sparse | | √ | | |
| Async SGD | async_count | | √ | | |
| | async_lagged_ratio_min | | √ | | |
| | async_lagged_ratio_default | | √ | | |
| Performance Tuning | log_barrier_abstract | | √ | | |
| | log_barrier_lowest_nodes | | √ | | |
| | log_barrier_show_log | | √ | | |
| | check_sparse_distribution_batches | | √ | | |
| | check_sparse_distribution_ratio | | √ | | |
| | check_sparse_distribution_unbalance_degree | | √ | | |
| | check_sparse_distribution_in_pserver | | √ | | |
| | show_check_sparse_distribution_log | | √ | | |
| Data Provider | memory_threshold_on_load_data | √ | √ | | |
| RandomNumber | seed | √ | √ | | |
| | thread_local_rand_use_global_seed | √ | √ | | |
| UnitTest | checkgrad_eps | | | | |
| Matrix/Vector | enable_parallel_vector | √ | √ | √ | √ |