diff --git a/doc/design/refactor/distributed_architecture.md b/doc/design/refactor/distributed_architecture.md
index ac7e98ccf1aadbb973a4801fde842375cf63448c..68db509a61e83ffcad084e3c0a957ee1924da27b 100644
--- a/doc/design/refactor/distributed_architecture.md
+++ b/doc/design/refactor/distributed_architecture.md
@@ -64,29 +64,31 @@ the cases - the compiler will optimize the IR:
-We can have our own IR too: PaddlePaddle can support model parallel by
+We can have our own IR, which is called [Program](../program.md).
+PaddlePaddle can support model parallel by
converting the IR so the user no longer need to manually do it in
Python:
-The IR for PaddlePaddle after refactor is called `Block`, it specifies
-the computation dependency graph and the variables used in the
-computation.
### Limitation 3
The user can not directly specify the parameter update rule for the
-parameter server because the parameter server does not use the same
-computation definition as the trainer. Instead, the update rule is
-baked in the parameter server. The user can not specify the update
-rule in the same way of specifying the trainer computation.
+parameter server because the previous implementation hard coded the
+parameter server to only run the optimization algorithm selected by
+configuration, operating on flat parameter vectors. The user can not
+specify the parameter server's computation layer by layer.
-This could be fixed by making the parameter server run the same
+This could be fixed by making the parameter server run its own
+IR, derived from the trainer's variable (tensor, SelectedRows)
+definitions, rather than running the same
computation definition as the trainer. For a detailed explanation,
please
see
-[Design Doc: Operation Graph Based Parameter Server](./dist_train.md)
+[Design Doc: Operation Graph Based Parameter Server](./parameter_server.md)
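+
+To make the idea concrete, here is a minimal, framework-independent
+sketch (plain Python/NumPy, not the PaddlePaddle API; the parameter
+names are made up for illustration) of the per-variable update that
+the parameter server's IR would express operator by operator:
+
+```python
+import numpy as np
+
+def sgd_update(param, grad, learning_rate=0.01):
+    # One SGD step for a single parameter tensor, the way a
+    # pserver-side update op would apply it.
+    param -= learning_rate * grad
+    return param
+
+# Each parameter (a dense tensor, or the rows of a sparse SelectedRows
+# variable) gets its own update, instead of one flat parameter vector.
+params = {"fc_0.w": np.zeros((784, 200)), "fc_1.w": np.zeros((200, 10))}
+grads = {name: np.ones_like(p) for name, p in params.items()}
+for name in params:
+    params[name] = sgd_update(params[name], grads[name])
+```
+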
## Distributed Training Architecture
@@ -110,30 +112,63 @@ img, label = input[0], input[1]
hidden = paddle.layer.fc(input=img, size=200, act=paddle.activation.Tanh())
prediction = paddle.layer.fc(input=img, size=10, act=paddle.activation.Softmax())
cost = paddle.layer.classification_cost(input=prediction, label=label)
-optimizer = paddle.optimizer.SGD(cost, learning_rate=0.01)
-session = paddle.session.NewRemote(num_trainer=3, num_ps=2, GPU_per_trainer=1)
+optimizer = paddle.optimizer.SGD(learning_rate=0.01)
+opts = optimizer.minimize(cost)
+exe = RemoteExecutor(num_trainer=3, num_ps=2, GPU_per_trainer=2, sync_batches=1)
+# this initializes variable data on both the parameter servers and the trainers
+exe.run(framework.default_startup_program())
+exe.sync()
+
for i in range(1000):
- _, cost_val = session.eval(targets=[cost, optimizer])
- print cost_val
+ # feed data
+ ...
+    cost_val, = exe.run(framework.default_main_program(),
+                        fetch_list=[cost])
+    print cost_val
```
The code above is a typical Python trainer code, the neural network
topology is built using helper functions such as
-`paddle.layer.fc`. The training is done by calling `session.eval`
+`paddle.layer.fc`. The training is done by calling `Executor.run`
iteratively.
-#### session.eval
+#### RemoteExecutor
+
+As shown in the graph, `RemoteExecutor.run` sends the IR to the
+PaddlePaddle cluster for execution. You can also use the
+`fetch_list` parameter to interactively fetch variables back to the
+local process, for example to print the training log.
+
+The Python `RemoteExecutor` is derived from the `Executor` class.
+For more information about `RemoteExecutor`, please
+see [Design Doc: RemoteExecutor](./remote_executor.md).
+
+By default, `RemoteExecutor.run` starts a PaddlePaddle Cloud
+[TrainingJob](https://github.com/PaddlePaddle/cloud/blob/develop/doc/autoscale/README.md#training-job-resource), or you can run each
+component of the job yourself with your own placement method:
-As shown in the graph, `session.eval` sends the IR and the evaluation
-inputs/targets to the PaddlePaddle cluster for evaluation. The
-targets can be any variable in the computation graph. When the target
-is the `optimizer` variable, the neural network will be optimized
-once. When the target is the `cost` variable, `session.eval` returns
-the cost value.
+- Data Parallelism
+ ```python
+ if os.getenv('PLACE_PSERVER'):
+ exe.run_pserver()
+ elif os.getenv('PLACE_TRAINER'):
+ exe.run_trainer()
+ ```
+- Model Parallelism
+ ```python
+  for part in exe.get_parallel_parts():
+ exe.run_part(part)
+ ```
-The Python `session` is a wrapper of the C++ `Session` class. For more
-information about `Session`, please
-see [Design Doc: Session](./session.md).
+#### Program and Executor
+
+As mentioned above, the implementation of IR is [Program](../program.md).
+
+[Executor](../executor.md) parses the IR and converts it into a
+graph preferred for final execution. For local training you generally
+use `Executor` to run the graph locally. For any kind of distributed
+training, you can use `RemoteExecutor` to specify the desired
+distributed training method with a few optional arguments.
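+
+For illustration, a minimal sketch of how the two executors are meant
+to be used with the same `Program` (the `CPUPlace` placement argument
+of the local `Executor` is an assumption of this sketch, not a settled
+API):
+
+```python
+# Local training: Executor runs the Program on local devices.
+exe = Executor(CPUPlace())  # the placement argument is an assumption
+exe.run(framework.default_startup_program())
+cost_val, = exe.run(framework.default_main_program(), fetch_list=[cost])
+
+# Distributed training: RemoteExecutor ships the same Program to a cluster.
+exe = RemoteExecutor(num_trainer=3, num_ps=2, GPU_per_trainer=2,
+                     sync_batches=1)
+exe.run(framework.default_startup_program())
+cost_val, = exe.run(framework.default_main_program(), fetch_list=[cost])
+```
+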
### PaddlePaddle Converter
@@ -183,13 +218,6 @@ device computation time and device communication time. Model
parallelism requires the general placement algorithm.
-### PaddlePaddle Runtime
-
-The PaddlePaddle runtime owns multiple devices (e.g., CPUs, GPUs) and
-runs the IR. The runtime does not need to do OP placement since it's
-already done by the converter.
-
-
### Local Training Architecture
The local training architecture will be the same as the distributed
@@ -210,7 +238,7 @@ the Python reader will need to read from the distributed filesystem
network traffic.
When doing distributed training, the user can still use Python data
-reader: the training data are sent with `session.eval`. However should
+reader: the training data are sent with `Executor.run`. However, this should
be used for debugging purpose only. The users are encouraged to use
the read data OPs.
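+
+For example, a debug-only sketch of pushing one mini-batch from the
+Python reader with each call (the `feed` argument name here is an
+assumption of this sketch, not a settled API):
+
+```python
+# Debug only: send a small batch of reader output with each Executor.run call.
+cost_val, = exe.run(framework.default_main_program(),
+                    feed={"img": img_batch, "label": label_batch},  # assumed argument
+                    fetch_list=[cost])
+```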
diff --git a/doc/design/refactor/session.md b/doc/design/refactor/session.md
deleted file mode 100644
index 1d9a26683c14f54e3b5fe41675cd03b5620646b8..0000000000000000000000000000000000000000
--- a/doc/design/refactor/session.md
+++ /dev/null
@@ -1,180 +0,0 @@
-# Design Doc: Session
-
-## Abstract
-
-The *session* object encapsulates the environment in which the
-computation graph is executed.
-
-We will have the *local* session and *remote* session, they offer the
-same [interface](#interface). The local session encapsulates the local
-runtime environment and the remote session encapsulates the cluster
-runtime environment.
-
-The local runtime environment contains:
-
-1. computation devices (i.e., CPU, GPU) handles, and
-1. the [scope](../scope.md) which holds all variables.
-
-The remote runtime environment contains:
-
-1. computation devices (i.e., CPU and GPU on node 0, 1) in a cluster,
- and
-1. the distributed [scope](../scope.md) in a cluster which holds all
- variables.
-
-The user can create a remote session on Paddle Cloud and evaluate the
-computation graph with it. In this way, the user can control the
-remote computation resource in a cluster from his local computer.
-
-
-## Background
-
-The current design has an implicit global session in which
-`paddle.eval()` is executed. The pain point is:
-
-Since the user is not able to explicitly switch between runtime
-environments, the user cannot run a topology in two independent
-environments.
-
-For example, in reinforcement learning, the user may want to have a
-stale model for inference and a fresh model for training, and only
-replace the stale model with the fresh model periodically.
-
-Furthermore, we have no concept that encapsulates a remote environment
-that executes a computation graph.
-
-We need the session object to address above issues.
-
-
-## Session
-
-A session is an object that owns the runtime environment. All
-computations are executed through `session.eval()`.
-
-
-### Interface
-
-```python
-eval(
- targets,
- feed_dict=None,
-)
-```
-
-Evaluates the target Operations or Variables in `targets`.
-
-- *targets*: the evaluation targets. Can be a single Operation or
- Variable, or a list with the Operations or Variables as
- elements. The value returned by `eval()` has the same shape as the
- `target` argument.
-
- The PaddlePaddle program is represented by
- the [ProgramDesc](../design/program.md), `eval()` will infer the
- ProgramDesc from the given targets and run the PaddlePaddle
- program. Please
- see
- [this graph](./distributed_architecture.md#local-training-architecture) for
- the detailed illustration for the local session
- and
- [this graph](./distributed_architecture.md#distributed-training-architecture) for
- the detailed illustration for the remote session.
-
-- *feed_dict*: a dictionary that contains the tensors which override
- the edges of the computation graph.
-
- feed_dict not only can provide the input data, it can override any
- OP's input as well:
-
- ```python
- a = pd.constant(2.0, name="a")
- b = pd.variable(name="b")
- c = pd.mul(a,b)
- sess.eval(targets=c, feed_dict={"b":3.0}) # returns 6.0
- ```
-
-```python
-close()
-```
-
-Closes the session and releases the scope that the session owns.
-
-
-### Create a Local Session
-
-```python
-session(
- devices=None
-)
-```
-
-Creates a new session. One session owns one global scope, so creating
-multiple sessions will create different scopes.
-
-- *devices*: a single `string` or a list of `string` of device names,
- the corresponding devices will be the computation devices for
- `eval()`. If not specified, all available devices (e.g., all GPUs)
- will be used. The user doesn't need to specify the CPU device since
- it will be always used. Multiple sessions can use the same device.
-
-
-#### Example
-
-```Python
-a = paddle.constant(1.0)
-b = paddle.constant(2.0)
-c = a + b
-sess = paddle.session(devices=["gpu:0", "gpu:1", "fpga:0"])
-sess.eval(c)
-sess.close()
-```
-
-### Create a Remote Session
-
-```python
-create_cloud_job(
- name,
- num_trainer,
- mem_per_trainer,
- gpu_per_trainer,
- cpu_per_trainer,
- num_ps,
- mem_per_ps,
- cpu_per_ps,
-)
-```
-
-Creates a Paddle Cloud job. Fails if the job name exists.
-
-```python
-get_cloud_job(
- name
-)
-```
-
-Gets a Paddle Cloud job.
-
-```python
-remote_session(
- job
-)
-```
-
-- *job*: the Paddle Cloud job.
-
-#### Example
-
-```Python
-reader = paddle.reader.recordio("/pfs/home/peter/mnist-train-*") # data stored on Paddle Cloud
-image = reader.column(0)
-label = reader.column(1)
-fc1 = paddle.op.fc(image, size=256, act="sigmoid")
-fc2 = paddle.op.fc(fc1, size=10, act="softmax")
-cost = paddle.op.cross_entropy(fc2, label)
-opt = paddle.optimizer.sgd(cost)
-
-job = paddle.create_cloud_job("test", 3, "1G", 1, 1, 2, "1G", 1)
-sess = paddle.remote_ession(job)
-for i in range(1000):
- sess.eval(opt)
-sess.close()
-```
diff --git a/doc/design/refactor/src/distributed_architecture.graffle b/doc/design/refactor/src/distributed_architecture.graffle
index f8496e57326c38de7468eb452a7713291d57653c..1ebbe70db04ebe0321f252a1de68aaefdaa32bd6 100644
Binary files a/doc/design/refactor/src/distributed_architecture.graffle and b/doc/design/refactor/src/distributed_architecture.graffle differ
diff --git a/doc/design/refactor/src/distributed_architecture.png b/doc/design/refactor/src/distributed_architecture.png
index 410c4510c6aab301dec95e6427fe80ac24e105fe..0da49f44123be7c0b84bf420ad81c33f0ac0a105 100644
Binary files a/doc/design/refactor/src/distributed_architecture.png and b/doc/design/refactor/src/distributed_architecture.png differ
diff --git a/doc/design/refactor/src/local_architecture.graffle b/doc/design/refactor/src/local_architecture.graffle
index cc7783c45381f25ded0b898649322c81418ad317..49fcc663ebe3824aa234e3a67aadf285cb417877 100644
Binary files a/doc/design/refactor/src/local_architecture.graffle and b/doc/design/refactor/src/local_architecture.graffle differ
diff --git a/doc/design/refactor/src/local_architecture.png b/doc/design/refactor/src/local_architecture.png
index 4b999538b7825c805292ee28b5e3256d5543bd09..14adc9fd72b855bb9f74fbf2c84ac9ec0cf2b122 100644
Binary files a/doc/design/refactor/src/local_architecture.png and b/doc/design/refactor/src/local_architecture.png differ