Committed by Helin Wang
# Design Doc: Session

## Abstract

The *session* object encapsulates the environment in which the
computation graph is executed.

We will have a *local* session and a *remote* session, which offer the
same [interface](#interface). The local session encapsulates the local
runtime environment and the remote session encapsulates the cluster
runtime environment.

The local runtime environment contains:

1. handles to the computation devices (e.g., CPU, GPU), and
1. the [scope](../scope.md) which holds all variables.

The remote runtime environment contains:

1. the computation devices in a cluster (e.g., the CPUs and GPUs on
   nodes 0 and 1), and
1. the distributed [scope](../scope.md) in a cluster which holds all
   variables.

The user can create a remote session on Paddle Cloud and evaluate the
computation graph with it. In this way, the user can control the
remote computation resources in a cluster from a local computer.


## Background

The current design has an implicit global session in which
`paddle.eval()` is executed. The pain point is:

Since the user is not able to explicitly switch between runtime
environments, the user cannot run a topology in two independent
environments.

For example, in reinforcement learning, the user may want to have a
stale model for inference and a fresh model for training, and only
replace the stale model with the fresh model periodically.

Furthermore, we have no concept that encapsulates a remote environment
that executes a computation graph.

We need the session object to address the above issues.
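
The reinforcement learning scenario can be illustrated with a toy sketch. `ToySession` below is a hypothetical stand-in, not the PaddlePaddle API: it only models the idea that each session owns its own scope, so a fresh model and a stale model can coexist and be synchronized periodically.

```python
class ToySession:
    """Hypothetical stand-in for a session: it owns one scope (a dict of variables)."""

    def __init__(self):
        self.scope = {}

    def set_var(self, name, value):
        self.scope[name] = value

    def get_var(self, name):
        return self.scope[name]


train_sess = ToySession()  # holds the fresh model
infer_sess = ToySession()  # holds the stale model

train_sess.set_var("w", 0.0)
infer_sess.set_var("w", 0.0)

for step in range(1, 8):
    # A pretend training step that only touches the training session's scope.
    train_sess.set_var("w", train_sess.get_var("w") + 0.5)
    if step % 3 == 0:
        # Periodically replace the stale model with the fresh one.
        infer_sess.set_var("w", train_sess.get_var("w"))

print(train_sess.get_var("w"))  # 3.5 (updated on every step)
print(infer_sess.get_var("w"))  # 3.0 (last synchronized at step 6)
```

With a single implicit global session there would be only one scope, so the two copies of `w` could not diverge.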


## Session

A session is an object that owns the runtime environment. All
computations are executed through `session.eval()`.


### Interface

```python
eval(
    targets,
    feed_dict=None,
)
```

Evaluates the target Operations or Variables in `targets`.

- *targets*: the evaluation targets. Can be a single Operation or
  Variable, or a list of Operations or Variables. The value returned
  by `eval()` has the same shape as the `targets` argument.

  The PaddlePaddle program is represented by
  a [ProgramDesc](../design/program.md). `eval()` infers the
  ProgramDesc from the given targets and runs the PaddlePaddle
  program. See
  [this graph](./distributed_architecture.md#local-training-architecture) for
  a detailed illustration of the local session
  and
  [this graph](./distributed_architecture.md#distributed-training-architecture) for
  a detailed illustration of the remote session.

- *feed_dict*: a dictionary that contains the tensors which override
  the edges of the computation graph.

  `feed_dict` can not only provide the input data, it can also
  override any OP's input:

  ```python
  a = pd.constant(1.0, name="a")
  b = pd.constant(2.0)
  c = pd.mul(a, b)
  sess.eval(targets=c, feed_dict={"a": 3.0})  # returns 6.0
  ```
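
The override semantics of `feed_dict` can be sketched with a toy evaluator. `Node` and `toy_eval` are hypothetical names introduced only for illustration. They are not part of PaddlePaddle, and the real `eval()` works on a ProgramDesc rather than on Python objects.

```python
class Node:
    """Hypothetical graph node: a named constant, or the product of two input nodes."""

    def __init__(self, name, value=None, inputs=()):
        self.name = name
        self.value = value
        self.inputs = inputs


def toy_eval(target, feed_dict=None):
    feed_dict = feed_dict or {}
    # A fed value overrides whatever the graph would compute for this node.
    if target.name in feed_dict:
        return feed_dict[target.name]
    if not target.inputs:
        return target.value
    left, right = (toy_eval(node, feed_dict) for node in target.inputs)
    return left * right


a = Node("a", value=1.0)
b = Node("b", value=2.0)
c = Node("c", inputs=(a, b))

print(toy_eval(c))                         # 2.0
print(toy_eval(c, feed_dict={"a": 3.0}))   # 6.0
print(toy_eval(c, feed_dict={"c": 10.0}))  # 10.0
```

The last call shows that, in this sketch, feeding an intermediate node short-circuits everything upstream of it.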

```python
close()
```

Closes the session and releases the scope that the session owns.
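
Because `close()` releases the scope, callers would probably want it to run even when evaluation raises. A minimal sketch of that pattern, using a hypothetical `ClosableSession` class (an assumption for illustration, not the proposed implementation) together with the standard library's `contextlib.closing`:

```python
from contextlib import closing


class ClosableSession:
    """Hypothetical session that owns a scope and releases it on close()."""

    def __init__(self):
        self.scope = {}
        self.closed = False

    def eval(self, name):
        if self.closed:
            raise RuntimeError("session is closed")
        return self.scope[name]

    def close(self):
        # Drop the reference to the scope so its variables can be freed.
        self.scope = None
        self.closed = True


sess = ClosableSession()
with closing(sess):
    sess.scope["x"] = 1.0
    result = sess.eval("x")

print(result)       # 1.0
print(sess.closed)  # True
```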


### Create a Local Session

```python
session(
    devices=None
)
```

Creates a new session. One session owns one scope, so creating
multiple sessions will create different scopes.

- *devices*: a single `string` or a list of `string`s naming the
  devices that will be the computation devices for `eval()`. If not
  specified, all available devices (e.g., all GPUs) will be used. The
  user doesn't need to specify the CPU device since it will always be
  used.
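
The defaulting rules for `devices` could be sketched as follows. `resolve_devices`, the `"cpu"` device name, and the rejection of unknown device names are all assumptions made for illustration. This design only specifies the accepted argument types, the default, and that the CPU is always used.

```python
def resolve_devices(devices, available):
    """Hypothetical helper that normalizes the `devices` argument to a list of names."""
    if devices is None:
        # Not specified: use every available device.
        requested = list(available)
    elif isinstance(devices, str):
        # A single device name.
        requested = [devices]
    else:
        requested = list(devices)

    unknown = [d for d in requested if d not in available]
    if unknown:
        # Assumption: asking for a device that does not exist is an error.
        raise ValueError(f"unknown devices: {unknown}")

    # The CPU is always used, whether or not the user listed it.
    if "cpu" not in requested:
        requested.insert(0, "cpu")
    return requested


available = ["cpu", "gpu:0", "gpu:1"]
print(resolve_devices(None, available))     # ['cpu', 'gpu:0', 'gpu:1']
print(resolve_devices("gpu:1", available))  # ['cpu', 'gpu:1']
```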


#### Example

```python
a = paddle.constant(1.0)
b = paddle.constant(2.0)
c = a + b
sess = paddle.session(devices=["gpu:0", "gpu:1", "fpga:0"])
sess.eval(c)
sess.close()
```

### Create a Remote Session

```python
create_cloud_job(
    name,
    num_trainer,
    mem_per_trainer,
    gpu_per_trainer,
    cpu_per_trainer,
    num_ps,
    mem_per_ps,
    cpu_per_ps,
)
```

Creates a Paddle Cloud job. Fails if a job with the same name already exists.

```python
get_cloud_job(
    name
)
```

Gets a Paddle Cloud job.
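
Since creation fails when the name is taken, a script that may be rerun would likely try to create the job and fall back to fetching it. The sketch below models this with an in-memory dictionary. The `toy_*` functions, `get_or_create`, and the use of `ValueError` are hypothetical, since this design does not specify how the failure is reported.

```python
_jobs = {}  # hypothetical in-memory stand-in for the Paddle Cloud job store


def toy_create_cloud_job(name, **resources):
    """Register a new job; fail if the name is already taken."""
    if name in _jobs:
        raise ValueError(f"job {name!r} already exists")
    _jobs[name] = {"name": name, **resources}
    return _jobs[name]


def toy_get_cloud_job(name):
    """Look up an existing job by name."""
    return _jobs[name]


def get_or_create(name, **resources):
    """Create the job if the name is free, otherwise reuse the existing job."""
    try:
        return toy_create_cloud_job(name, **resources)
    except ValueError:
        return toy_get_cloud_job(name)


first = get_or_create("test", num_trainer=3, num_ps=2)
second = get_or_create("test", num_trainer=5, num_ps=1)  # name exists, so the job is reused

print(second["num_trainer"])  # 3
```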

```python
remote_session(
    job
)
```

- *job*: the Paddle Cloud job.

#### Example

```python
reader = paddle.reader.recordio("/pfs/home/peter/mnist-train-*") # data stored on Paddle Cloud
image = reader.column(0)
label = reader.column(1)
fc1 = paddle.op.fc(image, size=256, act="sigmoid")
fc2 = paddle.op.fc(fc1, size=10, act="softmax")
cost = paddle.op.cross_entropy(fc2, label)
opt = paddle.optimizer.sgd(cost)

job = paddle.create_cloud_job("test", 3, "1G", 1, 1, 2, "1G", 1)
sess = paddle.remote_session(job)
for i in range(1000):
    sess.eval(opt)
sess.close()
```