# Design Doc: Prefetching Parameter From Parameter Server

## Abstract

We propose an approach to prefetch parameters from the Parameter
Server during distributed training, so that Fluid can train
a model whose parameters are too large to be stored in a single
trainer's memory.

## Background

For an embedding layer, the trainable parameter may be very large and may
not fit in a single trainer's memory. In Fluid distributed training, the
[Distributed Transpiler](./parameter_server.md#distributed-transpiler) splits every parameter into a number of smaller
parameters that are stored on the Parameter Servers, so we can prefetch the parameter
from the specified Parameter Server according to the input `Ids`.

## Design

This is a feature of Fluid distributed training. You may want
to read [Distributed Architecture](./distributed_architecture.md) and
[Parameter Server](./parameter_server.md) before reading the following content.

### Partitioned Parameter

<img src="src/split_parameter.png" width="400" />

- **Distributed Transpiler** splits the large parameter
(weight) into several partitioned parameters (weight_0, weight_1, weight_2), as shown in the
figure above.
- We could use `round-robin` to distribute the partitioned parameters across the Parameter Servers.
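
The two steps above can be sketched in plain Python. This is only an illustration, not the Distributed Transpiler's actual code: the helper names `split_rows` and `assign_round_robin`, the endpoint strings, and the choice of contiguous, near-equal row blocks are all assumptions made for the example.

```python
def split_rows(num_rows, num_blocks):
    """Split row indices [0, num_rows) into contiguous, near-equal blocks.

    Hypothetical helper: the design doc does not say how rows are grouped.
    """
    base, extra = divmod(num_rows, num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        # The first `extra` blocks take one additional row each.
        size = base + (1 if i < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def assign_round_robin(block_names, endpoints):
    """Place block i on endpoint i modulo the number of endpoints."""
    return {name: endpoints[i % len(endpoints)]
            for i, name in enumerate(block_names)}


# A weight with 10 rows split into weight_0, weight_1, weight_2.
blocks = split_rows(10, 3)
names = ["weight_%d" % i for i in range(len(blocks))]
# Endpoint strings are placeholders, not real addresses.
placement = assign_round_robin(names, ["ps0:6174", "ps1:6174"])
```

With two Parameter Servers and three blocks, `weight_0` and `weight_2` land on the first server and `weight_1` on the second.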

### Prefetching Parameter

<img src="src/prefetch_parameters.png" width="400" />

- The `prefetch_rpc` operator prefetches the parameter from different Parameter
    Servers according to the input `Ids`. We use [SelectedRows](../../../design/selected_rows.md)
    as the received variable type.
- The `merge_selected_rows` operator merges the received parameters into one
    `SelectedRows` variable.
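
The data flow of the two operators can be mimicked with in-memory Python. This is a rough sketch under stated assumptions, not the operators' implementation: the `SelectedRows` namedtuple is a simplified stand-in for the Fluid type, each Parameter Server is modelled as a `(start_row, table)` pair holding a contiguous block of rows, and the RPC is replaced by a local function call.

```python
from collections import namedtuple

# Simplified stand-in for Fluid's SelectedRows: global row ids and their values.
SelectedRows = namedtuple("SelectedRows", ["rows", "values"])


def prefetch(ids, shards):
    """Fetch the rows named in `ids` from every shard that holds some of them.

    `shards` is a list of (start_row, table) pairs; table[i] holds global row
    start_row + i. This layout is an assumption made for the sketch.
    """
    results = []
    for start, table in shards:
        # Deduplicate ids and keep only those owned by this shard.
        wanted = sorted({i for i in ids if start <= i < start + len(table)})
        if wanted:
            values = [table[i - start] for i in wanted]
            results.append(SelectedRows(wanted, values))
    return results


def merge_selected_rows(parts):
    """Concatenate the per-shard results into one SelectedRows."""
    rows, values = [], []
    for part in parts:
        rows.extend(part.rows)
        values.extend(part.values)
    return SelectedRows(rows, values)


shards = [
    (0, [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]),  # rows 0..2
    (3, [[3.0, 3.1], [4.0, 4.1]]),              # rows 3..4
]
merged = merge_selected_rows(prefetch([4, 1, 1, 0], shards))
```

Only the rows that appear in `Ids` are transferred, so the trainer never holds the full parameter.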

## TODO

- The `prefetch_rpc` operator, to send row indices and receive `SelectedRows` variables.
- `lookup_table` needs to support the `SelectedRows` variable type as the input `Weight`.
- Async update: to avoid being held back by slow nodes, async update is important for distributed training.
  We need a design doc for it and will implement it in the future.
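
For the `lookup_table` item, the required behaviour might look like the following sketch. It assumes the sparse weight is given as a list of global row ids plus a parallel list of row values; the function signature is hypothetical and does not reflect the real operator's interface.

```python
def lookup_table(ids, rows, values):
    """Return the embedding for each id from a sparse (rows, values) weight.

    `rows` lists the global row ids present; `values[k]` is the row rows[k].
    """
    # Map each global row id to its position in the sparse weight.
    position = {row: k for k, row in enumerate(rows)}
    return [values[position[i]] for i in ids]


rows = [0, 1, 4]
values = [[0.0, 0.1], [1.0, 1.1], [4.0, 4.1]]
out = lookup_table([4, 1, 1, 0], rows, values)
```

The output keeps the order and duplicates of `Ids`, even though the sparse weight stores each row once.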