# Design Doc: Prefetching Parameter From Parameter Server
## Abstract
We propose an approach to prefetch parameters from the Parameter
Server during distributed training, so that Fluid can train a model
whose parameters are too large to fit into a single trainer's memory.
## Background
For an embedding layer, the trainable parameter may be too large to fit
into a single trainer's memory. In Fluid distributed training, the
[Distributed Transpiler](./parameter_server.md#distributed-transpiler) splits every large parameter into a number of smaller
parameters stored on the Parameter Servers, so we can prefetch only the needed
rows of the parameter from the corresponding Parameter Server according to the input `Ids`.
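
To make this concrete, the plain Python/NumPy sketch below (not the Fluid API; the table size and the names `weight` and `ids` are only illustrative) shows that an embedding lookup touches only the rows selected by the input `Ids`, which is why it is enough to fetch just those rows from the Parameter Servers:

```python
import numpy as np

# Hypothetical embedding table: 1M rows of 64-dim vectors. A much larger table
# would not fit into one trainer's memory, yet a single mini-batch only needs
# the rows selected by `ids`.
vocab_size, emb_dim = 1000 * 1000, 64
weight = np.random.rand(vocab_size, emb_dim).astype("float32")

ids = np.array([3, 90210, 7, 3])   # ids appearing in one mini-batch
needed_rows = np.unique(ids)       # only these rows must be available locally
embedding = weight[ids]            # shape: (len(ids), emb_dim)
```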
## Design
This is a feature of Fluid distributed training; you may want to read
[Distributed Architecture](./distributed_architecture.md) and
[Parameter Server](./parameter_server.md) before reading the following content.
### Partitioned Parameter
- The **Distributed Transpiler** splits the large parameter
(`weight`) into several partitioned parameters (e.g. `weight_0`, `weight_1`, `weight_2`).
- The partitioned parameters can be distributed to the Parameter Servers in a
`round-robin` fashion, as sketched below.
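
A minimal sketch of the round-robin placement (plain Python, not the transpiler's actual code; the `row_id % num_pservers` scheme and the helper name are assumptions for illustration):

```python
def round_robin_shard(row_id, num_pservers):
    """Map a global row id to (pserver index, local row id), assuming a
    round-robin, row-wise split of the parameter. Illustrative only."""
    return row_id % num_pservers, row_id // num_pservers

# With 3 Parameter Servers, rows 0..5 would be placed as:
#   weight_0 <- rows 0, 3;  weight_1 <- rows 1, 4;  weight_2 <- rows 2, 5
for row_id in range(6):
    print(row_id, "->", round_robin_shard(row_id, num_pservers=3))
```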
### Prefetching Parameter
- The `prefetch_rpc` operator prefetches the parameter rows from the different Parameter
Servers according to the input `Ids`; we use [SelectedRows](../../../design/selected_rows.md)
as the type of the received variable.
- The `merge_selected_rows` operator merges the received parameters into one
`SelectedRows` variable, as sketched below.
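
The sketch below (plain Python, not the actual operator implementations; `rpc_fetch_rows` is a hypothetical stand-in for the RPC issued by `prefetch_rpc`) illustrates the intended data flow: group the input `Ids` by the Parameter Server that owns them, fetch each group of rows, and merge the results into one SelectedRows-like structure of row ids plus row values:

```python
def prefetch_and_merge(ids, num_pservers, rpc_fetch_rows):
    """Fetch the rows for `ids` from their Parameter Servers and merge them.

    `rpc_fetch_rows(pserver_idx, local_rows)` is a hypothetical stand-in for
    the RPC performed by the `prefetch_rpc` operator; it is assumed to return
    a dict mapping each local row id to that row's values.
    """
    # 1. Group the deduplicated ids by the Parameter Server that owns them,
    #    assuming the round-robin, row-wise split sketched above.
    per_pserver = {i: [] for i in range(num_pservers)}
    for row_id in sorted(set(ids)):
        per_pserver[row_id % num_pservers].append(row_id // num_pservers)

    # 2. prefetch_rpc: fetch each group of rows from its Parameter Server.
    # 3. merge_selected_rows: collect everything into one SelectedRows-like
    #    result, i.e. a list of global row ids plus the corresponding rows.
    merged_rows, merged_values = [], []
    for pserver_idx, local_rows in per_pserver.items():
        if not local_rows:
            continue
        fetched = rpc_fetch_rows(pserver_idx, local_rows)
        for local_row, value in fetched.items():
            merged_rows.append(local_row * num_pservers + pserver_idx)
            merged_values.append(value)
    return merged_rows, merged_values
```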
## TODO
- Implement the `prefetch_rpc` operator to send row indices and receive `SelectedRows` variables.
- `lookup_table` needs to support the `SelectedRows` variable type for the input `Weight`.
- Async update: to avoid being blocked by slow nodes, asynchronous updates are important for
distributed training; this needs a separate design doc and implementation in the future.