Commit f839e91b authored by Yancey1989

update by comment

Parent: b3827473

# Design Doc: Prefetching Parameters from the Parameter Server

## Abstract

We propose an approach to prefetch parameters from the Parameter
Server during distributed training, so that Fluid can train a model
whose parameters are too large to be stored in one trainer's memory.

## Background

For an embedding layer, the trainable parameter may be very large and could
not be stored in one trainer's memory. In Fluid distributed training, the
[Distributed Transpiler](./parameter_server.md#distributed-transpiler) would split every large parameter into a number of small
parameters that are stored on the Parameter Servers, so we could prefetch the needed
parts of the parameter from the corresponding Parameter Servers according to the input `Ids`.
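
As a back-of-the-envelope illustration of the scale involved (the vocabulary size and
embedding width below are assumed numbers, not taken from this design), such an
embedding weight easily exceeds a single trainer's memory:

```python
# Assumed sizes, chosen only to illustrate why the weight cannot fit in one trainer.
vocab_size = 10 ** 8        # number of rows (ids) in the embedding table
embedding_dim = 512         # width of each embedding vector
bytes_per_float = 4         # float32

weight_bytes = vocab_size * embedding_dim * bytes_per_float
print("embedding weight is about %.0f GB" % (weight_bytes / 1024.0 ** 3))
# prints roughly 191 GB, far more memory than a single trainer usually has
```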

## Design

This is a feature of Fluid distributed training; you may want to read
[Distributed Architecture](./distributed_architecture.md) and
[Parameter Server](./parameter_server.md) before reading the following content.

### Partitioned Parameters

<img src="src/split_parameter.png" width="400" /> <img src="src/split_parameter.png" width="400" />
- The **Distributed Transpiler** would split the large parameter
(weight) into several partitioned parameters (weight_0, weight_1, weight_2) as
shown in the figure above.
- We could use `round-robin` to distribute the partitioned parameters among the
Parameter Servers, as in the sketch after this list.
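
A minimal sketch of the row-wise split and `round-robin` placement, assuming the weight
is split into more blocks than there are Parameter Servers; the helper below is
illustrative only and is not the actual Distributed Transpiler API:

```python
import numpy as np

def split_and_place(weight, num_blocks, num_pservers):
    """Split a (vocab_size, dim) weight row-wise into partitioned parameters
    (weight_0, weight_1, ...) and place them on Parameter Servers in
    round-robin order. Illustrative only: the real Distributed Transpiler
    rewrites the program description instead of touching in-memory arrays."""
    blocks = np.array_split(weight, num_blocks, axis=0)
    placement = {}
    for i, block in enumerate(blocks):
        pserver_id = i % num_pservers          # round-robin placement
        placement.setdefault(pserver_id, []).append(("weight_%d" % i, block))
    return placement

# Example: 6 partitioned parameters distributed over 3 Parameter Servers.
weight = np.random.rand(12, 4).astype("float32")
for pserver_id, parts in sorted(split_and_place(weight, 6, 3).items()):
    print(pserver_id, [(name, block.shape) for name, block in parts])
```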

### Prefetching Parameters

<img src="src/prefetch_parameters.png" width="400" /> <img src="src/prefetch_parameters.png" width="400" />
- The `prefetch_rpc` operator would prefetch the parameter rows from the different
Parameter Servers according to the input `Ids`; we use [SelectedRows](../../../design/selected_rows.md)
as the type of the received variable.
- The `merge_selected_rows` operator would merge the received parameters into one
`SelectedRows` variable; a sketch of this prefetch-and-merge flow follows the list.
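
The sketch below mimics that data flow in plain Python. The `SelectedRows`-like
dictionaries and the in-process table lookup are stand-ins for the real `prefetch_rpc`
RPC call and `merge_selected_rows` operator; they are assumptions for illustration,
not the actual operator implementations:

```python
import numpy as np

def prefetch_rows(ids, pserver_tables):
    """Ask every Parameter Server for the rows of `ids` it owns and return one
    SelectedRows-like dict per server: {"rows": [global ids], "value": ndarray}.
    `pserver_tables` maps pserver_id -> {global_row_id: row_vector} and stands
    in for the RPC performed by the prefetch_rpc operator."""
    results = []
    for pserver_id, table in sorted(pserver_tables.items()):
        owned = [i for i in ids if i in table]
        if owned:
            results.append({"rows": owned,
                            "value": np.stack([table[i] for i in owned])})
    return results

def merge_selected_rows(selected_rows_list):
    """Concatenate the per-server results into a single SelectedRows-like
    variable, which is what merge_selected_rows does for real SelectedRows."""
    rows = [r for s in selected_rows_list for r in s["rows"]]
    value = np.concatenate([s["value"] for s in selected_rows_list], axis=0)
    return {"rows": rows, "value": value}

# Example: rows 0-3 live on Parameter Server 0, rows 4-7 on Parameter Server 1.
tables = {0: {i: np.full(4, i, dtype="float32") for i in range(4)},
          1: {i: np.full(4, i, dtype="float32") for i in range(4, 8)}}
merged = merge_selected_rows(prefetch_rows([1, 6, 3], tables))
print(merged["rows"], merged["value"].shape)   # [1, 3, 6] (3, 4)
```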

## TODO

- The `prefetch_rpc` operator needs to send the row indices and receive `SelectedRows` variables.
- `lookup_table` needs to support the `SelectedRows` variable type as its input `Weight`
(a rough sketch follows this list).
- Async update: to avoid slow nodes, asynchronous update is important for distributed
training; it needs its own design doc and will be implemented in the future.
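
As a rough sketch of what the `lookup_table` item above implies, looking an id up in a
`SelectedRows`-style weight means mapping the global id to its position in the `rows`
index before reading the value tensor. The helper below is hypothetical and is not the
real operator:

```python
import numpy as np

def lookup_table_selected_rows(ids, weight):
    """Hypothetical lookup against a SelectedRows-like weight
    ({"rows": [global ids], "value": ndarray}); the real lookup_table
    operator would perform this mapping inside its C++ kernel."""
    row_pos = {global_id: pos for pos, global_id in enumerate(weight["rows"])}
    return np.stack([weight["value"][row_pos[i]] for i in ids])

# Example: a prefetched-and-merged weight covering global rows 1, 3 and 6.
weight = {"rows": [1, 3, 6],
          "value": np.arange(12, dtype="float32").reshape(3, 4)}
print(lookup_table_selected_rows([6, 1], weight))
```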