Commit f839e91b authored by Yancey1989

update by comment

Parent b3827473
# Design Doc: Prefetching Parameters from Parameter Servers
## Abstract
We propose an approach to prefetch parameters from the Parameter
Server during distributed training so that Fluid can train
a model whose parameters are too large to be stored in one
trainer's memory. With this approach, a trainer prefetches the sliced
parameters it needs from different Parameter Server instances according
to the input `Ids`, runs forward and backward, and then sends the
gradients back to the Parameter Servers to execute the optimize program.
## Background
For an embedding layer, the trainable parameter may be very large and cannot
be stored in one trainer's memory. In Fluid distributed training,
[Distributed Transpiler](./parameter_server.md#distributed-transpiler) splits every large parameter into a number of small
parameters stored on the Parameter Servers, so we can prefetch the parameter
rows from the specified Parameter Servers according to the input `Ids`.
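To make the motivation concrete, the back-of-the-envelope calculation below uses
made-up numbers (vocabulary size, embedding width) that are not part of this design;
it only illustrates how quickly an embedding table outgrows one trainer's memory.

```python
# Illustrative only: vocabulary size and embedding width are assumed numbers,
# not values from this design doc.
vocab_size = 100 * 1000 * 1000   # 1e8 ids, e.g. user or query ids
embedding_dim = 512              # width of each embedding row
bytes_per_float = 4              # float32

table_bytes = vocab_size * embedding_dim * bytes_per_float
print("embedding table size: %.1f GB" % (table_bytes / 1024.0 ** 3))
# => embedding table size: 190.7 GB, far more than one trainer's memory,
# so the table has to be partitioned across Parameter Server instances.
```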
## Design
**NOTE**: this approach is a feature of Fluid distributed training; you may want
to read [Distributed Architecture](./distributed_architecture.md) and
[Parameter Server](./parameter_server.md) before reading the following content.
Fluid large model distributed training uses
[Distributed Transpiler](./parameter_server.md#distributed-transpiler) to split
a large parameter into multiple partitioned parameters stored on the Parameter
Servers, and the Trainer prefetches them through an `RPC` interface.
### Partitioned Parameter
<img src="src/split_parameter.png" width="400" />
- **Distributed Transpiler** would split the large parameter
(weight) into some partitioned parameters (weight_0, weight_1, weight_2) as shown in the
figure above.
- We could use `round-robin` to distribute the partitioned parameters, as sketched below.
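A minimal sketch, assuming a round-robin placement of evenly sized partitions; the
helper names and the block layout are illustrative and not part of Fluid's transpiler API:

```python
# Sketch of round-robin placement; `num_pservers`, `rows_per_partition`, and the
# helper functions are assumptions for illustration, not Fluid's actual code.

def round_robin_placement(num_partitions, num_pservers):
    """Assign partition i (weight_0, weight_1, ...) to pserver i % num_pservers."""
    return {i: i % num_pservers for i in range(num_partitions)}

def pserver_for_row(row_id, rows_per_partition, placement):
    """Locate the Parameter Server that stores a given row of the big table."""
    return placement[row_id // rows_per_partition]

placement = round_robin_placement(num_partitions=3, num_pservers=2)
# weight_0 -> pserver 0, weight_1 -> pserver 1, weight_2 -> pserver 0
print(pserver_for_row(row_id=12345, rows_per_partition=10000, placement=placement))  # 1
```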
### Prefetching Parameters
<img src="src/prefetch_parameters.png" width="400" />
- The `prefetch_rpc` operator would send the row indices to the corresponding Parameter
Servers according to the input `Ids`, and then receive the requested rows; we use
[SelectedRows](../../../design/selected_rows.md) as the received variable type.
- Unlike normal Fluid distributed training, we only prefetch the rows needed by the
input `Ids` instead of the whole parameter.
- The `merge_selected_rows` operator would merge the received parameters into one
`SelectedRows` variable, as illustrated by the sketch below.
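The sketch below is illustrative only: `fetch_rows` stands in for the RPC issued by the
`prefetch_rpc` operator, and the plain dict stands in for Fluid's `SelectedRows` variable;
it just shows the grouping, fetching, and merging steps described above.

```python
# Illustrative sketch of prefetching: group ids by owning pserver, fetch only
# those rows, and merge the partial results into one table.
from collections import defaultdict

def fetch_rows(pserver, row_ids):
    """Stand-in for the `prefetch_rpc` call; a real pserver would return the
    stored embedding rows, here we just return zero vectors."""
    return {row_id: [0.0] * 8 for row_id in row_ids}

def prefetch(ids, pserver_for_row):
    # 1. Group the requested row ids by the Parameter Server that stores them.
    ids_by_pserver = defaultdict(list)
    for row_id in ids:
        ids_by_pserver[pserver_for_row(row_id)].append(row_id)

    # 2. Fetch only the requested rows from each Parameter Server (what
    #    `prefetch_rpc` does), then merge the partial results into a single
    #    id -> row mapping (what `merge_selected_rows` does).
    merged = {}
    for pserver, row_ids in ids_by_pserver.items():
        merged.update(fetch_rows(pserver, row_ids))
    return merged

# Example with an assumed layout: even row ids on pserver 0, odd row ids on pserver 1.
rows = prefetch([3, 7, 12], pserver_for_row=lambda row_id: row_id % 2)
```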
## TODO
- `prefetch_rpc` operator to send row indices and receive `SelectedRows` variables.
- `lookup_table` needs to support the `SelectedRows` variable type as the input `Weight`.
- Async update: to avoid slow nodes, asynchronous update is important for distributed
training; we need a design doc for it and will implement it in the future.