Commit 1b4db80b authored by Yancey1989

update by lookup remote table

Parent 5948fd27
# Design Doc: Lookup Remote Table during Distributed Training
## Abstract
We propose an approach to pre-fetch the parameters from a Parameter Server during distributed training so that Fluid can train a model with a very large parameter that cannot be stored in one trainer's memory.
## Background
For an embedding layer, the trainable parameter may be very large, and it is likely that it cannot be stored in one trainer's memory. In Fluid distributed training,
the [Distributed Transpiler](./parameter_server.md#distributed-transpiler) splits every large parameter into a number of smaller partitioned parameters that are stored on the Parameter Servers. Hence, we can pre-fetch the needed rows of the parameter from the specified Parameter Servers using the input `Ids`.
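
To make this concrete, here is a minimal sketch in plain NumPy (the vocabulary size and batch shape are assumed for illustration; this is not Fluid API) showing that a single mini-batch only touches a small fraction of the embedding rows, which is what makes pre-fetching by `Ids` practical:

```python
import numpy as np

# Assumed sizes for illustration only; the real vocabulary may be
# billions of rows, which is exactly why it cannot live in one trainer.
vocab_size = 10_000_000        # total number of embedding rows
batch_size, seq_len = 256, 40  # shape of the integer `Ids` input

# A random mini-batch of ids.
ids = np.random.randint(0, vocab_size, size=(batch_size, seq_len))

# The trainer only needs the rows indexed by this mini-batch.
needed_rows = np.unique(ids)
print(f"rows touched: {needed_rows.size} / {vocab_size} "
      f"({needed_rows.size / vocab_size:.4%})")
# => roughly 10k rows out of 10M, so fetching just these rows from the
#    Parameter Servers is far cheaper than storing the whole table locally.
```
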
## Design
Prior to reading this design, it would be useful for the reader to make themselves familiar with Fluid [Distributed Training Architecture](./distributed_architecture.md) and
[Parameter Server](./parameter_server.md).
The execution of `lookup local table` is as follows:
<img src="src/split_parameter.png" width="400" />
<img src="src/lookup_local_table.png" width="400" />
In some cases the parameter (`weight`) may be very large, for example 10 billion features, so the entire
table cannot be stored in one trainer's memory. We therefore need to partition this parameter and
pre-fetch the needed rows at the beginning of each mini-batch; we call this `lookup remote table`:
<img src="src/lookup_remote_table.png" width="400">
<img src="src/prefetch_parameters.png" width="400" />
The processing flow of `lookup remote table` is as follows:
1. Partition the parameter

   <img src="src/split_parameter.png" width="400" />

   - **Distributed Transpiler** would split the large parameter
     (`weight`) into a number of partitioned parameters (`weight_0`, `weight_1`, `weight_2`) as shown in the figure above.
   - We can use `round-robin` to distribute the partitioned parameters across the Parameter Servers (a sketch covering both steps follows this list).
1. Pre-fetch the parameter at the beginning of each mini-batch

   - `prefetch_rpc` operator would prefetch the parameter from different Parameter
     Servers using the input `Ids`. We use [SelectedRows](../../../design/selected_rows.md)
     as the received variable type.
   - `merge_selected_rows` operator would merge the received parameters into one
     `SelectedRows` variable.
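
The sketch below ties the two steps together in plain Python. It is only an illustration of the idea, not the actual Fluid operators: rows are assigned to Parameter Servers round-robin, the mini-batch ids are grouped by owning server and "fetched" (a local dict lookup stands in for the `prefetch_rpc` RPC), and the per-server results are merged into a single SelectedRows-like pair of `rows` and `values`, mirroring what `merge_selected_rows` produces:

```python
import numpy as np

num_servers, vocab_size, emb_dim = 3, 1000, 8

# --- Step 1: partition the parameter --------------------------------------
# Round-robin: row i of `weight` is owned by server i % num_servers.
# (The real transpiler splits the parameter into blocks; per-row round-robin
#  is a simplification that keeps the sketch short.)
full_weight = np.random.rand(vocab_size, emb_dim).astype(np.float32)
server_tables = [
    {row: full_weight[row] for row in range(s, vocab_size, num_servers)}
    for s in range(num_servers)
]

# --- Step 2: pre-fetch at the beginning of a mini-batch --------------------
ids = np.array([7, 3, 7, 501, 42])   # the mini-batch `Ids` input
unique_ids = np.unique(ids)

# Group the ids by owning server, as the prefetch step would before issuing
# one request per Parameter Server.
per_server_ids = {s: [] for s in range(num_servers)}
for i in unique_ids:
    per_server_ids[int(i) % num_servers].append(int(i))

# "Fetch" the rows from each server (a dict lookup stands in for the RPC),
# receiving one SelectedRows-like (rows, values) pair per server.
received = []
for s, s_ids in per_server_ids.items():
    if s_ids:
        values = np.stack([server_tables[s][i] for i in s_ids])
        received.append((s_ids, values))

# Merge the per-server results into one SelectedRows-like variable,
# which is the role of `merge_selected_rows`.
merged_rows = [i for rows, _ in received for i in rows]
merged_values = np.concatenate([v for _, v in received], axis=0)
print(merged_rows)           # e.g. [3, 42, 501, 7]
print(merged_values.shape)   # (4, 8)
```
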
## TODO