Commit 5948fd27 authored by Abhinav Arora

Refine the prefetch parameter document

Parent 302136e0
@@ -2,40 +2,33 @@
## Abstract

We propose an approach to pre-fetch parameters from a Parameter Server during distributed training, so that Fluid is able to train a model whose parameters are too large to be stored in one trainer's memory.
## Background

For an embedding layer, the number of trainable parameters may be very large, and they may not fit in one trainer's memory. In Fluid distributed training, the [Distributed Transpiler](./parameter_server.md#distributed-transpiler) splits every parameter into a number of small parameters that are stored on the Parameter Servers. Hence, we can pre-fetch the parameters from the specified Parameter Server using the input `Ids`.
## Design

Prior to reading this design, it would be useful for the reader to be familiar with the Fluid [Distributed Training Architecture](./distributed_architecture.md) and the [Parameter Server](./parameter_server.md) design.
### Partitioned Parameter

<img src="src/split_parameter.png" width="400" />

- **Distributed Transpiler** would split the large parameter (`weight`) into some partitioned parameters (`weight_0`, `weight_1`, `weight_2`), as shown in the figure above.
- We can use `round-robin` to distribute the partitioned parameters across the Parameter Servers (see the sketch after this list).
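
The following is a minimal sketch of how a round-robin policy could assign the partitioned parameters, and the rows they hold, to Parameter Server endpoints. It is not the transpiler's actual code: the endpoint list, vocabulary size, block size, and helper names are illustrative assumptions.

```python
# Hypothetical sketch of round-robin placement of partitioned parameters.
# Endpoints, sizes, and helper names are illustrative, not Fluid APIs.

PSERVER_ENDPOINTS = ["127.0.0.1:6170", "127.0.0.1:6171", "127.0.0.1:6172"]
VOCAB_SIZE = 10000                                   # rows in the full embedding `weight`
NUM_BLOCKS = 3                                       # weight_0, weight_1, weight_2
BLOCK_SIZE = (VOCAB_SIZE + NUM_BLOCKS - 1) // NUM_BLOCKS   # rows per partitioned parameter


def placement():
    """Assign partitioned parameters (weight_0, weight_1, ...) to pservers round-robin."""
    mapping = {}
    for block_id in range(NUM_BLOCKS):
        mapping["weight_%d" % block_id] = PSERVER_ENDPOINTS[block_id % len(PSERVER_ENDPOINTS)]
    return mapping


def lookup_server(row_id):
    """Given a row id of `weight`, find the partitioned parameter and the pserver that holds it."""
    block_id = row_id // BLOCK_SIZE
    local_row = row_id % BLOCK_SIZE                  # row offset inside weight_<block_id>
    endpoint = PSERVER_ENDPOINTS[block_id % len(PSERVER_ENDPOINTS)]
    return "weight_%d" % block_id, endpoint, local_row


if __name__ == "__main__":
    print(placement())
    print(lookup_server(7777))                       # ('weight_2', '127.0.0.1:6172', 1109)
```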
### Pre-fetching Parameters

<img src="src/prefetch_parameters.png" width="400" />

- The `prefetch_rpc` operator would pre-fetch the parameters from the different Parameter Servers using the input `Ids`. We use [SelectedRows](../../../design/selected_rows.md) as the received variable type.
- The `merge_selected_rows` operator would merge the received parameters into one `SelectedRows` variable (see the sketch after this list).
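
Below is a pure-Python illustration of this data flow under the assumptions above; the real work is done by the C++ `prefetch_rpc` and `merge_selected_rows` operators, and the `SelectedRows` class and `rpc_lookup` callback here are simplified stand-ins, not Fluid's actual types or RPC interface.

```python
# Conceptual sketch of prefetching by Ids and merging the results.
# `SelectedRows` is a simplified stand-in for Fluid's sparse variable type.

class SelectedRows(object):
    """A sparse set of rows: rows[i] is the global row id of values[i]."""
    def __init__(self, rows, values):
        self.rows = rows
        self.values = values


def split_ids_by_server(ids, num_servers, block_size):
    """Group the lookup ids by the Parameter Server that owns their partition."""
    per_server = {i: [] for i in range(num_servers)}
    for row_id in ids:
        block_id = row_id // block_size
        per_server[block_id % num_servers].append(row_id)
    return per_server


def prefetch(ids, num_servers, block_size, rpc_lookup):
    """Conceptual equivalent of prefetch_rpc followed by merge_selected_rows.

    `rpc_lookup(server_id, ids)` is a placeholder for the RPC that asks one
    Parameter Server for the requested rows of its partitioned parameter and
    returns them as a SelectedRows.
    """
    received = []
    for server_id, server_ids in split_ids_by_server(ids, num_servers, block_size).items():
        if server_ids:
            received.append(rpc_lookup(server_id, server_ids))   # one SelectedRows per server

    # merge_selected_rows: concatenate the received rows into a single SelectedRows.
    merged_rows, merged_values = [], []
    for sr in received:
        merged_rows.extend(sr.rows)
        merged_values.extend(sr.values)
    return SelectedRows(merged_rows, merged_values)
```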