diff --git a/doc/fluid/design/dist_train/prefetch_parameter.md b/doc/fluid/design/dist_train/prefetch_parameter.md
index ac56502c280d540592d73685a2c018b53f5e73e1..e8ea7fe67b9a5666a88aa952c86477c0bc01338e 100644
--- a/doc/fluid/design/dist_train/prefetch_parameter.md
+++ b/doc/fluid/design/dist_train/prefetch_parameter.md
@@ -2,40 +2,33 @@

 ## Abstract

-We propose an approach to prefetch parameter from Parameter
-Server while distributed training so that Fluid would training
-a model including the large parameter which could not be stored in one
-trainer's memory.
+We propose an approach to pre-fetch the parameters from a Parameter Server while distributed training so that Fluid is able to train a model with a large number of parameters that cannot be stored in one trainer's memory.

 ## Background

-For an embedding layer, the trainable parameter may be very large and could
-not be stored in one trainer's memory. In Fluid distributed training,
-[Distributed Transpiler](./parameter_server.md#distributed-transpiler) would split every parameter into a number of small
-parameters and stored in Parameter Server, so we could prefetch the parameter
-from the specified Parameter Server according to the input `Ids`.
+For an embedding layer, the number of trainable parameters may be very large and it is likely that they may not be able to be stored in one trainer's memory. In Fluid distributed training,
+the [Distributed Transpiler](./parameter_server.md#distributed-transpiler) would split every parameter into a number of small parameters that are stored on the Parameter Server. Hence, we can pre-fetch the parameters from the specified Parameter Server using the input `Ids`.

 ## Design

-This is a feature of Fluid distributed training, maybe you want
-to know [Distributed Architecture](./distributed_architecture.md) and
-[Parameter Server](./parameter_server.md) before reading the following content.
+Prior to reading this design, it would be useful for the reader to make themselves familiar with Fluid [Distributed Training Architecture](./distributed_architecture.md) and
+[Parameter Server](./parameter_server.md).

 ### Partationed Parameter

-- **Distributed Transpiler** would split the large parameter
-(weight) into some partitioned parameters (weight_0, weight_1, weight_2) as the
+- **Distributed Transpiler** would split the large parameters
+(`weight`) into some partitioned parameters (`weight_0`, `weight_1`, `weight_2`) as shown in the
 figure above.
-- We could use `round-robin` to distribute the partitioned parameter.
+- We can use `round-robin` to distribute the partitioned parameter.

-### Prefetching Parameter
+### Pre-fetching Parameters

 - `prefetch_rpc` operator would prefetch the parameter from different Parameter
-  Server according with the input `Ids`, we use [SelectedRows](../../../design/selected_rows.md)
+  Servers using the input `Ids`. We use [SelectedRows](../../../design/selected_rows.md)
   as the received variable type.
 - `merge_selected_rows` operator would merge the received parameters into one `SelectedRows` variable.
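
As a reading aid for the design in this diff, the following is a minimal Python sketch of the three steps it describes: splitting a large parameter into partitions, distributing the partitions round-robin, and pre-fetching rows by `Ids` before merging them into one `SelectedRows`-like result. It is not Fluid code. The names `split_rows`, `round_robin`, `prefetch`, and the toy `SelectedRows` class are hypothetical, and the contiguous row split is an assumption, since the document does not say how an id maps to a partition.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Row = List[float]


@dataclass
class SelectedRows:
    """Toy stand-in for a SelectedRows variable (hypothetical, not the Fluid type)."""
    rows: List[int]    # global row ids that are present
    values: List[Row]  # values[i] is the row whose global id is rows[i]
    height: int        # row count of the full, unpartitioned parameter


def split_rows(height: int, num_blocks: int) -> List[Tuple[int, int]]:
    """Cut [0, height) into contiguous (start, end) blocks (assumed scheme)."""
    base, extra = divmod(height, num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        end = start + base + (1 if i < extra else 0)
        blocks.append((start, end))
        start = end
    return blocks


def round_robin(num_blocks: int, servers: Sequence[str]) -> Dict[int, str]:
    """Assign block i to servers[i % len(servers)]."""
    return {i: servers[i % len(servers)] for i in range(num_blocks)}


def prefetch(ids: Sequence[int],
             blocks: List[Tuple[int, int]],
             shards: Dict[int, List[Row]]) -> List[SelectedRows]:
    """Fetch each distinct id from the block that owns it, one result per block.

    Each per-block lookup stands in for one RPC to the server holding that block.
    """
    starts = [start for start, _ in blocks]
    height = blocks[-1][1]
    wanted: Dict[int, List[int]] = {}
    for row_id in dict.fromkeys(ids):  # drop duplicate ids, keep first-seen order
        if not 0 <= row_id < height:
            raise IndexError(f"id {row_id} outside [0, {height})")
        block = bisect_right(starts, row_id) - 1  # last block starting at or before id
        wanted.setdefault(block, []).append(row_id)
    parts = []
    for block in sorted(wanted):
        start = blocks[block][0]
        rows = wanted[block]
        values = [shards[block][r - start] for r in rows]  # global id -> local offset
        parts.append(SelectedRows(rows, values, height))
    return parts


def merge_selected_rows(parts: Sequence[SelectedRows], height: int) -> SelectedRows:
    """Concatenate the per-block results into a single SelectedRows."""
    merged = SelectedRows([], [], height)
    for part in parts:
        merged.rows.extend(part.rows)
        merged.values.extend(part.values)
    return merged


if __name__ == "__main__":
    weight = [[float(r), float(r * 10)] for r in range(10)]  # toy 10 x 2 parameter
    blocks = split_rows(len(weight), 3)                       # weight_0..weight_2
    placement = round_robin(len(blocks), ["ps0", "ps1"])      # block -> server name
    shards = {i: weight[s:e] for i, (s, e) in enumerate(blocks)}
    merged = merge_selected_rows(prefetch([8, 1, 5, 1], blocks, shards), len(weight))
    print(placement)
    print(merged.rows, merged.values)
```

In this sketch the trainer only ever holds the rows named by `Ids`, which is the point of the proposal: the full `weight` lives on the servers, and only the selected rows travel to the trainer.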