# Design Doc: Large Model

## Abstract

We propose an approach to support training models with very large parameters. The parameter of an embedding layer can be so large that it does not fit in a single trainer's memory. In this approach, a Trainer prefetches sliced parameters from different Parameter Server instances according to the input `Ids`, runs forward and backward, and then sends the gradients to the Parameter Servers, which execute the optimize program.

## Design

**NOTE**: this approach is a feature of Fluid distributed training; you may want to read [Distributed Architecture](./distributed_architecture.md) and [Parameter Server](./parameter_server.md) before reading the following content.

Fluid large model distributed training uses the [Distributed Transpiler](./parameter_server.md#distributed-transpiler) to split a large parameter into multiple sliced parameters stored on the Parameter Servers; the Trainer then prefetches them through the `RPC` interface.

### Split Large Parameter

**Distributed Transpiler** splits the large parameter (weight) into several sliced parameters (weight_0, weight_1, weight_2), each stored on a different Parameter Server. A minimal sketch of this row-wise split appears at the end of this doc.

### Prefetch Parameters from Parameter Servers

- The `PrefetchRpc` operator sends the row indices to the multiple Parameter Servers and then receives the corresponding `SelectedRows`.
- The difference from normal Fluid distributed training is that we only prefetch the rows that correspond to the input `Ids` instead of fetching the whole parameter; see the prefetch sketch at the end of this doc.

## TODO

- Async Update

  To avoid slow nodes, asynchronous update is important for distributed training; we need a design doc for it and an implementation in the future.
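As a rough illustration of the "Split Large Parameter" step above, the following is a minimal NumPy sketch (not the real Distributed Transpiler code) of how a large embedding weight could be split row-wise into sliced parameters, one per Parameter Server. The function name `split_rows` and the splitting policy are assumptions for illustration only.

```python
import numpy as np

def split_rows(weight, num_pservers):
    """Split `weight` along rows into `num_pservers` roughly equal slices."""
    total_rows = weight.shape[0]
    # Rows per slice, rounded up so every row is assigned to exactly one slice.
    rows_per_slice = (total_rows + num_pservers - 1) // num_pservers
    slices = []
    for i in range(num_pservers):
        start = i * rows_per_slice
        end = min(start + rows_per_slice, total_rows)
        # weight_0, weight_1, ... would each live on a different Parameter Server.
        slices.append(weight[start:end])
    return slices, rows_per_slice

# Example: a 10-row x 4-dim embedding split across 3 Parameter Servers.
weight = np.arange(40, dtype=np.float32).reshape(10, 4)
slices, rows_per_slice = split_rows(weight, num_pservers=3)
print([s.shape for s in slices])  # [(4, 4), (4, 4), (2, 4)]
```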
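The next sketch illustrates the flow described in "Prefetch Parameters from Parameter Servers": map each input id to the Parameter Server that stores the corresponding slice, fetch only those rows, and scatter the replies back into the lookup result. The in-memory `slices` list and the `prefetch_rows` function stand in for the remote Parameter Servers and the real `PrefetchRpc` operator; they are illustrative assumptions, not the actual implementation.

```python
import numpy as np

# Rebuild the sliced parameters from the previous sketch: a 10-row x 4-dim
# embedding split row-wise across 3 Parameter Servers, 4 rows per slice.
weight = np.arange(40, dtype=np.float32).reshape(10, 4)
rows_per_slice = 4
slices = [weight[i:i + rows_per_slice] for i in range(0, weight.shape[0], rows_per_slice)]

def prefetch_rows(ids, slices, rows_per_slice):
    """Gather only the rows needed for `ids` from the sliced parameters."""
    # Group ids by the slice (Parameter Server) that owns them.
    requests = {}
    for pos, idx in enumerate(ids):
        pserver = idx // rows_per_slice    # which Parameter Server holds this row
        local_row = idx % rows_per_slice   # row index inside that slice
        requests.setdefault(pserver, []).append((pos, local_row))

    # "Send" one request per Parameter Server and scatter the returned rows
    # back into the output, preserving the original order of `ids`.
    emb_dim = slices[0].shape[1]
    out = np.empty((len(ids), emb_dim), dtype=slices[0].dtype)
    for pserver, entries in requests.items():
        positions, local_rows = zip(*entries)
        fetched = slices[pserver][list(local_rows)]  # the "SelectedRows"-style reply
        out[list(positions)] = fetched
    return out

ids = [0, 9, 3, 9]
print(prefetch_rows(ids, slices, rows_per_slice))
```

In the real system the per-server requests would be RPC calls issued by the `PrefetchRpc` operator, and the replies would be `SelectedRows` holding only the requested rows, which is what keeps the Trainer's memory footprint small.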