Design Doc: The C++ Class Parameters
Parameters is a concept we designed in the PaddlePaddle V2 API. Parameters is a container of parameters, which makes PaddlePaddle capable of sharing parameters between topologies. We described the usage of Parameter in api.md.
We previously implemented Parameters in Python when designing the V2 API. The current implementation has several defects:
- We just use memcpy to share Parameters between topologies, which is very inefficient.
- We do not support sharing Parameters during training. We only trigger memcpy when training starts.
It is necessary to implement Parameters on the C++ side. However, this could require refactoring PaddlePaddle, because PaddlePaddle was originally designed to train only one topology, i.e., each GradientMachine contains its Parameter as a data member. In the current PaddlePaddle implementation, three concepts are associated with Parameters:
- paddle::Parameter. A Parameters is a container for paddle::Parameter. It is evident that we should use paddle::Parameter when developing Parameters. However, the Parameter class contains many functions and does not have a clear interface. It contains create/store Parameter, serialize/deserialize, optimize (i.e., SGD), and randomize/zero. When developing Parameters, we only use the create/store Parameter functionality. We should split the functionality of Parameter into several classes to clean up the PaddlePaddle C++ implementation.
- paddle::GradientMachine and its sub-classes, e.g., paddle::MultiGradientMachine and paddle::NeuralNetwork. We should pass Parameters to paddle::GradientMachine in forward/backward to avoid memcpy between topologies. We should also handle multi-GPU/CPU training, because forward and backward may run on multiple GPUs and multiple CPUs. Parameters should dispatch the parameter values to each device and gather the parameter gradients from each device.
- paddle::ParameterUpdater. The ParameterUpdater is used to update parameters in Paddle. So Parameters should be used by paddle::ParameterUpdater, and paddle::ParameterUpdater should optimize Parameters (by SGD).
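To make the container role concrete, here is a minimal sketch. It is not PaddlePaddle's actual API: the names ParameterSketch, ParametersSketch, getOrCreate, and topologiesShareStorage are hypothetical. The sketch only assumes that a parameter owns a value buffer and that the container hands out shared ownership, so two topologies can read and write the same memory instead of copying it with memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for paddle::Parameter, reduced to create/store only.
struct ParameterSketch {
  std::string name;
  std::vector<float> value;
  std::vector<float> grad;

  ParameterSketch(std::string n, std::size_t size)
      : name(std::move(n)), value(size, 0.0f), grad(size, 0.0f) {}
};

// Hypothetical container: maps a parameter name to a shared parameter.
class ParametersSketch {
 public:
  // Returns the existing parameter with this name, or creates it.
  std::shared_ptr<ParameterSketch> getOrCreate(const std::string& name,
                                               std::size_t size) {
    auto it = params_.find(name);
    if (it != params_.end()) {
      // The same name must refer to a buffer of the same size.
      assert(it->second->value.size() == size);
      return it->second;
    }
    auto param = std::make_shared<ParameterSketch>(name, size);
    params_.emplace(name, param);
    return param;
  }

  std::size_t size() const { return params_.size(); }

 private:
  std::map<std::string, std::shared_ptr<ParameterSketch>> params_;
};

// Two "topologies" asking for the same name receive the same buffer,
// so a write through one handle is visible through the other.
bool topologiesShareStorage() {
  ParametersSketch params;
  auto fromTopologyA = params.getOrCreate("fc.w", 4);
  auto fromTopologyB = params.getOrCreate("fc.w", 4);
  fromTopologyA->value[0] = 1.5f;
  return fromTopologyA.get() == fromTopologyB.get() &&
         fromTopologyB->value[0] == 1.5f;
}
```

Shared ownership is only one possible design. The point of the sketch is that sharing happens by handing out references to one buffer, not by copying.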
The step-by-step approach for implementing Parameters in the PaddlePaddle C++ core is listed below. Each step should be a PR and can be merged into PaddlePaddle one by one.
- Clean up the paddle::Parameter interface. Extract the functionalities of paddle::Parameter to prepare for the implementation of Parameters.
- Implement a Parameters class. It just stores the paddle::Parameter objects inside. Make GradientMachine use Parameters as a class member.
- Make Parameters support multi-CPU and multi-GPU training to prepare for sharing Parameters between topologies. Because we need to share Parameters between topologies, it is Parameters's responsibility to exchange Parameters between GPUs. GradientMachine should not handle how to exchange Parameters, because a GradientMachine is only used to train one topology and we need to support training many topologies in Paddle, i.e., many GradientMachines could use one Parameters.
  - We should use a global function to exchange Parameters between GPUs, not a member function in Parameters. The MultiGradientMachine invokes this function, which takes Parameters as its input.
  - The MultiGradientMachine contains many functionalities. Extracting the Parameters exchanging logic could make MultiGradientMachine clearer and simpler.
- Make Parameters an argument of the forward/backward functions, not a data member of GradientMachine. For example, forward could be forward(const Parameters& params, ...) and backward could be backward(Parameters* params, ...). After this step, Paddle could share Parameters between topologies.
- ParameterUpdater is invoked by GradientMachine and Trainer, but it updates Parameters. At the end of this refactoring, we could change ParameterUpdater to use Parameters directly, which makes ParameterUpdater's implementation clearer.
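The last three steps can be illustrated with a small sketch, under strong simplifying assumptions. ParamStore, ScaleTopology, gatherGradients, sgdUpdate, makeStore, and sharedTrainingStep are hypothetical names, not existing PaddlePaddle code. The topology computes y = w * x with a single scalar weight, and the "devices" are plain in-memory copies. The sketch shows parameters passed as arguments to forward/backward, a free function that gathers gradients, and an updater that works on the parameters directly.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified Parameters: values and gradients keyed by name.
struct ParamStore {
  std::map<std::string, std::vector<float>> value;
  std::map<std::string, std::vector<float>> grad;
};

// Hypothetical topology computing y = w * x with a scalar weight named "w".
// It owns no parameter storage; parameters arrive as arguments.
struct ScaleTopology {
  float forward(const ParamStore& params, float x) const {
    return params.value.at("w")[0] * x;
  }
  // Accumulates dLoss/dw = outGrad * x into params->grad.
  void backward(ParamStore* params, float x, float outGrad) const {
    params->grad.at("w")[0] += outGrad * x;
  }
};

// Free function (not a member of ParamStore) that sums the gradients
// computed on per-device replicas into the main store.
void gatherGradients(const std::vector<ParamStore>& replicas,
                     ParamStore* target) {
  for (const ParamStore& replica : replicas) {
    for (const auto& entry : replica.grad) {
      std::vector<float>& dst = target->grad.at(entry.first);
      assert(dst.size() == entry.second.size());
      for (std::size_t i = 0; i < dst.size(); ++i) dst[i] += entry.second[i];
    }
  }
}

// Plain SGD step, standing in for what a ParameterUpdater would do.
// Gradients are reset to zero after being applied.
void sgdUpdate(ParamStore* params, float learningRate) {
  for (auto& entry : params->value) {
    std::vector<float>& g = params->grad.at(entry.first);
    for (std::size_t i = 0; i < entry.second.size(); ++i) {
      entry.second[i] -= learningRate * g[i];
      g[i] = 0.0f;
    }
  }
}

ParamStore makeStore(float w) {
  ParamStore p;
  p.value["w"] = {w};
  p.grad["w"] = {0.0f};
  return p;
}

// Two topologies use one store; two replicas stand in for two devices.
float sharedTrainingStep() {
  ParamStore params = makeStore(2.0f);
  ScaleTopology topologyA, topologyB;
  std::vector<ParamStore> replicas = {params, params};  // dispatch values
  topologyA.backward(&replicas[0], 1.0f, 1.0f);  // replica 0: grad = 1
  topologyB.backward(&replicas[1], 3.0f, 1.0f);  // replica 1: grad = 3
  gatherGradients(replicas, &params);            // main store: grad = 4
  sgdUpdate(&params, 0.25f);                     // w = 2 - 0.25 * 4 = 1
  return topologyA.forward(params, 5.0f);        // 1 * 5 = 5
}
```

Because neither topology holds the parameters as a data member, either one can run forward or backward against the same store, and the exchange logic stays outside both the store and the gradient machine.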
