# Design Doc: The C++ Class `Parameters`
`Parameters` is a concept we designed in the PaddlePaddle V2 API. `Parameters` is a container of parameters, which makes PaddlePaddle capable of sharing parameters between topologies. We described the usage of `Parameter` in api.md.
We used Python to implement `Parameters` when designing the V2 API. There are several defects in the current implementation:

- We just use `memcpy` to share `Parameters` between topologies, but this is very inefficient.
- We do not support sharing `Parameters` during training; we just trigger a `memcpy` when training starts.
It is necessary that we implement `Parameters` on the C++ side. However, this could require a code refactoring of PaddlePaddle, because PaddlePaddle was designed to train only one topology at a time, i.e., each `GradientMachine` contains its `Parameter`s as data members. In the current PaddlePaddle implementation, there are three concepts associated with `Parameters`:
1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter`, so it is evident that we should use `paddle::Parameter` when developing `Parameters`. However, the `Parameter` class contains many functions and does not have a clear interface: it mixes creating/storing parameter values, serialization/deserialization, optimization (i.e., SGD), and randomization/zeroing. When developing `Parameters`, we only need the create/store functionality, so we should extract the functionalities of `Parameter` into several classes to clean up the PaddlePaddle C++ implementation.
2. `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine` and `paddle::NeuralNetwork`. We should pass `Parameters` to `paddle::GradientMachine` in `forward`/`backward` to avoid `memcpy` between topologies. We should also handle multi-GPU/CPU training, because `forward` and `backward` run on multiple GPUs and CPUs: `Parameters` should dispatch the parameter values to each device and gather the parameter gradients from each device.
3. `paddle::ParameterUpdater`. The `ParameterUpdater` is used to update parameters in Paddle, so `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (e.g., by SGD).
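To make the container idea concrete, here is a minimal sketch of `Parameters` covering only the create/store responsibility from point 1. The member names (`add`, `get`) and the use of `std::shared_ptr` are assumptions for illustration, not the actual Paddle API:

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

namespace paddle { class Parameter; }  // the existing per-parameter class

// Sketch: Parameters only creates and stores paddle::Parameter objects.
// Topologies hold the same Parameters instance instead of private copies,
// so sharing parameter values no longer requires memcpy.
class Parameters {
public:
  // Register a parameter under a unique name.
  void add(const std::string& name, std::shared_ptr<paddle::Parameter> param) {
    params_[name] = std::move(param);
  }

  // Look up a parameter by name; returns nullptr if it is absent.
  std::shared_ptr<paddle::Parameter> get(const std::string& name) const {
    auto it = params_.find(name);
    return it == params_.end() ? nullptr : it->second;
  }

private:
  std::unordered_map<std::string, std::shared_ptr<paddle::Parameter>> params_;
};
```

Because every topology would reference the same `Parameters` object rather than a copy, updates made while training one topology are immediately visible to the others.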
The step-by-step approach for implementing `Parameters` in the PaddlePaddle C++ core is listed below. Each step should be a separate PR and can be merged into PaddlePaddle one by one.
1. Clean up the `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of `Parameters`.
2. Implement a `Parameters` class. It just stores the `paddle::Parameter` objects inside. Make `GradientMachine` use `Parameters` as a class member.
3. Make `Parameters` support multi-CPU and multi-GPU training, to prepare for sharing `Parameters` between topologies. Because we need to share `Parameters` between topologies, it is the responsibility of `Parameters` to exchange parameters between GPUs; `GradientMachine` should not handle the exchange, because a `GradientMachine` only trains one topology, and we need to support training many topologies in Paddle, i.e., many `GradientMachine`s could use one `Parameters`.
   - We should use a global function to exchange parameters between GPUs, not a member function of `Parameters`. `MultiGradientMachine` invokes this function, which takes `Parameters` as its input (a signature sketch follows this list).
   - `MultiGradientMachine` contains many functionalities; extracting the parameter-exchange logic would make it clearer and simpler.
4. Make `Parameters` an argument of the `forward`/`backward` functions, not a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)` (see the sketch after this list). After this step, Paddle can share `Parameters` between topologies.
5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. At the end of this refactoring, we can change `ParameterUpdater` to use `Parameters` directly, which makes `ParameterUpdater`'s implementation clearer (sketched below).
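To illustrate steps 3 and 4, the sketch below shows the intended shape of the interfaces. The function name `exchangeParameters` and all signatures are hypothetical; the real `forward`/`backward` would carry additional arguments (input/output activations, pass type, etc.):

```cpp
#include <vector>

class Parameters;  // the container sketched above

// Step 3: a free function, not a Parameters member, dispatches the
// parameter values to every device and gathers the gradients back.
// MultiGradientMachine would call it; name and signature are assumptions.
void exchangeParameters(Parameters* params, const std::vector<int>& deviceIds);

// Step 4: Parameters becomes an argument of forward/backward instead of
// a data member, so many GradientMachines can train against one shared
// Parameters instance.
class GradientMachine {
public:
  virtual ~GradientMachine() = default;

  // Read-only access to the shared parameters during the forward pass.
  virtual void forward(const Parameters& params /*, in/out arguments */) = 0;

  // The backward pass accumulates gradients into the shared parameters.
  virtual void backward(Parameters* params /*, output gradients */) = 0;
};
```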

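Finally, a minimal sketch of step 5, where `ParameterUpdater` operates on `Parameters` directly rather than reaching into `GradientMachine`'s internal parameter list; the `update` method name is hypothetical:

```cpp
class Parameters;  // the container sketched above

// Step 5: ParameterUpdater optimizes a shared Parameters instance directly.
class ParameterUpdater {
public:
  virtual ~ParameterUpdater() = default;

  // Apply one optimization step (e.g., SGD) to the shared parameters,
  // consuming the gradients accumulated by backward().
  virtual void update(Parameters* params) = 0;
};
```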