# Design Doc: The C++ Class `Parameters`
`Parameters` is a concept we designed in the PaddlePaddle V2 API. `Parameters` is a container of parameters, which makes PaddlePaddle capable of sharing parameters between topologies. We described the usage of `Parameter` in `api.md`.
We used Python to implement Parameters when designing the V2 API. There are several defects in the current implementation:

- We just use `memcpy` to share Parameters between topologies, which is very inefficient.
- We do not support sharing Parameters while training; we only trigger `memcpy` when training starts.
It is necessary to implement Parameters on the C++ side. However, this requires some code refactoring in PaddlePaddle, because PaddlePaddle was designed to train only one topology, i.e., each GradientMachine contains its Parameter as a data member. In the current PaddlePaddle implementation, there are three concepts associated with `Parameters`:
- `paddle::Parameter`. A `Parameters` is a container of `paddle::Parameter`, so it is evident that we should use `paddle::Parameter` when developing `Parameters`. However, the `Parameter` class contains many functions and does not have a clear interface: it mixes `create/store Parameter`, `serialize/deserialize`, `optimize (i.e., SGD)`, and `randomize/zero`. When developing `Parameters`, we only use the `create/store Parameter` functionality. We should extract the functionalities of `Parameter` into separate classes to clean up the PaddlePaddle C++ implementation (a sketch follows this list).
- `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine` and `paddle::NeuralNetwork`. We should pass `Parameters` to `paddle::GradientMachine` during `forward/backward` to avoid `memcpy` between topologies. We also need to handle multi-GPU/CPU training, because `forward` and `backward` run on multiple GPUs and CPUs; `Parameters` should dispatch the parameter value to each device and gather the parameter gradient from each device.
- `paddle::ParameterUpdater`. The ParameterUpdater is used to update parameters in Paddle, so `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (e.g., by SGD).
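
The sketch below illustrates the first concept: splitting `paddle::Parameter`'s mixed responsibilities and keeping `Parameters` a thin container. The names `ParameterStorage`, `ParameterSerializer`, and `ParameterInitializer` are hypothetical, introduced only to show the direction of the extraction; they are not existing PaddlePaddle classes.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace paddle {

class Parameter;  // the existing class, to be slimmed down

// Hypothetical single-purpose pieces extracted from paddle::Parameter.
// Only the names are invented; the responsibilities are the ones listed above.
class ParameterStorage { /* create/store the underlying buffers */ };
class ParameterSerializer { /* serialize/deserialize */ };
class ParameterInitializer { /* randomize/zero */ };
// optimize (i.e., SGD) moves out entirely, into paddle::ParameterUpdater.

// Parameters stays a thin container of paddle::Parameter, which is the only
// functionality the V2 API actually needs from it.
class Parameters {
 public:
  std::shared_ptr<Parameter> get(const std::string& name) const {
    auto it = params_.find(name);
    return it == params_.end() ? nullptr : it->second;
  }
  void set(const std::string& name, std::shared_ptr<Parameter> param) {
    params_[name] = std::move(param);
  }

 private:
  std::map<std::string, std::shared_ptr<Parameter>> params_;
};

}  // namespace paddle
```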
The step-by-step approach for implementing Parameters in the PaddlePaddle C++ core is listed below. Each step should be a PR and can be merged into PaddlePaddle one by one.
- Clean up the `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of Parameters.
- Implement a `Parameters` class. It just stores `paddle::Parameter` objects inside. Make `GradientMachine` use `Parameters` as a class member.
- Make `Parameters` support multi-CPU and multi-GPU training to prepare for sharing `Parameter` between topologies. Because we need to share `Parameters` between topologies, it is `Parameters`' responsibility to exchange parameters between GPUs. `GradientMachine` should not handle how to exchange Parameters, because `GradientMachine` is only used to train one topology and we need to support training many topologies in Paddle, i.e., there could be many GradientMachines using one `Parameters` object.
  - We should use a global function to exchange Parameters between GPUs, not a member function of `Parameters`. `MultiGradientMachine` invokes this function, which takes `Parameters` as its input (see the sketch after this list).
  - `MultiGradientMachine` contains many functionalities; extracting the Parameters-exchanging logic would make it clearer and simpler.
- Make `Parameters` an argument of the `forward/backward` functions, not a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)` (see the signature sketch after this list). After this step, Paddle can share `Parameters` between topologies.
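
A minimal sketch of the global exchange function from step 3. The name `exchangeParameters` and its signature are hypothetical; the point is only that the function lives outside both `Parameters` and `MultiGradientMachine` and takes `Parameters` as its input.

```cpp
#include <vector>

namespace paddle {

class Parameters;  // the container from step 2

// Hypothetical free function: dispatch the master parameter values to the
// per-device copies before forward, and gather the per-device gradients
// back after backward. MultiGradientMachine invokes it, but neither
// MultiGradientMachine nor Parameters owns the exchange logic.
void exchangeParameters(Parameters* master,
                        std::vector<Parameters*>* deviceCopies);

}  // namespace paddle
```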
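And a sketch of the step 4 signature change. Everything except the `Parameters` arguments is elided; the commented-out argument names only stand in for whatever the real `forward`/`backward` signatures carry today.

```cpp
namespace paddle {

class Parameters;

class GradientMachine {
 public:
  virtual ~GradientMachine() = default;
  // Parameters are passed in per call instead of being a data member, so
  // several topologies (GradientMachines) can run on one shared Parameters
  // object without any memcpy between them.
  virtual void forward(const Parameters& params /*, inArgs, outArgs */) = 0;
  virtual void backward(Parameters* params /*, callback */) = 0;
};

}  // namespace paddle
```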
A `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. At the end of this code refactoring, we could change `ParameterUpdater` to use `Parameters` directly, to make `ParameterUpdater`'s implementation clear.
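
A rough sketch of that end state, assuming illustrative method names rather than the exact existing `ParameterUpdater` interface:

```cpp
namespace paddle {

class Parameters;

// Sketch: ParameterUpdater operates on Parameters directly instead of
// reaching through a particular GradientMachine's parameter members.
class ParameterUpdater {
 public:
  virtual ~ParameterUpdater() = default;
  virtual void init(Parameters* params) = 0;
  // Apply one optimization step (e.g., SGD) to the shared Parameters,
  // regardless of which GradientMachine produced the gradients.
  virtual void update(Parameters* params) = 0;
};

}  // namespace paddle
```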