`Parameters` is a concept we designed for the Paddle V2 API. `Parameters` is a container of parameters that lets Paddle share parameters between topologies. We described the usage of `Parameters` in [api.md](./api.md).
We implemented `Parameters` in Python when designing the V2 API. The current implementation has several defects:
* We use `memcpy` to share `Parameters` between topologies, which is very inefficient.
* We do not actually share `Parameters` during training; we only trigger a `memcpy` when training starts.
It is therefore necessary to implement `Parameters` on the C++ side. However, this requires refactoring Paddle, because Paddle was originally designed to train only one topology, i.e., each `GradientMachine` contains its `Parameter`s as data members. In the current Paddle implementation, there are three concepts associated with `Parameters`:
1. `paddle::Parameter`. A `Parameters` instance is a container of `paddle::Parameter` objects.
It is evident that we should build `Parameters` on top of `paddle::Parameter`.
However, the `Parameter` class mixes many responsibilities and does not have a clear interface:
it handles creating/storing a parameter, serialization/deserialization, optimization (i.e., SGD), and randomization/zeroing.
When developing `Parameters`, we only need the create/store functionality, as the sketch after this list illustrates.
We should extract the functionalities of `Parameter` into separate classes to clean up Paddle's C++ implementation.
2. `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`.
We should pass `Parameters` to `paddle::GradientMachine` during `forward`/`backward` to avoid `memcpy` between topologies.
We should also handle multi-GPU/CPU training, because `forward` and `backward` run on multiple GPUs or CPUs.
`Parameters` should dispatch the parameter values to each device and gather the parameter gradients from each device.
3. `paddle::ParameterUpdater`. The `ParameterUpdater` is used to update parameters in Paddle.
So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (e.g., by SGD).
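To make the container relationship concrete, here is a minimal sketch of what the `Parameters` class could look like. It only illustrates the design above, not the final interface: the method names `addParameter` and `get` are hypothetical, and only the create/store functionality that `Parameters` actually needs from `paddle::Parameter` is shown.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

namespace paddle {

class Parameter;  // the existing paddle::Parameter

// Parameters is a container of paddle::Parameter that can be shared by
// several topologies (GradientMachines).
class Parameters {
 public:
  // Store a parameter under its name; hypothetical method name.
  void addParameter(const std::string& name,
                    std::shared_ptr<Parameter> param) {
    params_[name] = std::move(param);
  }

  // Look up a parameter by name; returns nullptr if it does not exist.
  std::shared_ptr<Parameter> get(const std::string& name) const {
    auto it = params_.find(name);
    return it == params_.end() ? nullptr : it->second;
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<Parameter>> params_;
};

}  // namespace paddle
```

Two topologies that share a layer would then hold the same `Parameters` object instead of copying the underlying buffers with `memcpy`.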
The step-by-step approach for implementing `Parameters` in the Paddle C++ core is listed below. Each step should be a separate PR, and the PRs can be merged into Paddle one by one.
1. Clean up the `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` into separate classes to prepare for the implementation of `Parameters`.
2. Implement a `Parameters` class. It simply stores the `paddle::Parameter` objects inside. Make `GradientMachine` use `Parameters` as a class member.
3. Make `Parameters` support multi-CPU and multi-GPU training to prepare for sharing `Parameter` between topologies.
Because we need to share `Parameters` between topologies, it is `Parameters`' responsibility to exchange parameters between GPUs.
`GradientMachine` should not handle how parameters are exchanged, because a `GradientMachine` only trains one topology and we need to support training many topologies in Paddle, i.e., many `GradientMachine`s could use one `Parameters` instance.
   * We should use a global function, not a member function of `Parameters`, to exchange parameters between GPUs. `MultiGradientMachine` invokes this function and passes `Parameters` as its input.
   * `MultiGradientMachine` already contains many functionalities. Extracting the parameter-exchange logic would make `MultiGradientMachine` clearer and simpler.
4. Make `Parameters` an argument of the `forward`/`backward` functions, not a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`; a sketch of these signatures follows this list. After this step, Paddle can share `Parameters` between topologies.
5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. At the end of this refactoring, we can change `ParameterUpdater` to use `Parameters` directly, which makes `ParameterUpdater`'s implementation clearer.
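The following is a rough sketch of how the interfaces in steps 3 and 4 might look, assuming the `Parameters` container sketched above. The parts beyond the `Parameters` arguments are placeholders: `Argument` and `UpdateCallback` stand in for Paddle's real types, and `exchangeParameters` is a hypothetical name for the global exchange function; the actual refactoring would settle the exact arguments.

```cpp
#include <vector>

namespace paddle {

class Parameters;          // the container sketched earlier
struct Argument {};        // placeholder for Paddle's real Argument type
struct UpdateCallback {};  // placeholder for Paddle's real update callback

// Step 3: a global function, not a member of Parameters, dispatches parameter
// values to every device and gathers gradients back. MultiGradientMachine
// would invoke it; the name and arguments are hypothetical.
void exchangeParameters(Parameters* params, const std::vector<int>& deviceIds);

// Step 4: Parameters becomes an argument of forward/backward instead of a
// data member, so many topologies (GradientMachines) can share one Parameters.
class GradientMachine {
 public:
  virtual void forward(const Parameters& params,
                       const std::vector<Argument>& inArgs,
                       std::vector<Argument>* outArgs) = 0;

  virtual void backward(Parameters* params,
                        const UpdateCallback& callback) = 0;

  virtual ~GradientMachine() = default;
};

}  // namespace paddle
```

With these signatures, several topologies can call `forward`/`backward` with the same `Parameters` object, and `ParameterUpdater` can take the same object to apply the update in step 5.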
<liclass="toctree-l2"><aclass="reference internal"href="../getstarted/build_and_install/index_en.html">Install and Build</a><ul>
<liclass="toctree-l3"><aclass="reference internal"href="../getstarted/build_and_install/docker_install_en.html">PaddlePaddle in Docker Containers</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/usage/k8s/k8s_en.html">Paddle On Kubernetes</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/usage/k8s/k8s_aws_en.html">Distributed PaddlePaddle Training on AWS with Kubernetes</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/dev/new_layer_en.html">Write New Layers</a></li>
<spanid="design-doc-the-c-class-parameters"></span><h1>Design Doc: The C++ Class <codeclass="docutils literal"><spanclass="pre">Parameters</span></code><aclass="headerlink"href="#design-doc-the-c-class-parameters"title="Permalink to this headline">¶</a></h1>
<p><codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a concept we designed in Paddle V2 API. <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a container of parameters, and make Paddle can shared parameter between topologies. We described usages of <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> in <aclass="reference internal"href="api.html"><spanclass="doc">api.md</span></a>.</p>
<p>We used Python to implement Parameters when designing V2 API before. There are several defects for current implementation:</p>
<ulclass="simple">
<li>We just use <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> to share Parameters between topologies, but this is very inefficient.</li>
<li>We did not implement share Parameters while training. We just trigger <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> when start training.</li>
</ul>
<p>It is necessary that we implement Parameters in CPP side. However, it could be a code refactoring for Paddle, because Paddle was designed for training only one topology before, i.e., each GradientMachine contains its Parameter as a data member. In current Paddle implementation, there are three concepts associated with <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>:</p>
<olclass="simple">
<li><codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code>. A <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a container for <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code>.
It is evident that we should use <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> when developing <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>.
However, the <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> class contains many functions and does not have a clear interface.
It contains <codeclass="docutils literal"><spanclass="pre">create/store</span><spanclass="pre">Parameter</span></code>, <codeclass="docutils literal"><spanclass="pre">serialize/deserialize</span></code>, <codeclass="docutils literal"><spanclass="pre">optimize(i.e</span><spanclass="pre">SGD)</span></code>, <codeclass="docutils literal"><spanclass="pre">randomize/zero</span></code>.
When we developing <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>, we only use <codeclass="docutils literal"><spanclass="pre">create/store</span><spanclass="pre">Parameter</span></code> functionality.
We should extract functionalities of Parameter into many classes to clean Paddle CPP implementation.</li>
<li><codeclass="docutils literal"><spanclass="pre">paddle::GradientMachine</span></code> and its sub-classes, e.g., <codeclass="docutils literal"><spanclass="pre">paddle::MultiGradientMachine</span></code>, <codeclass="docutils literal"><spanclass="pre">paddle::NeuralNetwork</span></code>.
We should pass <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> to <codeclass="docutils literal"><spanclass="pre">paddle::GradientMachine</span></code> when <codeclass="docutils literal"><spanclass="pre">forward/backward</span></code> to avoid <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> between topologies.
Also, we should handle multi-GPU/CPU training, because <codeclass="docutils literal"><spanclass="pre">forward</span></code> and <codeclass="docutils literal"><spanclass="pre">backward</span></code> would perform on multi-GPUs and multi-CPUs.
<codeclass="docutils literal"><spanclass="pre">Parameters</span></code> should dispatch the parameter value to each device, and gather the parameter gradient from each device.</li>
<li><codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code>. The ParameterUpdater is used to update parameters in Paddle.
So <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> should be used by <codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code>, and <codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code> should optimize <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> (by SGD).</li>
</ol>
<p>The step by step approach for implementation Parameters in Paddle C++ core is listed below. Each step should be a PR and could be merged into Paddle one by one.</p>
<olclass="simple">
<li>Clean <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> interface. Extract the functionalities of <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> to prepare for the implementation of Parameters.</li>
<li>Implementation a <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> class. It just stores the <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> inside. Make <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as a class member.</li>
<li>Make <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> support Multi-CPU and Multi-GPU training to prepare for sharing <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> between topologies.
Because we need share <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> between topologies, it is <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>‘s response to exchange Parameters between GPUs.
<codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> should not handle how to exchange Parameters because <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> only used to train one topology and we need to support train many topologies in Paddle, i.e., there could be many GradientMachines use one <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>.<ul>
<li>We should use a global function to exchange Parameters between GPUs, not a member function in <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>. The <codeclass="docutils literal"><spanclass="pre">MultiGradientMachine</span></code> invoke this function, which uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as this function inputs.</li>
<li>The MultiGradientMachine contains many functionalities. Extracting the Parameters exchanging logic could make MultiGradientMachine clearer and simpler.</li>
</ul>
</li>
<li>Make <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as an argument for <codeclass="docutils literal"><spanclass="pre">forward/backward</span></code> function, not a data member for <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code>. For example, <codeclass="docutils literal"><spanclass="pre">forward</span></code> could be <codeclass="docutils literal"><spanclass="pre">forward(const</span><spanclass="pre">Parameters&</span><spanclass="pre">params,</span><spanclass="pre">...)</span></code> and <codeclass="docutils literal"><spanclass="pre">backward</span></code> could be <codeclass="docutils literal"><spanclass="pre">backward(Parameters*</span><spanclass="pre">params,</span><spanclass="pre">...)</span></code>. After this step, Paddle could share <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> between topologies.</li>
<li><codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code> is invoked by <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> and <codeclass="docutils literal"><spanclass="pre">Trainer</span></code>, but it updates <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>. In the end of this code refactoring, we could change <codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code> directly uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> to make <codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code>‘s implementation clear.</li>
Built with <ahref="http://sphinx-doc.org/">Sphinx</a> using a <ahref="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <ahref="https://readthedocs.org">Read the Docs</a>.
<spanid="design-doc-the-c-class-parameters"></span><h1>Design Doc: The C++ Class <codeclass="docutils literal"><spanclass="pre">Parameters</span></code><aclass="headerlink"href="#design-doc-the-c-class-parameters"title="永久链接至标题">¶</a></h1>
<p><codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a concept we designed in Paddle V2 API. <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a container of parameters, and make Paddle can shared parameter between topologies. We described usages of <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> in <aclass="reference internal"href="api.html"><spanclass="doc">api.md</span></a>.</p>
<p>We used Python to implement Parameters when designing V2 API before. There are several defects for current implementation:</p>
<ulclass="simple">
<li>We just use <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> to share Parameters between topologies, but this is very inefficient.</li>
<li>We did not implement share Parameters while training. We just trigger <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> when start training.</li>
</ul>
<p>It is necessary that we implement Parameters in CPP side. However, it could be a code refactoring for Paddle, because Paddle was designed for training only one topology before, i.e., each GradientMachine contains its Parameter as a data member. In current Paddle implementation, there are three concepts associated with <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>:</p>
<olclass="simple">
<li><codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code>. A <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a container for <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code>.
It is evident that we should use <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> when developing <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>.
However, the <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> class contains many functions and does not have a clear interface.
It contains <codeclass="docutils literal"><spanclass="pre">create/store</span><spanclass="pre">Parameter</span></code>, <codeclass="docutils literal"><spanclass="pre">serialize/deserialize</span></code>, <codeclass="docutils literal"><spanclass="pre">optimize(i.e</span><spanclass="pre">SGD)</span></code>, <codeclass="docutils literal"><spanclass="pre">randomize/zero</span></code>.
When we developing <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>, we only use <codeclass="docutils literal"><spanclass="pre">create/store</span><spanclass="pre">Parameter</span></code> functionality.
We should extract functionalities of Parameter into many classes to clean Paddle CPP implementation.</li>
<li><codeclass="docutils literal"><spanclass="pre">paddle::GradientMachine</span></code> and its sub-classes, e.g., <codeclass="docutils literal"><spanclass="pre">paddle::MultiGradientMachine</span></code>, <codeclass="docutils literal"><spanclass="pre">paddle::NeuralNetwork</span></code>.
We should pass <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> to <codeclass="docutils literal"><spanclass="pre">paddle::GradientMachine</span></code> when <codeclass="docutils literal"><spanclass="pre">forward/backward</span></code> to avoid <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> between topologies.
Also, we should handle multi-GPU/CPU training, because <codeclass="docutils literal"><spanclass="pre">forward</span></code> and <codeclass="docutils literal"><spanclass="pre">backward</span></code> would perform on multi-GPUs and multi-CPUs.
<codeclass="docutils literal"><spanclass="pre">Parameters</span></code> should dispatch the parameter value to each device, and gather the parameter gradient from each device.</li>
<li><codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code>. The ParameterUpdater is used to update parameters in Paddle.
So <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> should be used by <codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code>, and <codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code> should optimize <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> (by SGD).</li>
</ol>
<p>The step by step approach for implementation Parameters in Paddle C++ core is listed below. Each step should be a PR and could be merged into Paddle one by one.</p>
<olclass="simple">
<li>Clean <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> interface. Extract the functionalities of <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> to prepare for the implementation of Parameters.</li>
<li>Implementation a <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> class. It just stores the <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> inside. Make <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as a class member.</li>
<li>Make <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> support Multi-CPU and Multi-GPU training to prepare for sharing <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> between topologies.
Because we need share <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> between topologies, it is <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>‘s response to exchange Parameters between GPUs.
<codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> should not handle how to exchange Parameters because <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> only used to train one topology and we need to support train many topologies in Paddle, i.e., there could be many GradientMachines use one <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>.<ul>
<li>We should use a global function to exchange Parameters between GPUs, not a member function in <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>. The <codeclass="docutils literal"><spanclass="pre">MultiGradientMachine</span></code> invoke this function, which uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as this function inputs.</li>
<li>The MultiGradientMachine contains many functionalities. Extracting the Parameters exchanging logic could make MultiGradientMachine clearer and simpler.</li>
</ul>
</li>
<li>Make <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as an argument for <codeclass="docutils literal"><spanclass="pre">forward/backward</span></code> function, not a data member for <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code>. For example, <codeclass="docutils literal"><spanclass="pre">forward</span></code> could be <codeclass="docutils literal"><spanclass="pre">forward(const</span><spanclass="pre">Parameters&</span><spanclass="pre">params,</span><spanclass="pre">...)</span></code> and <codeclass="docutils literal"><spanclass="pre">backward</span></code> could be <codeclass="docutils literal"><spanclass="pre">backward(Parameters*</span><spanclass="pre">params,</span><spanclass="pre">...)</span></code>. After this step, Paddle could share <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> between topologies.</li>
<li><codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code> is invoked by <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> and <codeclass="docutils literal"><spanclass="pre">Trainer</span></code>, but it updates <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>. In the end of this code refactoring, we could change <codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code> directly uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> to make <codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code>‘s implementation clear.</li>
Built with <ahref="http://sphinx-doc.org/">Sphinx</a> using a <ahref="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <ahref="https://readthedocs.org">Read the Docs</a>.