A dataset is a list of files in *RecordIO* format. A RecordIO file consists of chunks.
## Task Queue
As mentioned in [distributed training design doc](./README.md), a *task* is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple *chunks* from one or multiple files. The master server maintains *task queues* to track the training progress.
### Task Queue Creation
...
...
1. The master server will scan through each RecordIO file to generate the *chunk index* and determine how many chunks each file has. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is an in-memory data structure that enables fast access to each chunk, and the index of the chunk within the file is an integer starting from 0, representing the n-th chunk within the file.
The definition of the chunk is:
```go
type Chunk struct {
	Idx   int            // index of the chunk within the file
	Path  string
	Index recordio.Index // chunk index
}
```
1. Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no elements.
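For illustration, here is a minimal sketch of how chunks might be grouped into fixed-size tasks and pushed into the todo queue, with the pending and done queues starting empty. It is written in C++ only to keep the new examples in this document in one language; the actual master server is implemented in Go, and `ChunkRef`, `Task`, and `chunksPerTask` are hypothetical names, not Paddle's API.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical chunk reference mirroring the Go Chunk struct above:
// a file path plus the chunk's index within that file.
struct ChunkRef {
  int idx;
  std::string path;
};

// A task groups one or more chunks (names are illustrative, not Paddle's API).
struct Task {
  int id;
  std::vector<ChunkRef> chunks;
};

// Group chunks into tasks of up to `chunksPerTask` chunks each and fill the
// todo queue; the pending and done queues start out empty.
void createTaskQueues(const std::vector<ChunkRef>& allChunks, int chunksPerTask,
                      std::deque<Task>* todo, std::deque<Task>* pending,
                      std::deque<Task>* done) {
  todo->clear();
  pending->clear();
  done->clear();
  Task current{0, {}};
  for (const ChunkRef& c : allChunks) {
    current.chunks.push_back(c);
    if (static_cast<int>(current.chunks.size()) == chunksPerTask) {
      todo->push_back(current);
      current = Task{static_cast<int>(todo->size()), {}};
    }
  }
  if (!current.chunks.empty()) {
    todo->push_back(current);  // trailing task with fewer chunks
  }
}
```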
...

The trainer selection process is encapsulated in the C API function:
```c
int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto);
```
The selected trainer's call to `paddle_begin_init_params` will return 1, while the other trainers' calls to `paddle_begin_init_params` will return 0. `paddle_get_params` will block until initialization is completed, as illustrated below:

<img src="./src/pserver_init.png"/>
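As a rough illustration of the trainer-side flow, the C++ sketch below uses only the `paddle_begin_init_params` signature shown above; the parameter-initialization calls and the exact signature of `paddle_get_params` are not given in this excerpt, so they appear only as comments.

```cpp
// Only paddle_begin_init_params is declared with the signature from this doc;
// the opaque client type and everything else are assumptions.
extern "C" {
typedef struct paddle_pserver_client paddle_pserver_client;
int paddle_begin_init_params(paddle_pserver_client* client,
                             const char* config_proto);
}

void trainer_startup(paddle_pserver_client* client, const char* config_proto) {
  if (paddle_begin_init_params(client, config_proto) == 1) {
    // This trainer was selected: push randomly initialized parameter values
    // to the parameter servers (the init/finish calls are not shown here).
  }
  // Every trainer, selected or not, then fetches the initialized parameters.
  // The call blocks until initialization has completed on the servers:
  // paddle_get_params(client, /* destination buffers, signature assumed */);
}
```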
# Design Doc: The C++ Class `Parameters`

`Parameters` is a concept we designed in the Paddle V2 API. `Parameters` is a container of parameters that lets Paddle share parameters between topologies. We described the usage of `Parameter` in [api.md](./api.md).
We previously implemented Parameters in Python when designing the V2 API. The current implementation has several defects:
* We just use `memcpy` to share Parameters between topologies, which is very inefficient.
* We did not implement sharing Parameters during training; we only trigger `memcpy` when training starts.
It is necessary to implement Parameters on the C++ side. However, this requires some refactoring of Paddle, because Paddle was originally designed to train only one topology, i.e., each GradientMachine contains its Parameter as a data member. In the current Paddle implementation, there are three concepts associated with `Parameters`:
1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter` (a minimal sketch of such a container is given after this list).
It is evident that we should use `paddle::Parameter` when developing `Parameters`.
However, the `Parameter` class contains many functions and does not have a clear interface.
It covers creating/storing parameters, serialization/deserialization, optimization (i.e., SGD), and randomization/zeroing.
When developing `Parameters`, we only need the create/store functionality.
We should extract the functionalities of `Parameter` into separate classes to clean up Paddle's C++ implementation.
2. `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`.
We should pass `Parameters` to `paddle::GradientMachine` during `forward/backward` to avoid `memcpy` between topologies.
Also, we should handle multi-GPU/CPU training, because `forward` and `backward` would run on multiple GPUs and CPUs.
`Parameters` should dispatch the parameter values to each device and gather the parameter gradients from each device.
3. `paddle::ParameterUpdater`. The ParameterUpdater is used to update parameters in Paddle.
So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (by SGD).
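As referenced in item 1 above, this is a minimal sketch, assuming `Parameters` only wraps the create/store functionality of `paddle::Parameter`. The member and method names (`setParameter`, `getParameter`, `params_`) are illustrative assumptions, not Paddle's actual interface.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace paddle {
class Parameter;  // existing Paddle class; only forward-declared here
}  // namespace paddle

// A minimal sketch: Parameters is a named container of paddle::Parameter,
// so several topologies (GradientMachines) can share the same storage.
class Parameters {
 public:
  using ParameterPtr = std::shared_ptr<paddle::Parameter>;

  // Store (or replace) a parameter under a name.
  void setParameter(const std::string& name, ParameterPtr param) {
    params_[name] = std::move(param);
  }

  // Look up a parameter by name; returns nullptr if it does not exist.
  ParameterPtr getParameter(const std::string& name) const {
    auto it = params_.find(name);
    return it == params_.end() ? nullptr : it->second;
  }

  size_t size() const { return params_.size(); }

 private:
  std::map<std::string, ParameterPtr> params_;
};
```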
The step-by-step approach for implementing Parameters in the Paddle C++ core is listed below. Each step should be a separate PR that can be merged into Paddle one by one.
1. Clean up the `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of Parameters.
2. Implement a `Parameters` class. It just stores the `paddle::Parameter` objects inside. Make `GradientMachine` use `Parameters` as a class member.
3. Make `Parameters` support multi-CPU and multi-GPU training to prepare for sharing `Parameter` between topologies.
Because we need to share `Parameters` between topologies, it is `Parameters`' responsibility to exchange parameters between GPUs.
`GradientMachine` should not handle how to exchange parameters, because `GradientMachine` is only used to train one topology and we need to support training many topologies in Paddle, i.e., many GradientMachines could use one `Parameters`.
* We should use a global function to exchange parameters between GPUs, not a member function of `Parameters`. `MultiGradientMachine` invokes this function, taking `Parameters` as its input.
* `MultiGradientMachine` contains many functionalities. Extracting the parameter-exchanging logic would make `MultiGradientMachine` clearer and simpler.
4. Make `Parameters` an argument of the `forward/backward` functions rather than a data member of `GradientMachine`. For example, `forward` could become `forward(const Parameters& params, ...)` and `backward` could become `backward(Parameters* params, ...)`. After this step, Paddle could share `Parameters` between topologies (see the sketch after this list).
5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. At the end of this refactoring, we could change `ParameterUpdater` to use `Parameters` directly, making `ParameterUpdater`'s implementation clear.
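The sketch below illustrates the direction of steps 3 and 4: a free function for exchanging parameters across devices, and `forward/backward` taking `Parameters` as an argument instead of `GradientMachine` owning it. All names here (`exchangeParametersBetweenDevices`, `GradientMachineLike`) are assumptions for illustration, not the final Paddle API.

```cpp
class Parameters;  // the container sketched earlier

// Step 3: a global (free) function, not a member of Parameters, that
// dispatches parameter values to each device and gathers gradients back.
// MultiGradientMachine would call it; the name and signature are hypothetical.
void exchangeParametersBetweenDevices(Parameters* params, int numDevices);

// Step 4: forward/backward take Parameters as an argument instead of the
// GradientMachine owning the parameters, so many topologies can share them.
class GradientMachineLike {
 public:
  virtual ~GradientMachineLike() = default;
  // Inputs and outputs are elided; only the Parameters plumbing is shown.
  virtual void forward(const Parameters& params /*, inputs, outputs */) = 0;
  virtual void backward(Parameters* params /*, output gradients */) = 0;
};
```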
<spanid="task-queue"></span><h2>Task Queue<aclass="headerlink"href="#task-queue"title="Permalink to this headline">¶</a></h2>
<p>As mentioned in <aclass="reference internal"href="README.html"><spanclass="doc">distributed training design doc</span></a>, a <em>task</em> is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple <em>blocks</em> from one or multiple files. The master server maintains <em>task queues</em> to track the training progress.</p>
<p>As mentioned in <aclass="reference internal"href="README.html"><spanclass="doc">distributed training design doc</span></a>, a <em>task</em> is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple <em>chunks</em> from one or multiple files. The master server maintains <em>task queues</em> to track the training progress.</p>
<divclass="section"id="task-queue-creation">
<spanid="task-queue-creation"></span><h3>Task Queue Creation<aclass="headerlink"href="#task-queue-creation"title="Permalink to this headline">¶</a></h3>
<ol>
...
...
@@ -197,21 +197,21 @@
</pre></div>
</div>
</li>
<li><pclass="first">The master server will scan through each RecordIO file to generate the <em>block index</em> and know how many blocks does each file have. A block can be referenced by the file path and the index of the block within the file. The block index is in memory data structure that enables fast access to each block, and the index of the block with the file is an integer start from 0, representing the n-th block within the file.</p>
<spanclass="nx">Idx</span><spanclass="kt">int</span><spanclass="c1">// index of the block within the file</span>
<li><pclass="first">The master server will scan through each RecordIO file to generate the <em>chunk index</em> and know how many chunks does each file have. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is in memory data structure that enables fast access to each chunk, and the index of the chunk with the file is an integer start from 0, representing the n-th chunk within the file.</p>
<li><pclass="first">Blocks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.</p>
<li><pclass="first">Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.</p>
<p>The selected trainer’s call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will return with 1, and the other trainers’ call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will block until initialization is done, and return 0. As illustrated below:</p>
<p>The selected trainer’s call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will return with 1, and the other trainers’ call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will return 0. <codeclass="docutils literal"><spanclass="pre">paddle_get_params</span></code> will be blocked until initialization is completed. As illustrated below:</p>
<p><imgsrc="./src/pserver_init.png"></p>
</div>
</div>
...
...