# Design Doc: The Client Library of Parameter Server
For an overview of the trainer's role, please refer to the [distributed training design doc](README.md). In this design doc, we will discuss the parameter server's client library, which will manage communication with the parameter servers. The library will be implemented in [Go](https://golang.org/) and made available as a static or dynamic library with a C header file.
## Parameter Partition
Each parameter will be partitioned into parameter blocks so that the parameters are evenly distributed across the parameter servers. The partitioning is done automatically by the client library. *Sparse parameters* require slightly different treatment:
### Sparse Parameter
A sparse parameter is a parameter that is updated sparsely. The name is somewhat misleading: it does not have a sparse representation; it has the same representation as a dense vector.
Because a sparse parameter is updated sparsely, the trainer has to partition it itself. And because the parameter server merges all shards of a sparse parameter into the same file when saving the parameter, the shards need a special naming convention:
If a sparse parameter is partitioned into n shards, they should be named as:
```text
name:sparse-0
name:sparse-1
...
name:sparse-n-1
```
The library is unaware of the partition and treats each shard as an independent parameter. Only when saving parameters will the parameter servers merge the sparse parameter shards according to the naming convention.
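To make the convention concrete, here is a minimal Go sketch (Go being the library's implementation language) of generating and recognizing shard names. `shardName` and `isShardOf` are hypothetical helpers for illustration, not part of the client library's API:

```go
package naming

import (
	"fmt"
	"strings"
)

// shardName builds the name of the i-th shard of a sparse parameter,
// following the "name:sparse-i" convention described above.
func shardName(name string, i int) string {
	return fmt.Sprintf("%s:sparse-%d", name, i)
}

// isShardOf reports whether a stored parameter name is a shard of the sparse
// parameter `name`. A parameter server could use a check like this to find
// the shards it must merge into one file when saving the parameter.
func isShardOf(stored, name string) bool {
	return strings.HasPrefix(stored, name+":sparse-")
}
```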
## Model Optimization Using Gradients
There are two ways to perform model optimization using gradients:
- On Client
  The client does multiple steps of forward and backward updates. In each step, the gradients are calculated and a new model is generated. After some number of steps, the client calculates the difference between the newest model and the model it had at step 0, and sends that difference to the parameter servers. The parameter servers simply apply the difference to the parameters, without doing any optimization using gradients (such as Adam or L1 regularization). See the sketch after this list.
- On Parameter Server
  The client sends accumulated gradients to the parameter servers, and the parameter servers perform the optimization using those gradients.
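For the on-client mode, the update sent to the parameter servers is just an element-wise difference between two model snapshots. A minimal Go sketch, assuming dense `float32` parameters and illustrative names:

```go
package optimize

// modelDelta computes the difference between the newest local model and the
// snapshot taken at step 0. The result is sent to the parameter servers,
// which apply it directly without running an optimizer.
func modelDelta(snapshot, newest []float32) []float32 {
	delta := make([]float32, len(newest))
	for i := range newest {
		delta[i] = newest[i] - snapshot[i]
	}
	return delta
}
```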
## L1 and L2 Regularization
PaddlePaddle allows L1 or L2 regularization to be specified per parameter, so when the trainer initializes a parameter, it needs to include that parameter's configuration whenever L1 or L2 regularization is required.
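As an illustration only, the per-parameter configuration could carry the regularization coefficients as in the hypothetical Go struct below; the field names are made up and are not the actual PaddlePaddle parameter configuration:

```go
package config

// ParameterConfig is a hypothetical per-parameter configuration the trainer
// could attach at initialization time; a zero coefficient disables that term.
type ParameterConfig struct {
	Name        string
	DecayRateL1 float64 // L1 regularization coefficient
	DecayRateL2 float64 // L2 regularization coefficient
}
```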
## Parameter Initialization
The parameters on the parameter servers need to be initialized. To provide maximum flexibility, the trainer will initialize the parameters. Only one trainer will do the initialization; the other trainers will wait for the initialization to complete and then fetch the parameters from the parameter servers.
### Trainer Selection
To select the trainer for initialization, every trainer will try to acquire a distributed lock; whichever trainer owns the lock does the initialization. As illustrated below:
<img src="./src/init_lock.png">
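The design does not mandate a particular lock service. Below is a minimal Go sketch that assumes etcd provides the distributed lock; the lock key is hypothetical:

```go
package selection

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

// trySelect reports whether this trainer acquired the initialization lock and
// therefore should initialize the parameters.
func trySelect(ctx context.Context, endpoints []string) (bool, error) {
	cli, err := clientv3.New(clientv3.Config{Endpoints: endpoints})
	if err != nil {
		return false, err
	}
	sess, err := concurrency.NewSession(cli)
	if err != nil {
		return false, err
	}
	mu := concurrency.NewMutex(sess, "/paddle/init_lock") // hypothetical key
	switch err := mu.TryLock(ctx); err {
	case nil:
		return true, nil // we own the lock and will do the initialization
	case concurrency.ErrLocked:
		return false, nil // another trainer owns the lock
	default:
		return false, err
	}
}
```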
### Trainer Selection Process
The trainer selection process is encapsulated in the C API function:
```c
int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto);
```
The selected trainer's call to `paddle_begin_init_params` will return 1, while the other trainers' calls to `paddle_begin_init_params` will block until initialization is done and then return 0. As illustrated below:
<img src="./src/pserver_init.png">
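On the Go side of the client library, the blocking behavior could be implemented as sketched below; the `/paddle/init_done` marker key and the polling loop are assumptions for illustration, continuing the etcd-based sketch above:

```go
package selection

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// beginInitParams mirrors the C API's semantics: the selected trainer returns
// 1 immediately and goes on to initialize the parameters; everyone else
// blocks until an "init done" marker appears, then returns 0.
func beginInitParams(ctx context.Context, cli *clientv3.Client, selected bool) (int, error) {
	if selected {
		return 1, nil
	}
	for {
		resp, err := cli.Get(ctx, "/paddle/init_done") // hypothetical marker key
		if err != nil {
			return 0, err
		}
		if resp.Count > 0 {
			return 0, nil // parameters are ready on the parameter servers
		}
		select {
		case <-ctx.Done():
			return 0, ctx.Err()
		case <-time.After(time.Second): // simple polling; a watch would also work
		}
	}
}
```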