# Analysis of large model distributed training in Paddle
***NOTE: These are only notes on how we implemented this scheme in V1, not a new design.***
## What is it
We often encounter cases where the (sparse) embedding layer parameters are too large to fit in the trainer's memory during training. To handle this, we store them on several parameter servers and fetch them row by row instead of fetching all of the parameters at once.
## How to use
Specify command-line arguments like `--loadsave_parameters_in_pserver=true --ports_num_for_sparse=1 --use_old_updater=1` when starting the paddle trainer, and add something like `--ports_num_for_sparse=1 --pserver_num_threads=5` when starting the pserver processes.
Accordingly, configure your embedding layers like the sketch below.
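The following is a minimal sketch, assuming the V1 Python trainer-config API (`embedding_layer` and `ParamAttr` with `sparse_update=True`); the layer name, dictionary size, and embedding size are placeholders for illustration:

```python
from paddle.trainer_config_helpers import *

# A toy data layer feeding word ids into the embedding.
data1 = data_layer(name="word_ids", size=100000)

# Marking the parameter with sparse_update=True asks the trainer to keep the
# embedding table on the parameter servers and prefetch only the rows needed
# by the current batch, instead of holding the full table in trainer memory.
emb1 = embedding_layer(
    input=data1,
    size=256,
    param_attr=ParamAttr(sparse_update=True))
```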
<liclass="toctree-l2"><aclass="reference internal"href="../../getstarted/build_and_install/index_en.html">Install and Build</a><ul>
<liclass="toctree-l3"><aclass="reference internal"href="../../getstarted/build_and_install/docker_install_en.html">PaddlePaddle in Docker Containers</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../../howto/usage/k8s/k8s_en.html">Paddle On Kubernetes</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../../howto/usage/k8s/k8s_aws_en.html">Distributed PaddlePaddle Training on AWS with Kubernetes</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../../howto/dev/new_layer_en.html">Write New Layers</a></li>
<spanid="alalysis-of-large-model-distributed-training-in-paddle"></span><h1>Alalysis of large model distributed training in Paddle<aclass="headerlink"href="#alalysis-of-large-model-distributed-training-in-paddle"title="Permalink to this headline">¶</a></h1>
<p><strong><em>NOTE: This is only some note for how we implemeted this scheme in V1, not a new design.</em></strong></p>
<divclass="section"id="what-is-it">
<spanid="what-is-it"></span><h2>What is it<aclass="headerlink"href="#what-is-it"title="Permalink to this headline">¶</a></h2>
<p>We often encounter cases that the embedding layer parameters(sparse) are so large that we can not store it in the trainer’s memory when training. So we need to put them to several servers, and fetch them row by row instead of fetch all of the parameters.</p>
</div>
<divclass="section"id="how-to-use">
<spanid="how-to-use"></span><h2>How to use<aclass="headerlink"href="#how-to-use"title="Permalink to this headline">¶</a></h2>
<p>Specify command-line argument like <codeclass="docutils literal"><spanclass="pre">--loadsave_parameters_in_pserver=true</span><spanclass="pre">--ports_num_for_sparse=1</span><spanclass="pre">--use_old_updater=1</span></code> when starting the paddle trainer. And also add something like <codeclass="docutils literal"><spanclass="pre">--ports_num_for_sparse=1</span><spanclass="pre">--pserver_num_threads=5</span></code> when starting pserver processes.</p>
<p>Accrodingly, configure your embedding layers like:</p>
<spanid="implementation-details"></span><h2>Implementation details<aclass="headerlink"href="#implementation-details"title="Permalink to this headline">¶</a></h2>
<p><codeclass="docutils literal"><spanclass="pre">MAT_SPARSE_ROW_PREFETCH</span></code> is what we use when configured to fetch only row of matrix when training.</p>
<p>Calling <codeclass="docutils literal"><spanclass="pre">parameterClient_->getParameterSparse</span></code> will do remote call to pserver’s <codeclass="docutils literal"><spanclass="pre">getParameterSparse</span></code>:</p>
<p><codeclass="docutils literal"><spanclass="pre">getParameterConfig(block).dims(1)</span></code> returns the width of the current “parameter block”(a shard of parameter object),
then <codeclass="docutils literal"><spanclass="pre">getParameterSparse</span></code> remote call returns only one row of data to the client.</p>
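To make the row-by-row fetch concrete, here is a small, self-contained Python sketch of the idea; it is a toy illustration under made-up names, not the actual Paddle C++ implementation. Parameter blocks are shards of the embedding table, each block knows its width, and a prefetch step pulls only the rows needed by the current batch:

```python
import numpy as np

class ParameterBlock:
    """A toy 'parameter block': one shard of a large embedding table."""

    def __init__(self, num_rows, width, seed=0):
        rng = np.random.RandomState(seed)
        self.width = width                      # analogous to the block width dims(1)
        self.rows = rng.randn(num_rows, width)  # the shard's rows, living on the "pserver"

    def get_rows(self, row_ids):
        # Return only the requested rows: one small array instead of the whole shard.
        return self.rows[row_ids]

def prefetch(blocks, rows_per_block, word_ids):
    """Fetch only the embedding rows needed by the current batch.

    Each word id maps to (block index, local row index), mimicking how a
    sharded table is addressed; only those rows are transferred.
    """
    fetched = {}
    for wid in set(word_ids):
        block_id, local_row = divmod(wid, rows_per_block)
        fetched[wid] = blocks[block_id].get_rows([local_row])[0]
    return fetched

# Example: a 1000-row embedding table sharded into 4 blocks of 250 rows each.
blocks = [ParameterBlock(250, width=8, seed=i) for i in range(4)]
batch_word_ids = [3, 750, 3, 42]
rows = prefetch(blocks, rows_per_block=250, word_ids=batch_word_ids)
print({wid: vec.shape for wid, vec in rows.items()})
```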