  <div class="section" id="use-case">
<span id="use-case"></span><h1>Use Case<a class="headerlink" href="#use-case" title="Permalink to this headline"></a></h1>
<div class="section" id="local-training">
<span id="local-training"></span><h2>Local Training<a class="headerlink" href="#local-training" title="Permalink to this headline"></a></h2>
<p>These command line arguments are commonly used in local training experiments, such as image classification and natural language processing.</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>paddle train \
  --use_gpu=1/0 \                        #1:GPU,0:CPU(default:true)
  --config=network_config \
  --save_dir=output \
  --trainer_count=COUNT \                #(default:1)
  --test_period=M \                      #(default:1000)
  --test_all_data_in_one_period=true \   #(default:false) 
  --num_passes=N \                       #(default:100)
  --log_period=K \                       #(default:100)
  --dot_period=1000 \                    #(default:1)
  #[--show_parameter_stats_period=100] \ #(default:0)
  #[--saving_period_by_batches=200] \    #(default:0)
<p><code class="docutils literal"><span class="pre">show_parameter_stats_period</span></code> and <code class="docutils literal"><span class="pre">saving_period_by_batches</span></code> are optional; set them only if your task needs them.</p>
<div class="section" id="pass-command-argument-to-network-config">
<span id="pass-command-argument-to-network-config"></span><h3>1) Pass Command Argument to Network config<a class="headerlink" href="#pass-command-argument-to-network-config" title="Permalink to this headline"></a></h3>
<p><code class="docutils literal"><span class="pre">config_args</span></code> passes user-defined arguments from the command line to the network config.</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="o">--</span><span class="n">config_args</span><span class="o">=</span><span class="n">generating</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span><span class="n">beam_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span><span class="n">layer_num</span><span class="o">=</span><span class="mi">10</span> \
<p>In the network config, use <code class="docutils literal"><span class="pre">get_config_arg</span></code> to read these arguments as follows:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">generating</span> <span class="o">=</span> <span class="n">get_config_arg</span><span class="p">(</span><span class="s1">&#39;generating&#39;</span><span class="p">,</span> <span class="nb">bool</span><span class="p">,</span> <span class="bp">False</span><span class="p">)</span>
<span class="n">beam_size</span> <span class="o">=</span> <span class="n">get_config_arg</span><span class="p">(</span><span class="s1">&#39;beam_size&#39;</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">layer_num</span> <span class="o">=</span> <span class="n">get_config_arg</span><span class="p">(</span><span class="s1">&#39;layer_num&#39;</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="mi">8</span><span class="p">)</span>
<p><code class="docutils literal"><span class="pre">get_config_arg</span></code>:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">get_config_arg</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="p">,</span> <span class="n">default_value</span><span class="p">)</span>
<ul class="simple">
<li>name: the argument name specified in <code class="docutils literal"><span class="pre">--config_args</span></code>.</li>
<li>type: the value type, such as bool, int, str, or float.</li>
<li>default_value: the value to use if the argument is not set.</li>
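<p>The lookup behaviour described above can be sketched in plain Python. This is not Paddle's implementation: <code>parse_config_args</code> and <code>lookup_config_arg</code> are hypothetical helpers, and the set of strings treated as true is an assumption made for illustration.</p>

```python
def parse_config_args(raw):
    """Split a 'k1=v1,k2=v2' string into a dict of raw string values."""
    pairs = (item.split("=", 1) for item in raw.split(",") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def lookup_config_arg(args, name, type_, default_value):
    """Return args[name] converted to type_, or default_value if name is absent."""
    if name not in args:
        return default_value
    value = args[name]
    if type_ is bool:
        # bool("0") is True in Python, so map common spellings explicitly
        # (this set of spellings is an assumption, not taken from Paddle).
        return value.lower() in ("1", "true", "yes", "on")
    return type_(value)


args = parse_config_args("generating=1,beam_size=5,layer_num=10")
generating = lookup_config_arg(args, "generating", bool, False)
beam_size = lookup_config_arg(args, "beam_size", int, 3)
# "dropout" was not passed on the command line, so the default is returned.
dropout = lookup_config_arg(args, "dropout", float, 0.5)
```

<p>Values given on the command line override the defaults written in the config, and any argument that is not passed keeps its default.</p>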
<div class="section" id="use-model-to-initialize-network">
<span id="use-model-to-initialize-network"></span><h3>2) Use Model to Initialize Network<a class="headerlink" href="#use-model-to-initialize-network" title="Permalink to this headline"></a></h3>
<p>Add the following arguments:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="o">--</span><span class="n">init_model_path</span><span class="o">=</span><span class="n">model_path</span>
<span class="o">--</span><span class="n">load_missing_parameter_strategy</span><span class="o">=</span><span class="n">rand</span>
<div class="section" id="local-testing">
<span id="local-testing"></span><h2>Local Testing<a class="headerlink" href="#local-testing" title="Permalink to this headline"></a></h2>
<p>Method 1:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>paddle train --job=test \
             --use_gpu=1/0 \
             --config=network_config \
             --trainer_count=COUNT \
             --init_model_path=model_path
<ul class="simple">
<li>Use init_model_path to specify the model to test.</li>
<li>Only one model can be tested.</li>
<p>Method 2:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>paddle train --job=test \
             --use_gpu=1/0 \
             --config=network_config \
             --trainer_count=COUNT \
             --model_list=model.list
<ul class="simple">
<li>Use model_list to specify the models to test.</li>
<li>Several models can be tested; model.list looks like:</li>
<div class="highlight-python"><div class="highlight"><pre><span></span>./alexnet_pass1
<p>Method 3:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>paddle train --job=test \
             --use_gpu=1/0 \
             --config=network_config \
             --trainer_count=COUNT \
             --save_dir=model \
             --test_pass=M \
             --num_passes=N
<p>This method requires model directories saved by Paddle under the naming pattern <code class="docutils literal"><span class="pre">model/pass-%05d</span></code>. The models from the M-th pass through the (N-1)-th pass are tested. For example, M=12 and N=14 tests <code class="docutils literal"><span class="pre">model/pass-00012</span></code> and <code class="docutils literal"><span class="pre">model/pass-00013</span></code>.</p>
<div class="section" id="sparse-training">
<span id="sparse-training"></span><h2>Sparse Training<a class="headerlink" href="#sparse-training" title="Permalink to this headline"></a></h2>
<p>Sparse training is typically used to speed up computation when the input is sparse, high-dimensional data. For example, the input dictionary may have 1 million dimensions while a single sample contains only a few words. In Paddle, sparse matrix multiplication is used in forward propagation, and sparse updating is performed on the weights after backward propagation.</p>
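<p>The idea can be illustrated with a small pure-Python sketch. It is not Paddle's kernel: the function names, the dictionary-based sample format, and the plain SGD step are assumptions chosen to show that only the weight rows of the non-zero inputs are read in the forward pass and written in the update.</p>

```python
def sparse_forward(sample, weights):
    """sample: {input_index: value} for non-zero inputs; weights: list of rows."""
    width = len(weights[0])
    out = [0.0] * width
    for idx, value in sample.items():  # visits only the non-zero inputs
        row = weights[idx]
        for j in range(width):
            out[j] += value * row[j]
    return out


def sparse_update(sample, weights, grad_out, learning_rate):
    """Plain SGD step that writes only the rows touched in the forward pass."""
    for idx, value in sample.items():
        row = weights[idx]
        for j, g in enumerate(grad_out):
            row[j] -= learning_rate * value * g


# A toy dictionary of 6 entries with 2 outputs; the sample uses entries 1 and 4.
weights = [[float(i), 1.0] for i in range(6)]
sample = {1: 1.0, 4: 2.0}
output = sparse_forward(sample, weights)
sparse_update(sample, weights, grad_out=[1.0, 0.0], learning_rate=0.1)
```

<p>With a dictionary of 1 million entries and a handful of words per sample, both loops still run over only that handful of rows, which is where the speed-up comes from.</p>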
<div class="section" id="local-training">
<span id="id1"></span><h3>1) Local training<a class="headerlink" href="#local-training" title="Permalink to this headline"></a></h3>
<p>Set <strong>sparse_update=True</strong> in the network config. Check the network config documentation for more details.</p>
<div class="section" id="cluster-training">
<span id="cluster-training"></span><h3>2) Cluster training<a class="headerlink" href="#cluster-training" title="Permalink to this headline"></a></h3>
<p>Add the following argument for cluster training of a sparse model. At the same time you need to set <strong>sparse_remote_update=True</strong> in network config. Check the network config documentation for more details.</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="o">--</span><span class="n">ports_num_for_sparse</span><span class="o">=</span><span class="mi">1</span>    <span class="c1">#(default: 0)</span>
<div class="section" id="parallel-nn">
<span id="parallel-nn"></span><h2>parallel_nn<a class="headerlink" href="#parallel-nn" title="Permalink to this headline"></a></h2>
<p><code class="docutils literal"><span class="pre">parallel_nn</span></code> enables mixed use of GPUs and CPUs to compute layers: you can configure the network so that a GPU computes some layers and a CPU computes the others. Alternatively, you can split layers across different GPUs, which can <strong>reduce GPU memory usage</strong> or <strong>use parallel computation to accelerate some layers</strong>.</p>
<p>To use these features, specify a device ID for each layer in the network config (denoted deviceId) and add the command line argument:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="o">--</span><span class="n">parallel_nn</span><span class="o">=</span><span class="n">true</span>
<div class="section" id="case-1-mixed-use-of-gpu-and-cpu">
<span id="case-1-mixed-use-of-gpu-and-cpu"></span><h3>Case 1: Mixed Use of GPU and CPU<a class="headerlink" href="#case-1-mixed-use-of-gpu-and-cpu" title="Permalink to this headline"></a></h3>
<p>Consider the following example:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>#command line:
paddle train --use_gpu=true --parallel_nn=true --trainer_count=COUNT


<ul class="simple">
<li>default_device(0): sets the default device ID to 0. This means that all layers except those with device=-1 use a GPU, and the specific GPU used for each layer depends on trainer_count and gpu_id (0 by default). Here, layers l1 and l2 are computed on the GPU.</li>
<li>device=-1: use the CPU for layer l3.</li>
<li>trainer_count=1: if gpu_id is not set, then use the first GPU to compute layers l1 and l2. Otherwise use the GPU with gpu_id.</li>
<li>trainer_count&gt;1: use trainer_count GPUs to compute one layer using data parallelism. For example, trainer_count=2 means that GPUs 0 and 1 will use data parallelism to compute layer l1 and l2.</li>
<div class="section" id="case-2-specify-layers-in-different-devices">
<span id="case-2-specify-layers-in-different-devices"></span><h3>Case 2: Specify Layers in Different Devices<a class="headerlink" href="#case-2-specify-layers-in-different-devices" title="Permalink to this headline"></a></h3>
<div class="highlight-python"><div class="highlight"><pre><span></span>#command line:
paddle train --use_gpu=true --parallel_nn=true --trainer_count=COUNT

fc2=fc_layer(input=l1, layer_attr=ExtraAttr(device=0), ...)
fc3=fc_layer(input=l1, layer_attr=ExtraAttr(device=1), ...)
fc4=fc_layer(input=fc2, layer_attr=ExtraAttr(device=-1), ...)
<p>In this case, we assume that there are 4 GPUs in one machine.</p>
<ul class="simple">
<li>trainer_count=1:<ul class="simple">
<li>Use GPU 0 to compute layer l2.</li>
<li>Use GPU 1 to compute layer l3.</li>
<li>Use CPU to compute layer l4.</li>
</ul>
</li>
<li>trainer_count=2:<ul class="simple">
<li>Use GPU 0 and 1 to compute layer l2.</li>
<li>Use GPU 2 and 3 to compute layer l3.</li>
<li>Use CPU to compute l4 in two threads.</li>
</ul>
</li>
<li>A larger trainer_count, such as 4:<ul class="simple">
<li>It will fail (note that we have assumed there are 4 GPUs in the machine), because the argument <code class="docutils literal"><span class="pre">allow_only_one_model_on_one_gpu</span></code> is true by default.</li>
</ul>
</li>
<p><strong>Allocation of device ID when <code class="docutils literal"><span class="pre">device!=-1</span></code></strong>:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>(deviceId + gpu_id + threadId * numLogicalDevices_) % numDevices_

deviceId:             specified in layer.
gpu_id:               0 by default.
threadId:             thread ID, range: 0,1,..., trainer_count-1
numDevices_:          device (GPU) count in machine.
numLogicalDevices_:   min(max(deviceId + 1), numDevices_)
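<p>The allocation formula above can be evaluated with a short Python sketch. The function and parameter names are hypothetical, and it assumes that <code>max(deviceId + 1)</code> is taken over the device IDs of all layers in the network config.</p>

```python
def physical_gpu(device_id, thread_id, layer_device_ids, num_devices, gpu_id=0):
    """Evaluate (deviceId + gpu_id + threadId * numLogicalDevices_) % numDevices_."""
    if device_id == -1:
        raise ValueError("device=-1 selects the CPU; the formula covers GPU layers only")
    # Assumption: the maximum runs over every deviceId used in the network config.
    num_logical = min(max(d + 1 for d in layer_device_ids), num_devices)
    return (device_id + gpu_id + thread_id * num_logical) % num_devices


# Layers with deviceId 0, 1 and -1 on a machine with 4 GPUs, first trainer thread.
config_device_ids = [0, 1, -1]
gpu_for_device_0 = physical_gpu(0, 0, config_device_ids, num_devices=4)
gpu_for_device_1 = physical_gpu(1, 0, config_device_ids, num_devices=4)
```

<p>For a single trainer thread this gives GPU 0 for deviceId 0 and GPU 1 for deviceId 1, which matches the case where layer l2 runs on GPU 0 and layer l3 on GPU 1.</p>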

