

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>PaddlePaddle FAQ &#8212; PaddlePaddle  documentation</title>
    
    <link rel="stylesheet" href="../_static/classic.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '../',
        VERSION:     '',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="../_static/jquery.js"></script>
    <script type="text/javascript" src="../_static/underscore.js"></script>
    <script type="text/javascript" src="../_static/doctools.js"></script>
    <script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="top" title="PaddlePaddle  documentation" href="../index.html" /> 
<script>
var _hmt = _hmt || [];
(function() {
  var hm = document.createElement("script");
  hm.src = "//hm.baidu.com/hm.js?b9a314ab40d04d805655aab1deee08ba";
  var s = document.getElementsByTagName("script")[0]; 
  s.parentNode.insertBefore(hm, s);
})();
</script>

  </head>
  <body role="document">
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="nav-item nav-item-0"><a href="../index.html">PaddlePaddle  documentation</a> &#187;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="paddlepaddle">
<h1><a class="toc-backref" href="#id13">PaddlePaddle FAQ</a><a class="headerlink" href="#paddlepaddle" title="Permalink to this headline"></a></h1>
<div class="contents topic" id="contents">
<p class="topic-title first">Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#paddlepaddle" id="id13">PaddlePaddle FAQ</a><ul>
<li><a class="reference internal" href="#id1" id="id14">1. How to reduce PaddlePaddle's memory usage</a><ul>
<li><a class="reference internal" href="#dataprovider" id="id15">Reducing DataProvider buffer pool memory</a></li>
<li><a class="reference internal" href="#id3" id="id16">Neuron activation memory</a></li>
<li><a class="reference internal" href="#id4" id="id17">Parameter memory</a></li>
</ul>
</li>
<li><a class="reference internal" href="#id5" id="id18">2. How to speed up PaddlePaddle training</a><ul>
<li><a class="reference internal" href="#id6" id="id19">Reducing data loading time</a></li>
<li><a class="reference internal" href="#id7" id="id20">Speeding up training</a></li>
<li><a class="reference internal" href="#id8" id="id21">Using more computing resources</a></li>
</ul>
</li>
<li><a class="reference internal" href="#illegal-instruction" id="id22">3. Encountering an “illegal instruction” (“非法指令”) error</a></li>
<li><a class="reference internal" href="#sgd" id="id23">4. How to choose the learning rate for SGD</a></li>
<li><a class="reference internal" href="#id11" id="id24">5. How to initialize parameters</a></li>
<li><a class="reference internal" href="#id12" id="id25">6. How to share parameters</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="id1">
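<h2><a class="toc-backref" href="#id14">1. How to reduce PaddlePaddle's memory usage</a><a class="headerlink" href="#id1" title="Permalink to this headline"></a></h2>
<p>Training a neural network is inherently demanding on both host memory and GPU memory. It often consumes tens of gigabytes of host memory and several gigabytes of GPU memory.
PaddlePaddle's memory usage falls into the following categories:</p>
<ul class="simple">
<li>DataProvider buffer pool memory (host memory only)</li>
<li>Neuron activation memory (host memory and GPU memory)</li>
<li>Parameter memory (host memory and GPU memory)</li>
<li>Miscellaneous memory</li>
</ul>
<p>Here, miscellaneous memory refers to memory that PaddlePaddle itself uses, such as string allocations and temporary variables.
We do not consider how to reduce it.</p>
<p>The other kinds of memory can be reduced as follows.</p>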
<div class="section" id="dataprovider">
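<h3><a class="toc-backref" href="#id15">Reducing DataProvider buffer pool memory</a><a class="headerlink" href="#dataprovider" title="Permalink to this headline"></a></h3>
<p>PyDataProvider loads data asynchronously and shuffles it by randomly sampling directly from an in-memory pool. That is:</p>
<img src="../_images/graphviz-9be6aad37f57c60f4b971dde0ef44ce27179cf9a.png" alt="digraph {
    rankdir=LR;
    data files -&gt; memory pool -&gt; PaddlePaddle training
}" />
<p>Shrinking this memory pool therefore reduces memory usage and also shortens the data loading that happens before training starts. However,
the pool size also determines the granularity of the shuffle. If you shrink the pool but still need the data to be random,
it is best to shuffle the data file itself each time before it is read. One possible implementation is:</p>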
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>

<span class="nd">@provider</span><span class="p">(</span><span class="n">min_pool_size</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">process</span><span class="p">(</span><span class="n">settings</span><span class="p">,</span> <span class="n">filename</span><span class="p">):</span>
    <span class="c1"># Shuffle the file on disk before reading it.</span>
    <span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s1">&#39;shuf </span><span class="si">%s</span><span class="s1"> &gt; </span><span class="si">%s</span><span class="s1">.shuf&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">filename</span><span class="p">))</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">&#39;</span><span class="si">%s</span><span class="s1">.shuf&#39;</span> <span class="o">%</span> <span class="n">filename</span><span class="p">,</span> <span class="s1">&#39;r&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">f</span><span class="p">:</span>
            <span class="c1"># get_sample_from_line is a user-defined parser (not shown).</span>
            <span class="k">yield</span> <span class="n">get_sample_from_line</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
</pre></div>
</div>
<p>This can greatly reduce memory usage and may also speed up training. For details, see <a class="reference external" href="../ui/data_provider/pydataprovider2.html#provider">the PyDataProvider documentation</a>.</p>
</div>
<div class="section" id="id3">
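<h3><a class="toc-backref" href="#id16">Neuron activation memory</a><a class="headerlink" href="#id3" title="Permalink to this headline"></a></h3>
<p>During training, a neural network temporarily stores data for every activation, including the activation values and the residuals (errors).
During back-propagation, this data is used to update the parameters. The memory it occupies depends mainly on two factors:
the batch size and the length of each sequence. In other words, it is proportional to the total number of
time steps contained in each mini-batch.</p>
<p>There are therefore two ways to reduce it:</p>
<ul class="simple">
<li>Reduce the batch size, i.e. change <code class="code docutils literal"><span class="pre">settings(batch_size=1000)</span></code> in the network configuration to a smaller value. Note that the batch size is itself a hyperparameter of the network, so reducing it may affect the training result.</li>
<li>Reduce the sequence length, or simply discard very long sequences. For example, if most sequences in a dataset have a length of 100-200
but one sequence suddenly has a length of 10000, it can easily exhaust memory, especially in RNNs such as LSTM.</li>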
</ul>
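<p>The second option can be applied when the data is read. The following is a minimal sketch, not part of PaddlePaddle: the helper names <code class="code docutils literal"><span class="pre">drop_long_sequences</span></code> and <code class="code docutils literal"><span class="pre">truncate_sequences</span></code> and the threshold of 500 are assumptions chosen for illustration, and each sample is assumed to be a plain Python list of time steps.</p>

```python
def drop_long_sequences(samples, max_len):
    """Keep only the samples whose sequence length is at most max_len."""
    return [seq for seq in samples if len(seq) <= max_len]


def truncate_sequences(samples, max_len):
    """Alternative: keep every sample but cut each one to max_len steps."""
    return [seq[:max_len] for seq in samples]


# Toy data: three ordinary sequences and one outlier of length 10000.
samples = [[1] * 100, [2] * 150, [3] * 10000, [4] * 200]
kept = drop_long_sequences(samples, max_len=500)
cut = truncate_sequences(samples, max_len=500)
print(len(kept), max(len(seq) for seq in cut))
```

<p>Dropping removes the outlier entirely, while truncating keeps a shortened version of it. Which one is appropriate depends on the task.</p>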
</div>
<div class="section" id="id4">
<h3><a class="toc-backref" href="#id17">Parameter memory</a><a class="headerlink" href="#id4" title="Permalink to this headline"></a></h3>
<p>PaddlePaddle supports many optimization algorithms (optimizers), and different optimizers need different amounts of memory.
For example, the <code class="code docutils literal"><span class="pre">adadelta</span></code> algorithm needs roughly 5 times the size of the parameters. If the saved parameter
file is <code class="code docutils literal"><span class="pre">100M</span></code>, this optimizer needs at least <code class="code docutils literal"><span class="pre">500M</span></code> of memory.</p>
<p>If memory is tight, consider an optimizer with a smaller memory footprint, such as <code class="code docutils literal"><span class="pre">momentum</span></code>.</p>
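<p>The arithmetic above can be written as a small estimate. This is only a sketch: the function name is hypothetical, and the multiplier of 5 for <code class="code docutils literal"><span class="pre">adadelta</span></code> is the approximate figure quoted in the text, not a measured value. Multipliers for other optimizers are not given here and would have to be supplied by the reader.</p>

```python
MB = 1024 * 1024

# "Roughly 5 times the size of the parameters", as stated in the text.
ADADELTA_MULTIPLIER = 5


def optimizer_memory_bytes(param_bytes, multiplier):
    """Estimate optimizer memory as a multiple of the parameter size."""
    return param_bytes * multiplier


estimate = optimizer_memory_bytes(100 * MB, ADADELTA_MULTIPLIER)
print(estimate // MB, "MB")
```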
</div>
</div>
<div class="section" id="id5">
<h2><a class="toc-backref" href="#id18">2. How to speed up PaddlePaddle training</a><a class="headerlink" href="#id5" title="Permalink to this headline"></a></h2>
<p>PaddlePaddle is a neural network training platform. Training can be sped up in the following ways:</p>
<ul class="simple">
<li>Reducing data loading time</li>
<li>Speeding up training</li>
<li>Using more computing resources</li>
</ul>
<div class="section" id="id6">
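<h3><a class="toc-backref" href="#id19">Reducing data loading time</a><a class="headerlink" href="#id6" title="Permalink to this headline"></a></h3>
<p>When using <code class="code docutils literal"><span class="pre">pydataprovider</span></code>, you can reduce the size of the buffer pool and enable in-memory caching, which can greatly speed up data loading.
Shrinking the <code class="code docutils literal"><span class="pre">DataProvider</span></code> buffer pool works on the same principle as shrinking it to reduce memory usage, described earlier.</p>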
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>

<span class="nd">@provider</span><span class="p">(</span><span class="n">min_pool_size</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">process</span><span class="p">(</span><span class="n">settings</span><span class="p">,</span> <span class="n">filename</span><span class="p">):</span>
    <span class="c1"># Shuffle the file on disk before reading it.</span>
    <span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s1">&#39;shuf </span><span class="si">%s</span><span class="s1"> &gt; </span><span class="si">%s</span><span class="s1">.shuf&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">filename</span><span class="p">))</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">&#39;</span><span class="si">%s</span><span class="s1">.shuf&#39;</span> <span class="o">%</span> <span class="n">filename</span><span class="p">,</span> <span class="s1">&#39;r&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">f</span><span class="p">:</span>
            <span class="c1"># get_sample_from_line is a user-defined parser (not shown).</span>
            <span class="k">yield</span> <span class="n">get_sample_from_line</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
</pre></div>
</div>
<p>In addition, the <code class="code docutils literal"><span class="pre">&#64;provider</span></code> interface has a <code class="code docutils literal"><span class="pre">cache</span></code> parameter that controls caching. If it is set to <code class="code docutils literal"><span class="pre">CacheType.CACHE_PASS_IN_MEM</span></code>, the data generated during the first <code class="code docutils literal"><span class="pre">pass</span></code> (one pass is one full traversal of the training data) is cached in memory. In later <code class="code docutils literal"><span class="pre">pass</span></code> iterations, data is no longer read from the <code class="code docutils literal"><span class="pre">python</span></code> side but directly from the in-memory cache. This also greatly reduces data loading time.</p>
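<p>The idea behind caching a pass in memory can be illustrated in plain Python. The sketch below is not PaddlePaddle's implementation: the class <code class="code docutils literal"><span class="pre">PassCache</span></code> and the reader <code class="code docutils literal"><span class="pre">slow_reader</span></code> are hypothetical names used only to show that the Python-side reader runs once and later passes are served from memory.</p>

```python
class PassCache(object):
    """Cache the samples of the first pass and replay them afterwards."""

    def __init__(self, generator_fn):
        self._generator_fn = generator_fn  # called only during the first pass
        self._cache = None

    def one_pass(self):
        if self._cache is not None:
            # Later passes: serve samples straight from memory.
            for sample in self._cache:
                yield sample
            return
        collected = []
        for sample in self._generator_fn():
            collected.append(sample)
            yield sample
        # Mark the cache as valid only once the first pass has completed.
        self._cache = collected


calls = []


def slow_reader():
    """Hypothetical stand-in for a Python-side data reader."""
    calls.append(1)
    for i in range(3):
        yield i * i


cache = PassCache(slow_reader)
first = list(cache.one_pass())
second = list(cache.one_pass())
print(first, second, len(calls))
```

<p>The trade-off is the one the text implies: the whole pass must fit in memory, in exchange for skipping the reader on every later pass.</p>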
</div>
<div class="section" id="id7">
<h3><a class="toc-backref" href="#id20">Speeding up training</a><a class="headerlink" href="#id7" title="Permalink to this headline"></a></h3>
<p>PaddlePaddle supports sparse training. Sparse training requires the training feature to be one of <code class="code docutils literal"><span class="pre">sparse_binary_vector</span></code>, <code class="code docutils literal"><span class="pre">sparse_vector</span></code>, or <code class="code docutils literal"><span class="pre">integer_value</span></code>. In addition, the layer that consumes this training data must set its parameter to sparse update mode, i.e. <code class="code docutils literal"><span class="pre">sparse_update=True</span></code>.</p>
<p>Here we use a simple <code class="code docutils literal"><span class="pre">word2vec</span></code> language model as an example:</p>
<p>Use the two words before and the two words after a word to predict the word in the middle. The DataProvider for this task is:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">DICT_DIM</span><span class="o">=</span><span class="mi">3000</span>
<span class="nd">@provider</span><span class="p">(</span><span class="n">input_types</span><span class="o">=</span><span class="p">[</span><span class="n">integer_sequence</span><span class="p">(</span><span class="n">DICT_DIM</span><span class="p">),</span> <span class="n">integer_value</span><span class="p">(</span><span class="n">DICT_DIM</span><span class="p">)])</span>
<span class="k">def</span> <span class="nf">process</span><span class="p">(</span><span class="n">settings</span><span class="p">,</span> <span class="n">filename</span><span class="p">):</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="c1"># Yield the context word ids and the id of the center word to predict,</span>
        <span class="c1"># such as [28, 29, 10, 4], 4.</span>
        <span class="c1"># This means the sentence is 28, 29, 4, 10, 4.</span>
        <span class="k">yield</span> <span class="n">read_next_from_file</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
</pre></div>
</div>
<p>The configuration for this task is:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">...</span> <span class="c1"># the settings and the data provider definition are omitted.</span>
<span class="n">DICT_DIM</span><span class="o">=</span><span class="mi">3000</span>  <span class="c1"># dictionary dimension.</span>
<span class="n">word_ids</span><span class="o">=</span><span class="n">data_layer</span><span class="p">(</span><span class="s1">&#39;word_ids&#39;</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">DICT_DIM</span><span class="p">)</span>

<span class="n">emb</span> <span class="o">=</span> <span class="n">embedding_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">word_ids</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span> <span class="n">param_attr</span><span class="o">=</span><span class="n">ParamAttr</span><span class="p">(</span><span class="n">sparse_update</span><span class="o">=</span><span class="kc">True</span><span class="p">))</span>
<span class="n">emb_sum</span> <span class="o">=</span> <span class="n">pooling_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">emb</span><span class="p">,</span> <span class="n">pooling_type</span><span class="o">=</span><span class="n">SumPooling</span><span class="p">())</span>
<span class="n">predict</span> <span class="o">=</span> <span class="n">fc_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">emb_sum</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">DICT_DIM</span><span class="p">,</span> <span class="n">act</span><span class="o">=</span><span class="n">SoftmaxActivation</span><span class="p">())</span>
<span class="n">outputs</span><span class="p">(</span><span class="n">classification_cost</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">predict</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="n">data_layer</span><span class="p">(</span><span class="s1">&#39;label&#39;</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">DICT_DIM</span><span class="p">)))</span> 
</pre></div>
</div>
<p>For more on sparse training, see the <a class="reference external" href="TBD">sparse training documentation</a>.</p>
</div>
<div class="section" id="id8">
<h3><a class="toc-backref" href="#id21">Using more computing resources</a><a class="headerlink" href="#id8" title="Permalink to this headline"></a></h3>
<p>More computing resources can be used in the following ways:</p>
<ul class="simple">
<li>Single-machine CPU training<ul>
<li>Use multi-threaded training. Set the command-line argument <code class="code docutils literal"><span class="pre">trainer_count</span></code> to the number of threads that take part in training, for example <code class="code docutils literal"><span class="pre">paddle</span> <span class="pre">train</span> <span class="pre">--trainer_count=4</span></code>.</li>
</ul>
</li>
<li>Single-machine GPU training<ul>
<li>Train on a GPU. Set the command-line argument <code class="code docutils literal"><span class="pre">use_gpu</span></code>, for example <code class="code docutils literal"><span class="pre">paddle</span> <span class="pre">train</span> <span class="pre">--use_gpu=true</span></code>.</li>
<li>Train on multiple GPUs. Set the command-line arguments <code class="code docutils literal"><span class="pre">use_gpu</span></code> and <code class="code docutils literal"><span class="pre">trainer_count</span></code>: <code class="code docutils literal"><span class="pre">--use_gpu=true</span></code> enables GPU training, and <code class="code docutils literal"><span class="pre">trainer_count</span></code> specifies the number of GPUs. For example, <code class="code docutils literal"><span class="pre">paddle</span> <span class="pre">train</span> <span class="pre">--use_gpu=true</span> <span class="pre">--trainer_count=4</span></code>.</li>
</ul>
</li>
<li>Multi-machine training<ul>
<li>Multi-machine training is also fairly simple. First start <code class="code docutils literal"><span class="pre">paddle</span> <span class="pre">pserver</span></code> on each node, then use <code class="code docutils literal"><span class="pre">paddle</span> <span class="pre">train</span> <span class="pre">--pservers=192.168.100.1,192.168.100.2</span></code> to specify the IP address of each pserver.</li>
<li>For the detailed procedure, see the <a class="reference external" href="TBD">multi-machine training</a> documentation.</li>
</ul>
</li>
</ul>
</div>
</div>
<div class="section" id="illegal-instruction">
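<h2><a class="toc-backref" href="#id22">3. Encountering an “illegal instruction” (“非法指令”) error</a><a class="headerlink" href="#illegal-instruction" title="Permalink to this headline"></a></h2>
<p>To improve performance, paddle uses AVX instructions in its computations. Some older CPU models do not support them. In general, you can run <code class="code docutils literal"><span class="pre">grep</span> <span class="pre">avx</span> <span class="pre">/proc/cpuinfo</span></code> and check whether there is any output to tell whether AVX is supported. (Note: on some virtual machines this check may report AVX support even though the program actually crashes at run time. Treat such machines as not supporting AVX and use the solutions below.)</p>
<p>The solutions are:</p>
<ul class="simple">
<li>Use the NO_AVX <a class="reference external" href="../build_and_install/index.html">installation package</a> or <a class="reference external" href="../build_and_install/install/docker_install.html">Docker image</a>.</li>
<li>Or recompile PaddlePaddle with <code class="code docutils literal"><span class="pre">-DWITH_AVX=OFF</span></code>.</li>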
</ul>
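<p>The <code class="code docutils literal"><span class="pre">/proc/cpuinfo</span></code> check described above can also be scripted. The sketch below assumes a Linux host; the function names are hypothetical. Like the <code class="code docutils literal"><span class="pre">grep</span></code> check, it only reads the advertised CPU flags, so the caveat about virtual machines still applies.</p>

```python
def has_avx(cpuinfo_text):
    """Return True if any 'flags' line in the given cpuinfo text lists avx."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx" in value.split():
            return True
    return False


def host_has_avx(path="/proc/cpuinfo"):
    """Check the current host; returns False if the file cannot be read."""
    try:
        with open(path) as f:
            return has_avx(f.read())
    except IOError:
        return False


# A shortened, made-up cpuinfo excerpt used only for demonstration.
sample = "processor : 0\nflags : fpu sse sse2 avx avx2\n"
print(has_avx(sample))
```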
</div>
<div class="section" id="sgd">
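<h2><a class="toc-backref" href="#id23">4. How to choose the learning rate for SGD</a><a class="headerlink" href="#sgd" title="Permalink to this headline"></a></h2>
<p>When training with sgd/async_sgd, an important question is how to choose the right learning_rate. If learning_rate is too large, training may not converge. If it is too small, convergence may be very slow and training takes too long.</p>
<p>The usual approach is to start with a relatively large learning_rate. If training does not converge, divide the learning rate by 10 and try again, until training converges. How can you tell that training is not converging? Estimate cost0, the lowest cost the model could reach if it always produced the same constant output.</p>
<p>If the cost during training is clearly higher than this constant-output cost, training can be judged as not converging. For example, suppose we have a three-class problem with multi-class-cross-entropy as the cost, and the proportions of classes 0, 1, and 2 in the data are <code class="code docutils literal"><span class="pre">0.2,</span> <span class="pre">0.5,</span> <span class="pre">0.3</span></code>. The lowest cost a constant output can reach is then <code class="code docutils literal"><span class="pre">-(0.2*log(0.2)+0.5*log(0.5)+0.3*log(0.3))=1.03</span></code>. If the cost is still larger than this number after one pass (or earlier), training can be considered not converging, and the learning rate should be lowered.</p>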
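<p>The procedure above can be sketched in Python. This is an illustration under stated assumptions, not a PaddlePaddle API: <code class="code docutils literal"><span class="pre">train_one_pass</span></code> is a hypothetical callable that takes a learning rate and returns the cost after one pass, and <code class="code docutils literal"><span class="pre">fake_train</span></code> is a toy stand-in for real training. The logarithm is the natural logarithm, which reproduces the value 1.03 from the text.</p>

```python
import math


def constant_output_cost(class_ratios):
    """Cross-entropy of always predicting the class proportions."""
    return -sum(p * math.log(p) for p in class_ratios if p > 0)


def pick_learning_rate(train_one_pass, baseline_cost, start_lr, max_tries=5):
    """Divide the learning rate by 10 until the cost drops below the baseline.

    Returns None if none of the tried learning rates converges.
    """
    lr = start_lr
    for _ in range(max_tries):
        if train_one_pass(lr) < baseline_cost:
            return lr
        lr /= 10.0
    return None


baseline = constant_output_cost([0.2, 0.5, 0.3])
print(round(baseline, 2))


def fake_train(lr):
    """Toy stand-in for real training: pretend rates above 0.05 diverge."""
    return 5.0 if lr > 0.05 else 0.4


chosen = pick_learning_rate(fake_train, baseline, start_lr=1.0)
print(chosen)
```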
</div>
<div class="section" id="id11">
<h2><a class="toc-backref" href="#id24">5. How to initialize parameters</a><a class="headerlink" href="#id11" title="Permalink to this headline"></a></h2>
<p>By default, PaddlePaddle initializes parameters with mean 0 and standard deviation <span class="math">\(\frac{1}{\sqrt{d}}\)</span>, where <span class="math">\(d\)</span> is the width of the parameter matrix. This initialization generally does not produce poor results. If you want to customize it, PaddlePaddle currently provides two ways to initialize parameters:</p>
<ul class="simple">
<li>Gaussian distribution. Set <code class="code docutils literal"><span class="pre">param_attr</span></code> to <code class="code docutils literal"><span class="pre">param_attr=ParamAttr(initial_mean=0.0,</span> <span class="pre">initial_std=1.0)</span></code></li>
<li>Uniform distribution. Set <code class="code docutils literal"><span class="pre">param_attr</span></code> to <code class="code docutils literal"><span class="pre">param_attr=ParamAttr(initial_max=1.0,</span> <span class="pre">initial_min=-1.0)</span></code></li>
</ul>
<p>For example, the following code sets both the weight initialization and the bias initialization of a fully connected layer.</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">hidden</span> <span class="o">=</span> <span class="n">fc_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">ipt</span><span class="p">,</span> <span class="n">param_attr</span><span class="o">=</span><span class="n">ParamAttr</span><span class="p">(</span><span class="n">initial_max</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">initial_min</span><span class="o">=-</span><span class="mf">1.0</span><span class="p">),</span>
                  <span class="n">bias_attr</span><span class="o">=</span><span class="n">ParamAttr</span><span class="p">(</span><span class="n">initial_mean</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">initial_std</span><span class="o">=</span><span class="mf">0.0</span><span class="p">))</span>
</pre></div>
</div>
<p>The code above initializes every bias to 1.0 and initializes the weights from a uniform distribution over <code class="code docutils literal"><span class="pre">[-1.0,</span> <span class="pre">1.0]</span></code>.</p>
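<p>To make the two distributions concrete, the following sketch samples them with Python's standard <code class="code docutils literal"><span class="pre">random</span></code> module. It is not how PaddlePaddle initializes parameters internally: the function names are hypothetical, and taking the number of columns as the matrix width <span class="math">\(d\)</span> is an assumption.</p>

```python
import math
import random


def gaussian_init(rows, cols, mean=0.0, std=None, rng=random):
    """Sample a rows x cols matrix from a Gaussian distribution.

    When std is None, use 1/sqrt(cols), treating cols as the matrix width.
    """
    if std is None:
        std = 1.0 / math.sqrt(cols)
    return [[rng.gauss(mean, std) for _ in range(cols)] for _ in range(rows)]


def uniform_init(rows, cols, low=-1.0, high=1.0, rng=random):
    """Sample a rows x cols matrix uniformly from [low, high]."""
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = uniform_init(4, 3, low=-1.0, high=1.0, rng=rng)
bias = gaussian_init(1, 3, mean=1.0, std=0.0, rng=rng)
print(bias)
```

<p>With a standard deviation of 0.0 every sample equals the mean, which is why a Gaussian with mean 1.0 and standard deviation 0.0 sets all biases to exactly 1.0.</p>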
</div>
<div class="section" id="id12">
<h2><a class="toc-backref" href="#id25">6. How to share parameters</a><a class="headerlink" href="#id12" title="Permalink to this headline"></a></h2>
<p>PaddlePaddle uses a parameter's <code class="code docutils literal"><span class="pre">name</span></code> as its ID, and parameters with the same name are shared. To set a parameter's name, use <code class="code docutils literal"><span class="pre">ParamAttr(name=&quot;YOUR_PARAM_NAME&quot;)</span></code>. A more convenient way is to pass the same <code class="code docutils literal"><span class="pre">ParamAttr</span></code> object to every layer that should share the parameter.</p>
<p>A configuration example of parameter sharing in a simple fully connected network:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">paddle.trainer_config_helpers</span> <span class="k">import</span> <span class="o">*</span>

<span class="n">settings</span><span class="p">(</span>
    <span class="n">learning_rate</span><span class="o">=</span><span class="mi">1</span><span class="n">e</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span>
    <span class="n">batch_size</span><span class="o">=</span><span class="mi">1000</span>
<span class="p">)</span>

<span class="n">a</span> <span class="o">=</span> <span class="n">data_layer</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;feature_a&#39;</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">200</span><span class="p">)</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">data_layer</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;feature_b&#39;</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">200</span><span class="p">)</span>

<span class="n">fc_param</span> <span class="o">=</span> <span class="n">ParamAttr</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;fc_param&#39;</span><span class="p">,</span> <span class="n">initial_max</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">initial_min</span><span class="o">=-</span><span class="mf">1.0</span><span class="p">)</span>
<span class="n">bias_param</span> <span class="o">=</span> <span class="n">ParamAttr</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;bias_param&#39;</span><span class="p">,</span> <span class="n">initial_mean</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">initial_std</span><span class="o">=</span><span class="mf">0.0</span><span class="p">)</span>

<span class="n">softmax_param</span> <span class="o">=</span> <span class="n">ParamAttr</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;softmax_param&#39;</span><span class="p">,</span> <span class="n">initial_max</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">initial_min</span><span class="o">=-</span><span class="mf">1.0</span><span class="p">)</span>

<span class="n">hidden_a</span> <span class="o">=</span> <span class="n">fc_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">a</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">param_attr</span><span class="o">=</span><span class="n">fc_param</span><span class="p">,</span> <span class="n">bias_attr</span><span class="o">=</span><span class="n">bias_param</span><span class="p">)</span>
<span class="n">hidden_b</span> <span class="o">=</span> <span class="n">fc_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">b</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">param_attr</span><span class="o">=</span><span class="n">fc_param</span><span class="p">,</span> <span class="n">bias_attr</span><span class="o">=</span><span class="n">bias_param</span><span class="p">)</span>

<span class="n">predict</span> <span class="o">=</span> <span class="n">fc_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">hidden_a</span><span class="p">,</span> <span class="n">hidden_b</span><span class="p">],</span> <span class="n">param_attr</span><span class="o">=</span><span class="p">[</span><span class="n">softmax_param</span><span class="p">,</span> <span class="n">softmax_param</span><span class="p">],</span>
                   <span class="n">bias_attr</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">act</span><span class="o">=</span><span class="n">SoftmaxActivation</span><span class="p">())</span>

<span class="n">outputs</span><span class="p">(</span><span class="n">classification_cost</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">predict</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="n">data_layer</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;label&#39;</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">10</span><span class="p">)))</span>
</pre></div>
</div>
<p>Here <code class="code docutils literal"><span class="pre">hidden_a</span></code> and <code class="code docutils literal"><span class="pre">hidden_b</span></code> use the same parameter and bias. The two inputs of the softmax layer also use the same parameter, <code class="code docutils literal"><span class="pre">softmax_param</span></code>.</p>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
  <h3><a href="../index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">PaddlePaddle FAQ</a><ul>
<li><a class="reference internal" href="#id1">1. How to reduce PaddlePaddle's memory usage</a><ul>
<li><a class="reference internal" href="#dataprovider">Reducing DataProvider buffer pool memory</a></li>
<li><a class="reference internal" href="#id3">Neuron activation memory</a></li>
<li><a class="reference internal" href="#id4">Parameter memory</a></li>
</ul>
</li>
<li><a class="reference internal" href="#id5">2. How to speed up PaddlePaddle training</a><ul>
<li><a class="reference internal" href="#id6">Reducing data loading time</a></li>
<li><a class="reference internal" href="#id7">Speeding up training</a></li>
<li><a class="reference internal" href="#id8">Using more computing resources</a></li>
</ul>
</li>
<li><a class="reference internal" href="#illegal-instruction">3. Encountering an “illegal instruction” (“非法指令”) error</a></li>
<li><a class="reference internal" href="#sgd">4. How to choose the learning rate for SGD</a></li>
<li><a class="reference internal" href="#id11">5. How to initialize parameters</a></li>
<li><a class="reference internal" href="#id12">6. How to share parameters</a></li>
</ul>
</li>
</ul>

  <div role="note" aria-label="source link">
    <h3>This Page</h3>
    <ul class="this-page-menu">
      <li><a href="../_sources/faq/index.txt"
            rel="nofollow">Show Source</a></li>
    </ul>
   </div>
<div id="searchbox" style="display: none" role="search">
  <h3>Quick search</h3>
    <form class="search" action="../search.html" method="get">
      <div><input type="text" name="q" /></div>
      <div><input type="submit" value="Go" /></div>
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             >index</a></li>
        <li class="nav-item nav-item-0"><a href="../index.html">PaddlePaddle  documentation</a> &#187;</li> 
      </ul>
    </div>
    <div class="footer" role="contentinfo">
        &#169; Copyright 2016, PaddlePaddle developers.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.4.8.
    </div>
  </body>
</html>