Commit 5123156e authored by Travis CI

Deploy to GitHub Pages: 5f90a31f

Parent a45a5160
......@@ -300,3 +300,7 @@ conv2d_transpose
.. autofunction:: paddle.v2.fluid.layers.conv2d_transpose
:noindex:
sequence_expand
---------------
.. autofunction:: paddle.v2.fluid.layers.sequence_expand
:noindex:
......@@ -30,10 +30,10 @@
In some existing scenarios (e.g. RNN), multiple calls to cblas_?gemm reuse the same source data, so re-packing that data on every call is redundant.
To minimize the packing overhead across repeated cblas_?gemm calls, Intel® MKL provides the following four APIs:
* cblas_?gemm_alloc
* cblas_?gemm_pack
* cblas_?gemm_compute
* cblas_?gemm_free
* [cblas_?gemm_alloc](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc)
* [cblas_?gemm_pack](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack)
* [cblas_?gemm_compute](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute)
* [cblas_?gemm_free](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free)
With these APIs, we can pack the source data once and then pass the already-packed data to the gemm_compute calls that reuse it, eliminating the redundant packing.
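The pack-once / compute-many calling pattern these APIs enable can be sketched in plain Python (a hypothetical analogy, not the MKL C API): the shared weight matrix is converted to an internal layout once, then reused at every timestep.

```python
def pack(weight):
    # analogue of cblas_?gemm_pack: precompute an internal layout once
    # (here simply the transpose, so each output column is contiguous)
    return [list(col) for col in zip(*weight)]

def compute(packed_w, x):
    # analogue of cblas_?gemm_compute: multiply using the packed weights
    return [sum(a * b for a, b in zip(x, col)) for col in packed_w]

w = [[1, 2], [3, 4]]            # 2x2 weight matrix, shared across steps
packed = pack(w)                # packed once (alloc + pack)
steps = [[1, 0], [0, 1], [1, 1]]
outs = [compute(packed, x) for x in steps]   # reused at every timestep
# outs == [[1, 2], [3, 4], [4, 6]]
```

With the real MKL API, `pack` corresponds to `cblas_?gemm_alloc` plus `cblas_?gemm_pack`, and the buffer is released with `cblas_?gemm_free` when the layer is destroyed.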
......@@ -84,7 +84,20 @@ PaddlePaddle/Paddle
2. Compare the results of each optimized layer against the corresponding original PaddlePaddle layer in batch mode.
### Python API
TBD
We plan to add a `use_mkl_packed` flag in `paddle/utils.Flags` to toggle this feature; when compiled with `WITH_MKL=ON`, it defaults to `true`.
We will also add the `use_mkl_packed` option to the corresponding layers in `python/paddle/trainer/config_parser.py`, so users can enable the feature from the Python side.
A possible implementation:
```python
use_mkl_packed = bool(int(g_command_config_args.get("use_mkl_packed", 0)))
if use_mkl_packed:
    self.layer_type = "mkl_packed_" + self.layer_type
```
All related `layer_type` values will start with the *mkl_packed_* prefix; this is enforced when the `MKLPacked*Layer` classes register their layers, so they can be told apart.
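The flag parsing above follows a common config_parser idiom: command-line values arrive as strings, so `bool(int(...))` turns `"0"`/`"1"` into a proper boolean, with a missing key defaulting to disabled. A minimal standalone sketch (the dict stands in for `g_command_config_args`):

```python
g_command_config_args = {"use_mkl_packed": "1"}  # stand-in for parsed CLI args

# strings "0"/"1" -> int -> bool; a missing key defaults to disabled
use_mkl_packed = bool(int(g_command_config_args.get("use_mkl_packed", 0)))

layer_type = "recurrent"
if use_mkl_packed:
    layer_type = "mkl_packed_" + layer_type
# layer_type == "mkl_packed_recurrent"
```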
### Benchmarking
We will add scripts to benchmark and compare network performance before and after enabling the MKL Packed recurrent layers.
......
......@@ -1065,6 +1065,79 @@ stride_H = stride_W = stride.</li>
</table>
</dd></dl>
</div>
<div class="section" id="sequence-expand">
<h2>sequence_expand<a class="headerlink" href="#sequence-expand" title="Permalink to this headline"></a></h2>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">sequence_expand</code><span class="sig-paren">(</span><em>x</em>, <em>y</em>, <em>main_program=None</em>, <em>startup_program=None</em><span class="sig-paren">)</span></dt>
<dd><p>Sequence Expand Layer. This layer expands the input variable <strong>x</strong>
according to the LoD information of <strong>y</strong>. The following examples
explain how sequence_expand works:</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>* Case 1
    x is a LoDTensor:
        x.lod  = [[0, 2, 3],
                  [0, 1, 3, 4]]
        x.data = [a, b, c, d]
        x.dims = [4, 1]
    y is a LoDTensor:
        y.lod = [[0, 2, 4],
                 [0, 3, 6, 7, 8]]
    with condition len(y.lod[-1]) - 1 == x.dims[0]
    then output is a 2-level LoDTensor:
        out.lod  = [[0, 2, 4],
                    [0, 3, 6, 7, 8]]
        out.data = [a, a, a, b, b, b, c, d]
        out.dims = [8, 1]

* Case 2
    x is a Tensor:
        x.data = [a, b, c]
        x.dims = [3, 1]
    y is a LoDTensor:
        y.lod = [[0, 2, 3, 6]]
    with condition len(y.lod[-1]) - 1 == x.dims[0]
    then output is a 1-level LoDTensor:
        out.lod  = [[0, 2, 3, 6]]
        out.data = [a, a, b, c, c, c]
        out.dims = [6, 1]
</pre></div>
</div>
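The rule in both cases is the same: row i of x is repeated (lod[i+1] - lod[i]) times, where lod is the last-level LoD of y. A minimal pure-Python sketch of that rule (not the fluid operator itself, which additionally produces the output LoD shown above):

```python
def sequence_expand(x_data, y_lod_last):
    """Repeat row i of x_data (y_lod_last[i+1] - y_lod_last[i]) times."""
    assert len(y_lod_last) - 1 == len(x_data)  # the condition stated above
    out = []
    for i, row in enumerate(x_data):
        out.extend([row] * (y_lod_last[i + 1] - y_lod_last[i]))
    return out

# Case 2 from above: x.data = [a, b, c], y.lod = [[0, 2, 3, 6]]
print(sequence_expand(["a", "b", "c"], [0, 2, 3, 6]))
# -> ['a', 'a', 'b', 'c', 'c', 'c']
```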
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>x</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
<li><strong>y</strong> (<em>Variable</em>) &#8211; The input variable which is a LoDTensor.</li>
<li><strong>main_program</strong> (<em>Program</em>) &#8211; The main program.</li>
<li><strong>startup_program</strong> (<em>Program</em>) &#8211; The startup program.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The expanded variable which is a LoDTensor.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">Variable</p>
</td>
</tr>
</tbody>
</table>
<p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">x</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">10</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">],</span>
<span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">,</span> <span class="n">lod_level</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">out</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">sequence_expand</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">y</span><span class="p">)</span>
</pre></div>
</div>
</dd></dl>
</div>
</div>
......
......@@ -238,12 +238,14 @@
<li>Packing redundancy: in some existing scenarios (e.g. RNN), multiple calls to cblas_?gemm reuse the same source data, so re-packing that data on every call is redundant.</li>
</ol>
<p>To minimize the packing overhead across repeated cblas_?gemm calls, Intel® MKL provides the following four APIs:</p>
<ul class="simple">
<li>cblas_?gemm_alloc</li>
<li>cblas_?gemm_pack</li>
<li>cblas_?gemm_compute</li>
<li>cblas_?gemm_free</li>
</ul>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc">cblas_?gemm_alloc</a></li>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack">cblas_?gemm_pack</a></li>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute">cblas_?gemm_compute</a></li>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free">cblas_?gemm_free</a></li>
</ul>
</div>
<p>With these APIs, we can pack the source data once and then pass the already-packed data to the gemm_compute calls that reuse it, eliminating the redundant packing.</p>
</div>
<div class="section" id="solution">
......@@ -303,7 +305,15 @@
</div>
<div class="section" id="python-api">
<span id="python-api"></span><h3>Python API<a class="headerlink" href="#python-api" title="Permalink to this headline"></a></h3>
<p>TBD</p>
<p>We plan to add a <code class="docutils literal"><span class="pre">use_mkl_packed</span></code> flag in <code class="docutils literal"><span class="pre">paddle/utils.Flags</span></code> to toggle this feature; when compiled with <code class="docutils literal"><span class="pre">WITH_MKL=ON</span></code>, it defaults to <code class="docutils literal"><span class="pre">true</span></code>.</p>
<p>We will also add the <code class="docutils literal"><span class="pre">use_mkl_packed</span></code> option to the corresponding layers in <code class="docutils literal"><span class="pre">python/paddle/trainer/config_parser.py</span></code>, so users can enable the feature from the Python side.</p>
<p>A possible implementation:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">use_mkl_packed</span> <span class="o">=</span> <span class="nb">bool</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">g_command_config_args</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;use_mkl_packed&quot;</span><span class="p">,</span> <span class="mi">0</span><span class="p">)))</span>
<span class="k">if</span> <span class="n">use_mkl_packed</span><span class="p">:</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">layer_type</span> <span class="o">=</span> <span class="s2">&quot;mkl_packed_&quot;</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">layer_type</span>
</pre></div>
</div>
<p>All related <code class="docutils literal"><span class="pre">layer_type</span></code> values will start with the <em>mkl_packed_</em> prefix; this is enforced when the <code class="docutils literal"><span class="pre">MKLPacked*Layer</span></code> classes register their layers, so they can be told apart.</p>
</div>
<div class="section" id="benchmarking">
<span id="benchmarking"></span><h3>Benchmarking<a class="headerlink" href="#benchmarking" title="Permalink to this headline"></a></h3>
......