Commit 5123156e
Authored December 19, 2017 by Travis CI
Deploy to GitHub Pages: 5f90a31f
Parent: a45a5160
Showing 10 changed files with 224 additions and 24 deletions (+224, -24)
develop/doc/_sources/api/v2/fluid/layers.rst.txt (+4, -0)
develop/doc/_sources/design/mkl/mkl_packed.md.txt (+18, -5)
develop/doc/api/v2/fluid/layers.html (+73, -0)
develop/doc/design/mkl/mkl_packed.html (+16, -6)
develop/doc/searchindex.js (+1, -1)
develop/doc_cn/_sources/api/v2/fluid/layers.rst.txt (+4, -0)
develop/doc_cn/_sources/design/mkl/mkl_packed.md.txt (+18, -5)
develop/doc_cn/api/v2/fluid/layers.html (+73, -0)
develop/doc_cn/design/mkl/mkl_packed.html (+16, -6)
develop/doc_cn/searchindex.js (+1, -1)
develop/doc/_sources/api/v2/fluid/layers.rst.txt
...
...
@@ -300,3 +300,7 @@ conv2d_transpose
.. autofunction:: paddle.v2.fluid.layers.conv2d_transpose
    :noindex:
sequence_expand
---------
.. autofunction:: paddle.v2.fluid.layers.sequence_expand
    :noindex:
develop/doc/_sources/design/mkl/mkl_packed.md.txt
...
...
@@ -30,10 +30,10 @@
In some existing cases (for example, RNNs), multiple calls to cblas_?gemm use the same original data, so repacking that data on every call is redundant.
To minimize the time that repeated cblas_?gemm calls spend on packing, Intel® MKL introduced the following four APIs:
* cblas_?gemm_alloc
* cblas_?gemm_pack
* cblas_?gemm_compute
* cblas_?gemm_free
* [cblas_?gemm_alloc](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc)
* [cblas_?gemm_pack](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack)
* [cblas_?gemm_compute](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute)
* [cblas_?gemm_free](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free)
With these APIs, we can pack the original data once and then pass the packed data to every gemm_compute call that reuses it, which avoids the redundant packing.
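The pack-once, compute-many idea can be sketched in plain Python. This is only an illustration under stated assumptions: `pack`, `multiply_packed`, `run_naive` and `run_packed_once` are hypothetical names, and the "packing" here is a simple transpose standing in for MKL's internal packed format. It does not call or model the real cblas_?gemm_* signatures.

```python
def pack(weight):
    """Stand-in for a packing step: store the weight matrix column by column."""
    return tuple(zip(*weight))


def multiply_packed(row_vector, packed_weight):
    """Compute row_vector x weight using the already packed columns."""
    return [sum(v * w for v, w in zip(row_vector, column))
            for column in packed_weight]


def run_naive(inputs, weight):
    """Repack the same weight for every input, as repeated gemm calls would."""
    outputs, pack_count = [], 0
    for row_vector in inputs:
        packed = pack(weight)
        pack_count += 1
        outputs.append(multiply_packed(row_vector, packed))
    return outputs, pack_count


def run_packed_once(inputs, weight):
    """Pack the weight once and reuse it for every input."""
    packed = pack(weight)
    outputs = [multiply_packed(row_vector, packed) for row_vector in inputs]
    return outputs, 1


weight = [[1, 2], [3, 4]]
inputs = [[1, 0], [0, 1], [1, 1]]  # e.g. one row vector per RNN time step

naive_out, naive_packs = run_naive(inputs, weight)
packed_out, packed_packs = run_packed_once(inputs, weight)
print(naive_out == packed_out, naive_packs, packed_packs)
```

Both paths produce the same outputs; the only difference is how many times the weight is packed, which is the cost the packed API is meant to remove.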
...
...
@@ -84,7 +84,20 @@ PaddlePaddle/Paddle
2. Compare the results of the optimized layer with those of the corresponding original PaddlePaddle layer in batch mode.
### Python API
TBD
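We plan to add a `use_mkl_packed` flag to `paddle/utils.Flags` to select whether this feature is used; when compiled with `WITH_MKL=ON`, it defaults to `true`.
In addition, a `use_mkl_packed` option will be added to the corresponding layers in `python/paddle/trainer/config_parser.py`, so that users can enable or disable the feature from Python.
One possible implementation:
```python
use_mkl_packed = bool(int(g_command_config_args.get("use_mkl_packed", 0)))
if use_mkl_packed:
    # `mkl_packed_*` is a placeholder for the concrete layer type name
    self.layer_type = mkl_packed_*
```
Every related `layer_type` starts with *mkl_packed_*; this is ensured when each `MKLPacked*Layer` registers its layer, so that these layers can be told apart.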
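As a runnable illustration of the flag handling described above, here is a minimal sketch. It assumes a plain dict in place of `g_command_config_args`, and `resolve_layer_type` and the layer name `"recurrent"` are hypothetical; the real `config_parser.py` integration may look different.

```python
def resolve_layer_type(base_type, command_config_args):
    """Return the layer type to use, adding the mkl_packed_ prefix when enabled.

    command_config_args stands in for g_command_config_args; values arrive
    as strings such as "0" or "1", hence the int() conversion.
    """
    use_mkl_packed = bool(int(command_config_args.get("use_mkl_packed", 0)))
    if use_mkl_packed:
        return "mkl_packed_" + base_type
    return base_type


enabled = resolve_layer_type("recurrent", {"use_mkl_packed": "1"})
disabled = resolve_layer_type("recurrent", {"use_mkl_packed": "0"})
default = resolve_layer_type("recurrent", {})
print(enabled, disabled, default)
```

Keeping the prefix logic in one helper means every packed layer type is derived the same way from its original name.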
### Benchmarking
Scripts will be added to test and compare network performance before and after using the MKL Packed recurrent layers.
...
...
develop/doc/api/v2/fluid/layers.html
...
...
@@ -1065,6 +1065,79 @@ stride_H = stride_W = stride.</li>
</table>
</dd></dl>
</div>
<div class="section" id="sequence-expand">
<h2>sequence_expand<a class="headerlink" href="#sequence-expand" title="Permalink to this headline">¶</a></h2>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">sequence_expand</code><span class="sig-paren">(</span><em>x</em>, <em>y</em>, <em>main_program=None</em>, <em>startup_program=None</em><span class="sig-paren">)</span></dt>
<dd><p>Sequence Expand Layer. This layer expands the input variable <strong>x</strong> according to the LoD information of <strong>y</strong>. The following examples show how sequence_expand works:</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>* Case 1
    x is a LoDTensor:
        x.lod = [[0, 2, 3],
                 [0, 1, 3, 4]]
        x.data = [a, b, c, d]
        x.dims = [4, 1]
    y is a LoDTensor:
        y.lod = [[0, 2, 4],
                 [0, 3, 6, 7, 8]]
    with condition len(y.lod[-1]) - 1 == x.dims[0]
    then output is a 2-level LoDTensor:
        out.lod = [[0, 2, 4],
                   [0, 3, 6, 7, 8]]
        out.data = [a, a, a, b, b, b, c, d]
        out.dims = [8, 1]
* Case 2
    x is a Tensor:
        x.data = [a, b, c]
        x.dims = [3, 1]
    y is a LoDTensor:
        y.lod = [[0, 2, 3, 6]]
    with condition len(y.lod[-1]) - 1 == x.dims[0]
    then output is a 1-level LoDTensor:
        out.lod = [[0, 2, 3, 6]]
        out.data = [a, a, b, c, c, c]
        out.dims = [6, 1]
</pre></div>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>x</strong> (<em>Variable</em>) – The input variable which is a Tensor or LoDTensor.</li>
<li><strong>y</strong> (<em>Variable</em>) – The input variable which is a LoDTensor.</li>
<li><strong>main_program</strong> (<em>Program</em>) – The main program.</li>
<li><strong>startup_program</strong> (<em>Program</em>) – The startup program.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The expanded variable which is a LoDTensor.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">Variable</p>
</td>
</tr>
</tbody>
</table>
<p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">x</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'x'</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">10</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'float32'</span><span class="p">)</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'y'</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'float32'</span><span class="p">,</span> <span class="n">lod_level</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">out</span> <span class="o">=</span> <span class="n">layers</span><span class="o">.</span><span class="n">sequence_expand</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">y</span><span class="p">)</span>
</pre></div>
</div>
</dd></dl>
</div>
</div>
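The two cases in the sequence_expand description can be reproduced with a small pure-Python sketch. `expand_rows` is a hypothetical helper written only for illustration; it is not part of PaddlePaddle and works on plain lists instead of LoDTensors. It assumes, as the documented condition states, that `len(y.lod[-1]) - 1` equals the number of rows in `x`.

```python
def expand_rows(x_rows, y_last_level_lod):
    """Repeat each row of x by the length of the matching sequence in y.

    x_rows: the rows of x.data, e.g. ["a", "b", "c"].
    y_last_level_lod: offsets of y's last LoD level, e.g. [0, 2, 3, 6].
    """
    if len(y_last_level_lod) - 1 != len(x_rows):
        raise ValueError("len(y.lod[-1]) - 1 must equal x.dims[0]")
    out = []
    # Consecutive offsets delimit one sequence; their difference is its length.
    for row, start, end in zip(x_rows, y_last_level_lod, y_last_level_lod[1:]):
        out.extend([row] * (end - start))
    return out


case1 = expand_rows(["a", "b", "c", "d"], [0, 3, 6, 7, 8])
case2 = expand_rows(["a", "b", "c"], [0, 2, 3, 6])
print(case1)  # ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'd']
print(case2)  # ['a', 'a', 'b', 'c', 'c', 'c']
```

In both cases the output takes its LoD from `y`, so only the data needs to be computed here.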
...
...
develop/doc/design/mkl/mkl_packed.html
...
...
@@ -238,12 +238,14 @@
<li>Redundant conversion: in some existing cases (for example, RNNs), multiple calls to cblas_?gemm use the same original data, so repacking that data on every call is redundant.</li>
</ol>
<p>To minimize the time that repeated cblas_?gemm calls spend on packing, Intel® MKL introduced the following four APIs:</p>
<ul class="simple">
<li>cblas_?gemm_alloc</li>
<li>cblas_?gemm_pack</li>
<li>cblas_?gemm_compute</li>
<li>cblas_?gemm_free</li>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc">cblas</a></li>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack">cblas</a></li>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute">cblas</a></li>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free">cblas</a></li>
</ul>
</div>
<p>With these APIs, we can pack the original data once and then pass the packed data to every gemm_compute call that reuses it, which avoids the redundant packing.</p>
</div>
<div class="section" id="solution">
...
...
@@ -303,7 +305,15 @@
</div>
<div class="section" id="python-api">
<span id="python-api"></span><h3>Python API<a class="headerlink" href="#python-api" title="Permalink to this headline">¶</a></h3>
<p>TBD</p>
<p>We plan to add a <code class="docutils literal"><span class="pre">use_mkl_packed</span></code> flag to <code class="docutils literal"><span class="pre">paddle/utils.Flags</span></code> to select whether this feature is used; when compiled with <code class="docutils literal"><span class="pre">WITH_MKL=ON</span></code>, it defaults to <code class="docutils literal"><span class="pre">true</span></code>.</p>
<p>In addition, a <code class="docutils literal"><span class="pre">use_mkl_packed</span></code> option will be added to the corresponding layers in <code class="docutils literal"><span class="pre">python/paddle/trainer/config_parser.py</span></code>, so that users can enable or disable the feature from Python.</p>
<p>One possible implementation:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">use_mkl_packed</span> <span class="o">=</span> <span class="nb">bool</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">g_command_config_args</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"use_mkl_packed"</span><span class="p">,</span> <span class="mi">0</span><span class="p">)))</span>
<span class="k">if</span> <span class="n">use_mkl_packed</span><span class="p">:</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">layer_type</span> <span class="o">=</span> <span class="n">mkl_packed_</span><span class="o">*</span>
</pre></div>
</div>
<p>Every related <code class="docutils literal"><span class="pre">layer_type</span></code> starts with *mkl_packed_*; this is ensured when each <code class="docutils literal"><span class="pre">MKLPacked*Layer</span></code> registers its layer, so that these layers can be told apart.</p>
</div>
<div class="section" id="benchmarking">
<span id="benchmarking"></span><h3>Benchmarking<a class="headerlink" href="#benchmarking" title="Permalink to this headline">¶</a></h3>
...
...
develop/doc/searchindex.js
The source diff is too large to display. You can view the blob instead.
develop/doc_cn/_sources/api/v2/fluid/layers.rst.txt
...
...
@@ -300,3 +300,7 @@ conv2d_transpose
.. autofunction:: paddle.v2.fluid.layers.conv2d_transpose
    :noindex:
sequence_expand
---------
.. autofunction:: paddle.v2.fluid.layers.sequence_expand
    :noindex:
develop/doc_cn/_sources/design/mkl/mkl_packed.md.txt
...
...
@@ -30,10 +30,10 @@
In some existing cases (for example, RNNs), multiple calls to cblas_?gemm use the same original data, so repacking that data on every call is redundant.
To minimize the time that repeated cblas_?gemm calls spend on packing, Intel® MKL introduced the following four APIs:
* cblas_?gemm_alloc
* cblas_?gemm_pack
* cblas_?gemm_compute
* cblas_?gemm_free
* [cblas_?gemm_alloc](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc)
* [cblas_?gemm_pack](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack)
* [cblas_?gemm_compute](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute)
* [cblas_?gemm_free](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free)
With these APIs, we can pack the original data once and then pass the packed data to every gemm_compute call that reuses it, which avoids the redundant packing.
...
...
@@ -84,7 +84,20 @@ PaddlePaddle/Paddle
2. Compare the results of the optimized layer with those of the corresponding original PaddlePaddle layer in batch mode.
### Python API
TBD
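We plan to add a `use_mkl_packed` flag to `paddle/utils.Flags` to select whether this feature is used; when compiled with `WITH_MKL=ON`, it defaults to `true`.
In addition, a `use_mkl_packed` option will be added to the corresponding layers in `python/paddle/trainer/config_parser.py`, so that users can enable or disable the feature from Python.
One possible implementation:
```python
use_mkl_packed = bool(int(g_command_config_args.get("use_mkl_packed", 0)))
if use_mkl_packed:
    # `mkl_packed_*` is a placeholder for the concrete layer type name
    self.layer_type = mkl_packed_*
```
Every related `layer_type` starts with *mkl_packed_*; this is ensured when each `MKLPacked*Layer` registers its layer, so that these layers can be told apart.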
### Benchmarking
Scripts will be added to test and compare network performance before and after using the MKL Packed recurrent layers.
...
...
develop/doc_cn/api/v2/fluid/layers.html
...
...
@@ -1084,6 +1084,79 @@ stride_H = stride_W = stride.</li>
</table>
</dd></dl>
</div>
<div class="section" id="sequence-expand">
<h2>sequence_expand<a class="headerlink" href="#sequence-expand" title="永久链接至标题">¶</a></h2>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">sequence_expand</code><span class="sig-paren">(</span><em>x</em>, <em>y</em>, <em>main_program=None</em>, <em>startup_program=None</em><span class="sig-paren">)</span></dt>
<dd><p>Sequence Expand Layer. This layer expands the input variable <strong>x</strong> according to the LoD information of <strong>y</strong>. The following examples show how sequence_expand works:</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>* Case 1
    x is a LoDTensor:
        x.lod = [[0, 2, 3],
                 [0, 1, 3, 4]]
        x.data = [a, b, c, d]
        x.dims = [4, 1]
    y is a LoDTensor:
        y.lod = [[0, 2, 4],
                 [0, 3, 6, 7, 8]]
    with condition len(y.lod[-1]) - 1 == x.dims[0]
    then output is a 2-level LoDTensor:
        out.lod = [[0, 2, 4],
                   [0, 3, 6, 7, 8]]
        out.data = [a, a, a, b, b, b, c, d]
        out.dims = [8, 1]
* Case 2
    x is a Tensor:
        x.data = [a, b, c]
        x.dims = [3, 1]
    y is a LoDTensor:
        y.lod = [[0, 2, 3, 6]]
    with condition len(y.lod[-1]) - 1 == x.dims[0]
    then output is a 1-level LoDTensor:
        out.lod = [[0, 2, 3, 6]]
        out.data = [a, a, b, c, c, c]
        out.dims = [6, 1]
</pre></div>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>x</strong> (<em>Variable</em>) – The input variable which is a Tensor or LoDTensor.</li>
<li><strong>y</strong> (<em>Variable</em>) – The input variable which is a LoDTensor.</li>
<li><strong>main_program</strong> (<em>Program</em>) – The main program.</li>
<li><strong>startup_program</strong> (<em>Program</em>) – The startup program.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The expanded variable which is a LoDTensor.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">Variable</p>
</td>
</tr>
</tbody>
</table>
<p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">x</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'x'</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">10</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'float32'</span><span class="p">)</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'y'</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'float32'</span><span class="p">,</span> <span class="n">lod_level</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">out</span> <span class="o">=</span> <span class="n">layers</span><span class="o">.</span><span class="n">sequence_expand</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">y</span><span class="p">)</span>
</pre></div>
</div>
</dd></dl>
</div>
</div>
...
...
develop/doc_cn/design/mkl/mkl_packed.html
...
...
@@ -257,12 +257,14 @@
<li>Redundant conversion: in some existing cases (for example, RNNs), multiple calls to cblas_?gemm use the same original data, so repacking that data on every call is redundant.</li>
</ol>
<p>To minimize the time that repeated cblas_?gemm calls spend on packing, Intel® MKL introduced the following four APIs:</p>
<ul class="simple">
<li>cblas_?gemm_alloc</li>
<li>cblas_?gemm_pack</li>
<li>cblas_?gemm_compute</li>
<li>cblas_?gemm_free</li>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc">cblas</a></li>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack">cblas</a></li>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute">cblas</a></li>
<li class="toctree-l1"><a class="reference external" href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free">cblas</a></li>
</ul>
</div>
<p>With these APIs, we can pack the original data once and then pass the packed data to every gemm_compute call that reuses it, which avoids the redundant packing.</p>
</div>
<div class="section" id="solution">
...
...
@@ -322,7 +324,15 @@
</div>
<div class="section" id="python-api">
<span id="python-api"></span><h3>Python API<a class="headerlink" href="#python-api" title="永久链接至标题">¶</a></h3>
<p>TBD</p>
<p>We plan to add a <code class="docutils literal"><span class="pre">use_mkl_packed</span></code> flag to <code class="docutils literal"><span class="pre">paddle/utils.Flags</span></code> to select whether this feature is used; when compiled with <code class="docutils literal"><span class="pre">WITH_MKL=ON</span></code>, it defaults to <code class="docutils literal"><span class="pre">true</span></code>.</p>
<p>In addition, a <code class="docutils literal"><span class="pre">use_mkl_packed</span></code> option will be added to the corresponding layers in <code class="docutils literal"><span class="pre">python/paddle/trainer/config_parser.py</span></code>, so that users can enable or disable the feature from Python.</p>
<p>One possible implementation:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">use_mkl_packed</span> <span class="o">=</span> <span class="nb">bool</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">g_command_config_args</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"use_mkl_packed"</span><span class="p">,</span> <span class="mi">0</span><span class="p">)))</span>
<span class="k">if</span> <span class="n">use_mkl_packed</span><span class="p">:</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">layer_type</span> <span class="o">=</span> <span class="n">mkl_packed_</span><span class="o">*</span>
</pre></div>
</div>
<p>Every related <code class="docutils literal"><span class="pre">layer_type</span></code> starts with *mkl_packed_*; this is ensured when each <code class="docutils literal"><span class="pre">MKLPacked*Layer</span></code> registers its layer, so that these layers can be told apart.</p>
</div>
<div class="section" id="benchmarking">
<span id="benchmarking"></span><h3>Benchmarking<a class="headerlink" href="#benchmarking" title="永久链接至标题">¶</a></h3>
...
...
develop/doc_cn/searchindex.js
This diff is collapsed.