Commit 5123156e
Authored Dec 19, 2017 by Travis CI
Deploy to GitHub Pages: 5f90a31f
Parent: a45a5160
Showing 10 changed files with 224 additions and 24 deletions (+224 -24)
develop/doc/_sources/api/v2/fluid/layers.rst.txt (+4 -0)
develop/doc/_sources/design/mkl/mkl_packed.md.txt (+18 -5)
develop/doc/api/v2/fluid/layers.html (+73 -0)
develop/doc/design/mkl/mkl_packed.html (+16 -6)
develop/doc/searchindex.js (+1 -1)
develop/doc_cn/_sources/api/v2/fluid/layers.rst.txt (+4 -0)
develop/doc_cn/_sources/design/mkl/mkl_packed.md.txt (+18 -5)
develop/doc_cn/api/v2/fluid/layers.html (+73 -0)
develop/doc_cn/design/mkl/mkl_packed.html (+16 -6)
develop/doc_cn/searchindex.js (+1 -1)
develop/doc/_sources/api/v2/fluid/layers.rst.txt
...
...
@@ -300,3 +300,7 @@ conv2d_transpose
.. autofunction:: paddle.v2.fluid.layers.conv2d_transpose
:noindex:
sequence_expand
---------
.. autofunction:: paddle.v2.fluid.layers.sequence_expand
:noindex:
develop/doc/_sources/design/mkl/mkl_packed.md.txt
...
...
@@ -30,10 +30,10 @@
In some existing cases (for example, RNNs), repeated calls to cblas_?gemm operate on the same source data, so re-packing that data on every call is redundant.
To minimize the time spent on packing across repeated cblas_?gemm calls, Intel® MKL introduced the following four APIs:
* cblas_?gemm_alloc
* cblas_?gemm_pack
* cblas_?gemm_compute
* cblas_?gemm_free
* [cblas_?gemm_alloc](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc)
* [cblas_?gemm_pack](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack)
* [cblas_?gemm_compute](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute)
* [cblas_?gemm_free](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free)
With these APIs, the source data can be packed once, and the resulting packed data can then be passed to every gemm_compute call that reuses it, which avoids the redundant packing.
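The pack-once, compute-many pattern described here can be illustrated with a toy model. This is a minimal sketch in plain Python, not the MKL API: the `PackedMatrix` class, its `compute` method and the `pack_calls` counter are hypothetical names introduced only for illustration, and the "packing" is a simple transpose, whereas MKL's real packed layout is internal to the library. The free step is omitted because Python releases the buffer automatically.

```python
# Toy model of the pack-once / compute-many pattern (not the MKL API).
class PackedMatrix:
    """Hypothetical stand-in for a buffer that was allocated and packed once."""

    pack_calls = 0  # counts how often the (expensive) packing step runs

    def __init__(self, weight):
        PackedMatrix.pack_calls += 1
        # "Packing" here is just a transpose into column tuples; the real
        # packed format used by MKL is different and opaque.
        self.cols = list(zip(*weight))

    def compute(self, x):
        """Hypothetical stand-in for the compute step: returns x @ weight."""
        return [[sum(a * b for a, b in zip(row, col)) for col in self.cols]
                for row in x]


weight = [[1.0, 2.0], [3.0, 4.0]]
packed = PackedMatrix(weight)        # pack once, before the time loop
state = [[1.0, 0.0]]
for _ in range(3):                   # e.g. three RNN time steps
    state = packed.compute(state)    # every step reuses the packed weight

print(state, PackedMatrix.pack_calls)
```

The point of the design is visible in the counter: the weight is packed a single time even though it is used in three multiplications.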
...
...
@@ -84,7 +84,20 @@ PaddlePaddle/Paddle
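2. Compare the results of the optimized layer against the corresponding original PaddlePaddle layer in batch mode.
### Python API
TBD
We plan to add a `use_mkl_packed` flag to `paddle/utils.Flags` that selects whether this feature is used; when compiled with `WITH_MKL=ON`, the flag defaults to `true`.
In addition, at the corresponding layers in `python/paddle/trainer/config_parser.py`, we will add a `use_mkl_packed` option so that users can choose from the Python side whether to enable this feature.
For example, the implementation could look like: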
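```python
# Read the flag from the command-line config arguments (off by default).
use_mkl_packed = bool(int(g_command_config_args.get("use_mkl_packed", 0)))
if use_mkl_packed:
    # Placeholder: the concrete type name starts with "mkl_packed_".
    self.layer_type = "mkl_packed_*"
```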
All related `layer_type` values start with *mkl_packed_*; this is guaranteed when each `MKLPacked*Layer` registers its layer, so that they are easy to tell apart.
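To make the flag handling and the naming convention concrete, here is a small standalone sketch. It is an assumption about how the selection could be wired, not the actual `config_parser.py` code: `resolve_layer_type` and `config_args` are hypothetical names (the latter stands in for `g_command_config_args`), and `"recurrent"` is used only as an example base type.

```python
def resolve_layer_type(base_type, config_args):
    """Return the layer type to use, honoring the use_mkl_packed flag."""
    # Same parsing as in the design snippet: "0"/"1" strings and ints both
    # work, and a missing key means the feature is off.
    use_mkl_packed = bool(int(config_args.get("use_mkl_packed", 0)))
    if use_mkl_packed:
        # Packed variants are distinguished by the "mkl_packed_" prefix.
        return "mkl_packed_" + base_type
    return base_type


print(resolve_layer_type("recurrent", {"use_mkl_packed": "1"}))
print(resolve_layer_type("recurrent", {}))
```

Going through `int` before `bool` matters here: `bool("0")` alone would be `True`, because any non-empty string is truthy.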
### Benchmarking
Scripts will be added to test and compare network performance before and after using the MKL Packed recurrent layers.
...
...
develop/doc/api/v2/fluid/layers.html
...
...
@@ -1065,6 +1065,79 @@ stride_H = stride_W = stride.</li>
</table>
</dd></dl>
</div>
<div
class=
"section"
id=
"sequence-expand"
>
<h2>
sequence_expand
<a
class=
"headerlink"
href=
"#sequence-expand"
title=
"Permalink to this headline"
>
¶
</a></h2>
<dl
class=
"function"
>
<dt>
<code
class=
"descclassname"
>
paddle.v2.fluid.layers.
</code><code
class=
"descname"
>
sequence_expand
</code><span
class=
"sig-paren"
>
(
</span><em>
x
</em>
,
<em>
y
</em>
,
<em>
main_program=None
</em>
,
<em>
startup_program=None
</em><span
class=
"sig-paren"
>
)
</span></dt>
<dd><p>
Sequence Expand Layer. This layer will expand the input variable
<strong>
x
</strong>
according to LoD information of
<strong>
y
</strong>
. And the following examples will
explain how sequence_expand works:
</p>
<div
class=
"highlight-text"
><div
class=
"highlight"
><pre><span></span>
* Case 1
    x is a LoDTensor:
        x.lod = [[0, 2, 3],
                 [0, 1, 3, 4]]
        x.data = [a, b, c, d]
        x.dims = [4, 1]
    y is a LoDTensor:
        y.lod = [[0, 2, 4],
                 [0, 3, 6, 7, 8]]
    with condition len(y.lod[-1]) - 1 == x.dims[0]
    then output is a 2-level LoDTensor:
        out.lod = [[0, 2, 4],
                   [0, 3, 6, 7, 8]]
        out.data = [a, a, a, b, b, b, c, d]
        out.dims = [8, 1]
* Case 2
    x is a Tensor:
        x.data = [a, b, c]
        x.dims = [3, 1]
    y is a LoDTensor:
        y.lod = [[0, 2, 3, 6]]
    with condition len(y.lod[-1]) - 1 == x.dims[0]
    then output is a 1-level LoDTensor:
        out.lod = [[0, 2, 3, 6]]
        out.data = [a, a, b, c, c, c]
        out.dims = [6, 1]
</pre></div>
</div>
<table
class=
"docutils field-list"
frame=
"void"
rules=
"none"
>
<col
class=
"field-name"
/>
<col
class=
"field-body"
/>
<tbody
valign=
"top"
>
<tr
class=
"field-odd field"
><th
class=
"field-name"
>
Parameters:
</th><td
class=
"field-body"
><ul
class=
"first simple"
>
<li><strong>
x
</strong>
(
<em>
Variable
</em>
)
–
The input variable which is a Tensor or LoDTensor.
</li>
<li><strong>
y
</strong>
(
<em>
Variable
</em>
)
–
The input variable which is a LoDTensor.
</li>
<li><strong>
main_program
</strong>
(
<em>
Program
</em>
)
–
The main program.
</li>
<li><strong>
startup_program
</strong>
(
<em>
Program
</em>
)
–
The startup program.
</li>
</ul>
</td>
</tr>
<tr
class=
"field-even field"
><th
class=
"field-name"
>
Returns:
</th><td
class=
"field-body"
><p
class=
"first"
>
The expanded variable which is a LoDTensor.
</p>
</td>
</tr>
<tr
class=
"field-odd field"
><th
class=
"field-name"
>
Return type:
</th><td
class=
"field-body"
><p
class=
"first last"
>
Variable
</p>
</td>
</tr>
</tbody>
</table>
<p
class=
"rubric"
>
Examples
</p>
<div
class=
"highlight-python"
><div
class=
"highlight"
><pre><span></span><span
class=
"n"
>
x
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
fluid
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
layers
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
data
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
name
</span><span
class=
"o"
>
=
</span><span
class=
"s1"
>
'
x
'
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
shape
</span><span
class=
"o"
>
=
</span><span
class=
"p"
>
[
</span><span
class=
"mi"
>
10
</span><span
class=
"p"
>
],
</span>
<span
class=
"n"
>
dtype
</span><span
class=
"o"
>
=
</span><span
class=
"s1"
>
'
float32
'
</span><span
class=
"p"
>
)
</span>
<span
class=
"n"
>
y
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
fluid
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
layers
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
data
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
name
</span><span
class=
"o"
>
=
</span><span
class=
"s1"
>
'
y
'
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
shape
</span><span
class=
"o"
>
=
</span><span
class=
"p"
>
[
</span><span
class=
"mi"
>
10
</span><span
class=
"p"
>
,
</span>
<span
class=
"mi"
>
20
</span><span
class=
"p"
>
],
</span>
<span
class=
"n"
>
dtype
</span><span
class=
"o"
>
=
</span><span
class=
"s1"
>
'
float32
'
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
lod_level
</span><span
class=
"o"
>
=
</span><span
class=
"mi"
>
1
</span><span
class=
"p"
>
)
</span>
<span
class=
"n"
>
out
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
layers
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
sequence_expand
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
x
</span><span
class=
"o"
>
=
</span><span
class=
"n"
>
x
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
y
</span><span
class=
"o"
>
=
</span><span
class=
"n"
>
y
</span><span
class=
"p"
>
)
</span>
</pre></div>
</div>
</dd></dl>
</div>
</div>
...
...
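The two cases in the `sequence_expand` documentation added by this hunk can be reproduced with a short pure-Python sketch. This is not Paddle's implementation: the function below works on plain lists, and its parameter names `x_rows` and `y_lod` are assumptions chosen for illustration. It repeats row `i` of `x` once for every element in the `i`-th span of the last LoD level of `y`.

```python
def sequence_expand(x_rows, y_lod):
    """Repeat row i of x according to the i-th span of y's last LoD level."""
    offsets = y_lod[-1]
    if len(offsets) - 1 != len(x_rows):
        raise ValueError("len(y.lod[-1]) - 1 must equal x.dims[0]")
    out = []
    for i, row in enumerate(x_rows):
        # The span length is the difference between consecutive offsets.
        out.extend([row] * (offsets[i + 1] - offsets[i]))
    return out, y_lod  # the output carries y's LoD


# Case 1 from the documentation
data1, lod1 = sequence_expand(["a", "b", "c", "d"],
                              [[0, 2, 4], [0, 3, 6, 7, 8]])
# Case 2 from the documentation
data2, lod2 = sequence_expand(["a", "b", "c"], [[0, 2, 3, 6]])
print(data1)
print(data2)
```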
develop/doc/design/mkl/mkl_packed.html
...
...
@@ -238,12 +238,14 @@
<li>
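Redundant conversion: in some existing cases (for example, RNNs), repeated calls to cblas_?gemm operate on the same source data, so re-packing that data on every call is redundant.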
</li>
</ol>
<p>
To minimize the time spent on packing across repeated cblas_?gemm calls, Intel® MKL introduced the following four APIs:
</p>
<ul
class=
"simple"
>
<li>
cblas_?gemm_alloc
</li>
<li>
cblas_?gemm_pack
</li>
<li>
cblas_?gemm_compute
</li>
<li>
cblas_?gemm_free
</li>
<div
class=
"toctree-wrapper compound"
>
<ul>
<li
class=
"toctree-l1"
><a
class=
"reference external"
href=
"https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc"
>
cblas_?gemm_alloc
</a></li>
<li
class=
"toctree-l1"
><a
class=
"reference external"
href=
"https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack"
>
cblas_?gemm_pack
</a></li>
<li
class=
"toctree-l1"
><a
class=
"reference external"
href=
"https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute"
>
cblas_?gemm_compute
</a></li>
<li
class=
"toctree-l1"
><a
class=
"reference external"
href=
"https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free"
>
cblas_?gemm_free
</a></li>
</ul>
</div>
<p>
With these APIs, the source data can be packed once, and the resulting packed data can then be passed to every gemm_compute call that reuses it, which avoids the redundant packing.
</p>
</div>
<div
class=
"section"
id=
"solution"
>
...
...
@@ -303,7 +305,15 @@
</div>
<div
class=
"section"
id=
"python-api"
>
<span
id=
"python-api"
></span><h3>
Python API
<a
class=
"headerlink"
href=
"#python-api"
title=
"Permalink to this headline"
>
¶
</a></h3>
<p>
TBD
</p>
<p>
We plan to extend
<code
class=
"docutils literal"
><span
class=
"pre"
>
paddle/utils.Flags
</span></code>
with a
<code
class=
"docutils literal"
><span
class=
"pre"
>
use_mkl_packed
</span></code>
flag that selects whether this feature is used; when compiled with
<code
class=
"docutils literal"
><span
class=
"pre"
>
WITH_MKL=ON
</span></code>
, the flag defaults to
<code
class=
"docutils literal"
><span
class=
"pre"
>
true
</span></code>
.
</p>
<p>
In addition, at the corresponding layers in
<code
class=
"docutils literal"
><span
class=
"pre"
>
python/paddle/trainer/config_parser.py
</span></code>
, we will add a
<code
class=
"docutils literal"
><span
class=
"pre"
>
use_mkl_packed
</span></code>
option so that users can choose from the Python side whether to enable this feature.
</p>
<p>
For example, the implementation could look like:
</p>
<div
class=
"highlight-python"
><div
class=
"highlight"
><pre><span></span><span
class=
"n"
>
use_mkl_packed
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"nb"
>
bool
</span><span
class=
"p"
>
(
</span><span
class=
"nb"
>
int
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
g_command_config_args
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
get
</span><span
class=
"p"
>
(
</span><span
class=
"s2"
>
"
use_mkl_packed
"
</span><span
class=
"p"
>
,
</span>
<span
class=
"mi"
>
0
</span><span
class=
"p"
>
)))
</span>
<span
class=
"k"
>
if
</span>
<span
class=
"n"
>
use_mkl_packed
</span><span
class=
"p"
>
:
</span>
<span
class=
"bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
layer_type
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
mkl_packed_
</span><span
class=
"o"
>
*
</span>
</pre></div>
</div>
<p>
All related
<code
class=
"docutils literal"
><span
class=
"pre"
>
layer_type
</span></code>
values start with *mkl_packed_*; this is guaranteed when each
<code
class=
"docutils literal"
><span
class=
"pre"
>
MKLPacked*Layer
</span></code>
registers its layer, so that they are easy to tell apart.
</p>
</div>
<div
class=
"section"
id=
"benchmarking"
>
<span
id=
"benchmarking"
></span><h3>
Benchmarking
<a
class=
"headerlink"
href=
"#benchmarking"
title=
"Permalink to this headline"
>
¶
</a></h3>
...
...
develop/doc/searchindex.js
The source diff is too large to display. You can view the blob instead.
develop/doc_cn/_sources/api/v2/fluid/layers.rst.txt
...
...
@@ -300,3 +300,7 @@ conv2d_transpose
.. autofunction:: paddle.v2.fluid.layers.conv2d_transpose
:noindex:
sequence_expand
---------
.. autofunction:: paddle.v2.fluid.layers.sequence_expand
:noindex:
develop/doc_cn/_sources/design/mkl/mkl_packed.md.txt
...
...
@@ -30,10 +30,10 @@
In some existing cases (for example, RNNs), repeated calls to cblas_?gemm operate on the same source data, so re-packing that data on every call is redundant.
To minimize the time spent on packing across repeated cblas_?gemm calls, Intel® MKL introduced the following four APIs:
* cblas_?gemm_alloc
* cblas_?gemm_pack
* cblas_?gemm_compute
* cblas_?gemm_free
* [cblas_?gemm_alloc](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc)
* [cblas_?gemm_pack](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack)
* [cblas_?gemm_compute](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute)
* [cblas_?gemm_free](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free)
With these APIs, the source data can be packed once, and the resulting packed data can then be passed to every gemm_compute call that reuses it, which avoids the redundant packing.
...
...
@@ -84,7 +84,20 @@ PaddlePaddle/Paddle
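2. Compare the results of the optimized layer against the corresponding original PaddlePaddle layer in batch mode.
### Python API
TBD
We plan to add a `use_mkl_packed` flag to `paddle/utils.Flags` that selects whether this feature is used; when compiled with `WITH_MKL=ON`, the flag defaults to `true`.
In addition, at the corresponding layers in `python/paddle/trainer/config_parser.py`, we will add a `use_mkl_packed` option so that users can choose from the Python side whether to enable this feature.
For example, the implementation could look like: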
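```python
# Read the flag from the command-line config arguments (off by default).
use_mkl_packed = bool(int(g_command_config_args.get("use_mkl_packed", 0)))
if use_mkl_packed:
    # Placeholder: the concrete type name starts with "mkl_packed_".
    self.layer_type = "mkl_packed_*"
```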
All related `layer_type` values start with *mkl_packed_*; this is guaranteed when each `MKLPacked*Layer` registers its layer, so that they are easy to tell apart.
### Benchmarking
Scripts will be added to test and compare network performance before and after using the MKL Packed recurrent layers.
...
...
develop/doc_cn/api/v2/fluid/layers.html
...
...
@@ -1084,6 +1084,79 @@ stride_H = stride_W = stride.</li>
</table>
</dd></dl>
</div>
<div
class=
"section"
id=
"sequence-expand"
>
<h2>
sequence_expand
<a
class=
"headerlink"
href=
"#sequence-expand"
title=
"永久链接至标题"
>
¶
</a></h2>
<dl
class=
"function"
>
<dt>
<code
class=
"descclassname"
>
paddle.v2.fluid.layers.
</code><code
class=
"descname"
>
sequence_expand
</code><span
class=
"sig-paren"
>
(
</span><em>
x
</em>
,
<em>
y
</em>
,
<em>
main_program=None
</em>
,
<em>
startup_program=None
</em><span
class=
"sig-paren"
>
)
</span></dt>
<dd><p>
Sequence Expand Layer. This layer will expand the input variable
<strong>
x
</strong>
according to LoD information of
<strong>
y
</strong>
. And the following examples will
explain how sequence_expand works:
</p>
<div
class=
"highlight-text"
><div
class=
"highlight"
><pre><span></span>
* Case 1
    x is a LoDTensor:
        x.lod = [[0, 2, 3],
                 [0, 1, 3, 4]]
        x.data = [a, b, c, d]
        x.dims = [4, 1]
    y is a LoDTensor:
        y.lod = [[0, 2, 4],
                 [0, 3, 6, 7, 8]]
    with condition len(y.lod[-1]) - 1 == x.dims[0]
    then output is a 2-level LoDTensor:
        out.lod = [[0, 2, 4],
                   [0, 3, 6, 7, 8]]
        out.data = [a, a, a, b, b, b, c, d]
        out.dims = [8, 1]
* Case 2
    x is a Tensor:
        x.data = [a, b, c]
        x.dims = [3, 1]
    y is a LoDTensor:
        y.lod = [[0, 2, 3, 6]]
    with condition len(y.lod[-1]) - 1 == x.dims[0]
    then output is a 1-level LoDTensor:
        out.lod = [[0, 2, 3, 6]]
        out.data = [a, a, b, c, c, c]
        out.dims = [6, 1]
</pre></div>
</div>
<table
class=
"docutils field-list"
frame=
"void"
rules=
"none"
>
<col
class=
"field-name"
/>
<col
class=
"field-body"
/>
<tbody
valign=
"top"
>
<tr
class=
"field-odd field"
><th
class=
"field-name"
>
Parameters:
</th><td
class=
"field-body"
><ul
class=
"first simple"
>
<li><strong>
x
</strong>
(
<em>
Variable
</em>
)
–
The input variable which is a Tensor or LoDTensor.
</li>
<li><strong>
y
</strong>
(
<em>
Variable
</em>
)
–
The input variable which is a LoDTensor.
</li>
<li><strong>
main_program
</strong>
(
<em>
Program
</em>
)
–
The main program.
</li>
<li><strong>
startup_program
</strong>
(
<em>
Program
</em>
)
–
The startup program.
</li>
</ul>
</td>
</tr>
<tr
class=
"field-even field"
><th
class=
"field-name"
>
Returns:
</th><td
class=
"field-body"
><p
class=
"first"
>
The expanded variable which is a LoDTensor.
</p>
</td>
</tr>
<tr
class=
"field-odd field"
><th
class=
"field-name"
>
Return type:
</th><td
class=
"field-body"
><p
class=
"first last"
>
Variable
</p>
</td>
</tr>
</tbody>
</table>
<p
class=
"rubric"
>
Examples
</p>
<div
class=
"highlight-python"
><div
class=
"highlight"
><pre><span></span><span
class=
"n"
>
x
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
fluid
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
layers
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
data
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
name
</span><span
class=
"o"
>
=
</span><span
class=
"s1"
>
'
x
'
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
shape
</span><span
class=
"o"
>
=
</span><span
class=
"p"
>
[
</span><span
class=
"mi"
>
10
</span><span
class=
"p"
>
],
</span>
<span
class=
"n"
>
dtype
</span><span
class=
"o"
>
=
</span><span
class=
"s1"
>
'
float32
'
</span><span
class=
"p"
>
)
</span>
<span
class=
"n"
>
y
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
fluid
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
layers
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
data
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
name
</span><span
class=
"o"
>
=
</span><span
class=
"s1"
>
'
y
'
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
shape
</span><span
class=
"o"
>
=
</span><span
class=
"p"
>
[
</span><span
class=
"mi"
>
10
</span><span
class=
"p"
>
,
</span>
<span
class=
"mi"
>
20
</span><span
class=
"p"
>
],
</span>
<span
class=
"n"
>
dtype
</span><span
class=
"o"
>
=
</span><span
class=
"s1"
>
'
float32
'
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
lod_level
</span><span
class=
"o"
>
=
</span><span
class=
"mi"
>
1
</span><span
class=
"p"
>
)
</span>
<span
class=
"n"
>
out
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
layers
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
sequence_expand
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
x
</span><span
class=
"o"
>
=
</span><span
class=
"n"
>
x
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
y
</span><span
class=
"o"
>
=
</span><span
class=
"n"
>
y
</span><span
class=
"p"
>
)
</span>
</pre></div>
</div>
</dd></dl>
</div>
</div>
...
...
develop/doc_cn/design/mkl/mkl_packed.html
...
...
@@ -257,12 +257,14 @@
<li>
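Redundant conversion: in some existing cases (for example, RNNs), repeated calls to cblas_?gemm operate on the same source data, so re-packing that data on every call is redundant.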
</li>
</ol>
<p>
To minimize the time spent on packing across repeated cblas_?gemm calls, Intel® MKL introduced the following four APIs:
</p>
<ul
class=
"simple"
>
<li>
cblas_?gemm_alloc
</li>
<li>
cblas_?gemm_pack
</li>
<li>
cblas_?gemm_compute
</li>
<li>
cblas_?gemm_free
</li>
<div
class=
"toctree-wrapper compound"
>
<ul>
<li
class=
"toctree-l1"
><a
class=
"reference external"
href=
"https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc"
>
cblas_?gemm_alloc
</a></li>
<li
class=
"toctree-l1"
><a
class=
"reference external"
href=
"https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack"
>
cblas_?gemm_pack
</a></li>
<li
class=
"toctree-l1"
><a
class=
"reference external"
href=
"https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute"
>
cblas_?gemm_compute
</a></li>
<li
class=
"toctree-l1"
><a
class=
"reference external"
href=
"https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free"
>
cblas_?gemm_free
</a></li>
</ul>
</div>
<p>
With these APIs, the source data can be packed once, and the resulting packed data can then be passed to every gemm_compute call that reuses it, which avoids the redundant packing.
</p>
</div>
<div
class=
"section"
id=
"solution"
>
...
...
@@ -322,7 +324,15 @@
</div>
<div
class=
"section"
id=
"python-api"
>
<span
id=
"python-api"
></span><h3>
Python API
<a
class=
"headerlink"
href=
"#python-api"
title=
"永久链接至标题"
>
¶
</a></h3>
<p>
TBD
</p>
<p>
We plan to extend
<code
class=
"docutils literal"
><span
class=
"pre"
>
paddle/utils.Flags
</span></code>
with a
<code
class=
"docutils literal"
><span
class=
"pre"
>
use_mkl_packed
</span></code>
flag that selects whether this feature is used; when compiled with
<code
class=
"docutils literal"
><span
class=
"pre"
>
WITH_MKL=ON
</span></code>
, the flag defaults to
<code
class=
"docutils literal"
><span
class=
"pre"
>
true
</span></code>
.
</p>
<p>
In addition, at the corresponding layers in
<code
class=
"docutils literal"
><span
class=
"pre"
>
python/paddle/trainer/config_parser.py
</span></code>
, we will add a
<code
class=
"docutils literal"
><span
class=
"pre"
>
use_mkl_packed
</span></code>
option so that users can choose from the Python side whether to enable this feature.
</p>
<p>
For example, the implementation could look like:
</p>
<div
class=
"highlight-python"
><div
class=
"highlight"
><pre><span></span><span
class=
"n"
>
use_mkl_packed
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"nb"
>
bool
</span><span
class=
"p"
>
(
</span><span
class=
"nb"
>
int
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
g_command_config_args
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
get
</span><span
class=
"p"
>
(
</span><span
class=
"s2"
>
"
use_mkl_packed
"
</span><span
class=
"p"
>
,
</span>
<span
class=
"mi"
>
0
</span><span
class=
"p"
>
)))
</span>
<span
class=
"k"
>
if
</span>
<span
class=
"n"
>
use_mkl_packed
</span><span
class=
"p"
>
:
</span>
<span
class=
"bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
layer_type
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
mkl_packed_
</span><span
class=
"o"
>
*
</span>
</pre></div>
</div>
<p>
All related
<code
class=
"docutils literal"
><span
class=
"pre"
>
layer_type
</span></code>
values start with *mkl_packed_*; this is guaranteed when each
<code
class=
"docutils literal"
><span
class=
"pre"
>
MKLPacked*Layer
</span></code>
registers its layer, so that they are easy to tell apart.
</p>
</div>
<div
class=
"section"
id=
"benchmarking"
>
<span
id=
"benchmarking"
></span><h3>
Benchmarking
<a
class=
"headerlink"
href=
"#benchmarking"
title=
"永久链接至标题"
>
¶
</a></h3>
...
...
develop/doc_cn/searchindex.js
The source diff is too large to display. You can view the blob instead.