diff --git a/develop/doc/_sources/api/v2/fluid/layers.rst.txt b/develop/doc/_sources/api/v2/fluid/layers.rst.txt
index 89e5fec13bf9062dc7a7187b1334c8f5486a980b..9f3669e11583a4ed6467f1a1bb509481fdf0b9d1 100644
--- a/develop/doc/_sources/api/v2/fluid/layers.rst.txt
+++ b/develop/doc/_sources/api/v2/fluid/layers.rst.txt
@@ -300,3 +300,7 @@ conv2d_transpose
 .. autofunction:: paddle.v2.fluid.layers.conv2d_transpose
     :noindex:
 
+sequence_expand
+---------------
+.. autofunction:: paddle.v2.fluid.layers.sequence_expand
+    :noindex:
diff --git a/develop/doc/_sources/design/mkl/mkl_packed.md.txt b/develop/doc/_sources/design/mkl/mkl_packed.md.txt
index c07f7d0cbe9942e626bddbc37477e84e135f8e49..0123315ad4368e68b377f66119949bfd6c1c7860 100644
--- a/develop/doc/_sources/design/mkl/mkl_packed.md.txt
+++ b/develop/doc/_sources/design/mkl/mkl_packed.md.txt
@@ -30,10 +30,10 @@
 In some existing cases (for example, RNNs), multiple calls to cblas_?gemm use the same source data, so repacking that data on every call is redundant.
 
 To minimize the time spent on Packing across multiple cblas_?gemm calls, Intel® MKL introduced the following four APIs:
- * cblas_?gemm_alloc
- * cblas_?gemm_pack
- * cblas_?gemm_compute
- * cblas_?gemm_free
+ * [cblas_?gemm_alloc](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc)
+ * [cblas_?gemm_pack](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack)
+ * [cblas_?gemm_compute](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute)
+ * [cblas_?gemm_free](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free)
 
 With these APIs, we can pack the source data once and then pass the data, now in Packed format, to the gemm_compute calls that reuse it, which avoids the redundant Packing.
 
@@ -84,7 +84,20 @@ PaddlePaddle/Paddle
 2. Compare the results of the optimized layer against the corresponding original PaddlePaddle layer in batch mode.
 
 ### Python API
-TBD
+We plan to add a `use_mkl_packed` flag to `paddle/utils.Flags` to select whether the feature is used; when compiled with `WITH_MKL=ON`, it defaults to `true`.
+
+In addition, a `use_mkl_packed` option will be added to the corresponding layers in `python/paddle/trainer/config_parser.py`, so that users can enable or disable the feature from the Python side.
+
+For example, the implementation could look like this:
+
+```python
+use_mkl_packed = bool(int(g_command_config_args.get("use_mkl_packed", 0)))
+if use_mkl_packed:
+    self.layer_type = mkl_packed_*  # pseudocode: stands for the matching mkl_packed_ layer type
+```
+
+Every related `layer_type` will begin with *mkl_packed_*; this is guaranteed when each `MKLPacked*Layer` registers its layer, to distinguish them from the original layers.
+
 ### Benchmarking
 Scripts will be added to test and compare network performance before and after using MKL Packed recurrent layers.
diff --git a/develop/doc/api/v2/fluid/layers.html b/develop/doc/api/v2/fluid/layers.html
index f67b4d62f3ce077c87f3cac15e64b895e77d812f..c9908c38178c14c09888ab1f08c0df7a162aa9fa 100644
--- a/develop/doc/api/v2/fluid/layers.html
+++ b/develop/doc/api/v2/fluid/layers.html
@@ -1065,6 +1065,79 @@ stride_H = stride_W = stride.
+
+
+

sequence_expand

+
+
+paddle.v2.fluid.layers.sequence_expand(x, y, main_program=None, startup_program=None)
+

Sequence Expand Layer. This layer expands the input variable x according to the LoD information of y. The following examples explain how sequence_expand works:

+
* Case 1
+    x is a LoDTensor:
+        x.lod = [[0,       2, 3],
+                 [0, 1,    3, 4]]
+        x.data = [a, b, c, d]
+        x.dims = [4, 1]
+
+    y is a LoDTensor:
+        y.lod = [[0,    2,    4],
+                 [0, 3, 6, 7, 8]]
+
+    with condition len(y.lod[-1]) - 1 == x.dims[0]
+
+    then output is a 2-level LoDTensor:
+        out.lod = [[0,                2,    4],
+                   [0,       3,       6, 7, 8]]
+        out.data = [a, a, a, b, b, b, c, d]
+        out.dims = [8, 1]
+
+* Case 2
+    x is a Tensor:
+        x.data = [a, b, c]
+        x.dims = [3, 1]
+
+    y is a LoDTensor:
+        y.lod = [[0, 2, 3, 6]]
+
+    with condition len(y.lod[-1]) - 1 == x.dims[0]
+
+    then output is a 1-level LoDTensor:
+        out.lod = [[0,    2, 3,      6]]
+        out.data = [a, a, b, c, c, c]
+        out.dims = [6, 1]
+
+
Parameters:
  • x (Variable) – The input variable, which is a Tensor or LoDTensor.
  • y (Variable) – The input variable, which is a LoDTensor.
  • main_program (Program) – The main program.
  • startup_program (Program) – The startup program.
Returns: The expanded variable, which is a LoDTensor.
Return type: Variable
+
+

Examples

+
x = fluid.layers.data(name='x', shape=[10], dtype='float32')
+y = fluid.layers.data(name='y', shape=[10, 20],
+                 dtype='float32', lod_level=1)
+out = fluid.layers.sequence_expand(x=x, y=y)
+
+
+
+
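The expansion rule described in Case 1 and Case 2 above can be sketched in plain Python. This is only an illustration of the documented semantics, not the actual operator implementation: the function name `sequence_expand_sketch` and the list-based representation of tensors and LoD are assumptions made for this example.

```python
def sequence_expand_sketch(x_data, y_lod):
    """Hypothetical pure-Python model of the documented sequence_expand rule.

    x_data: list of rows of x (its length plays the role of x.dims[0]).
    y_lod:  list of LoD levels of y, each a list of offsets.
    Returns (out_data, out_lod).
    """
    offsets = y_lod[-1]
    # The documented precondition: len(y.lod[-1]) - 1 == x.dims[0]
    if len(offsets) - 1 != len(x_data):
        raise ValueError("len(y.lod[-1]) - 1 must equal x.dims[0]")

    out_data = []
    # Row i of x is repeated once per element of the i-th sequence
    # in the last LoD level of y.
    for row, start, end in zip(x_data, offsets[:-1], offsets[1:]):
        out_data.extend([row] * (end - start))

    # In both documented cases the output LoD equals y.lod.
    return out_data, [list(level) for level in y_lod]


# Case 1: x has 4 rows, y has a 2-level LoD.
case1_data, case1_lod = sequence_expand_sketch(
    ["a", "b", "c", "d"], [[0, 2, 4], [0, 3, 6, 7, 8]])

# Case 2: x has 3 rows, y has a 1-level LoD.
case2_data, case2_lod = sequence_expand_sketch(
    ["a", "b", "c"], [[0, 2, 3, 6]])
```

In this sketch only the last LoD level of y determines the repeat counts; x's own LoD, if it has one, is not consulted, which matches the two cases shown but may not cover every behavior of the real layer.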
diff --git a/develop/doc/design/mkl/mkl_packed.html b/develop/doc/design/mkl/mkl_packed.html index feb5f4edacb001889256ab40aa0b00e16b01eb9c..32639fe8801cc59ccbb0127c62d73d27ae4d0030 100644 --- a/develop/doc/design/mkl/mkl_packed.html +++ b/develop/doc/design/mkl/mkl_packed.html @@ -238,12 +238,14 @@
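  • Conversion redundancy: In some existing cases (for example, RNNs), multiple calls to cblas_?gemm use the same source data, so repacking that data on every call is redundant.
  • To minimize the time spent on Packing across multiple cblas_?gemm calls, Intel® MKL introduced the following four APIs: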

    -