Commit 29bf727e, authored June 13, 2018 by fengjiayi

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_add_doc

Parents: c6c9c657, d3b0129c
Showing 17 changed files, with 379 additions and 261 deletions (+379, -261):
doc/v2/api/config/evaluators.rst (+1, -1)
doc/v2/api/config/layer.rst (+94, -99)
doc/v2/api/index_en.rst (+0, -1)
doc/v2/build_and_install/pip_install_cn.rst (+1, -0)
doc/v2/build_and_install/pip_install_en.rst (+1, -0)
paddle/fluid/operators/conv_mkldnn_op.cc (+229, -134)
paddle/fluid/operators/conv_op.cc (+1, -2)
paddle/fluid/operators/gen_nccl_id_op.cc (+4, -0)
python/paddle/batch.py (+1, -1)
python/paddle/fluid/layers/nn.py (+24, -6)
python/paddle/fluid/tests/book/high-level-api/image_classification/test_image_classification_resnet.py (+3, -2)
python/paddle/fluid/tests/book/high-level-api/image_classification/test_image_classification_vgg.py (+3, -2)
python/paddle/fluid/tests/book/high-level-api/understand_sentiment/test_understand_sentiment_stacked_lstm.py (+5, -2)
python/paddle/fluid/tests/book_memory_optimization/test_memopt_fit_a_line.py (+1, -1)
python/paddle/trainer_config_helpers/attrs.py (+7, -6)
python/paddle/trainer_config_helpers/layers.py (+3, -3)
python/paddle/v2/minibatch.py (+1, -1)
doc/v2/api/config/evaluators.rst (view file @ 29bf727e)

@@ -101,7 +101,7 @@ value_printer
     :noindex:
 
 Detection
-=====
+=========
 
 detection_map
 -------------
doc/v2/api/config/layer.rst (view file @ 29bf727e)

This file's change is uniform: every layer entry switches its Sphinx autodoc directive from `.. autoclass::` to `.. autofunction::`, documenting the paddle.v2 layer wrappers as the functions they are rather than as classes. The first hunk shows the pattern:

@@ -11,7 +11,7 @@ Data layer
 data
 ----
-..  autoclass:: paddle.v2.layer.data
+..  autofunction:: paddle.v2.layer.data
     :noindex:

The same one-line substitution is applied, section by section (Fully Connected, Conv, Image Pooling, Norm, Recurrent, Recurrent Layer Group, Mixed, Aggregate, Reshaping, Math, Sampling, Cost, Check, and Activation), to the entries for: fc, selective_fc, conv_operator, conv_projection, conv_shift, img_conv, context_projection, row_conv, img_pool, spp, maxout, roi_pool, pad, img_cmrnorm, batch_norm, sum_to_one_norm, cross_channel_norm, row_l2_norm, recurrent, lstmemory, grumemory, gated_unit, memory, recurrent_group, lstm_step, gru_step, beam_search, get_output, mixed, embedding, scaling_projection, dotmul_projection, dotmul_operator, full_matrix_projection, identity_projection, slice_projection, table_projection, trans_full_matrix_projection, pooling, last_seq, first_seq, sub_seq, concat, seq_concat, sub_nested_seq, block_expand, expand, repeat, rotate, seq_reshape, addto, linear_comb, interpolation, bilinear_interp, dropout, dot_prod, out_prod, power, scaling, clip, resize, slope_intercept, tensor, cos_sim, l2_distance, trans, scale_shift, factorization_machine, max_id, sampling_id, multiplex, cross_entropy_cost, cross_entropy_with_selfnorm_cost, multi_binary_label_cross_entropy_cost, classification_cost, huber_regression_cost, huber_classification_cost, lambda_cost, square_error_cost, rank_cost, sum_cost, crf, crf_decoding, ctc, warp_ctc, nce, hsigmoid, smooth_l1_cost, multibox_loss, detection_output, eos, and prelu.

One hunk departs from the pattern: in the Aggregate Layers section, the kmax_sequence_score entry is dropped entirely while seq_slice gets the usual directive change:

@@ -245,51 +245,46 @@ AggregateLevel
 seq_slice
 ---------
-..  autoclass:: paddle.v2.layer.seq_slice
-    :noindex:
-
-kmax_sequence_score
--------------------
-..  autoclass:: paddle.v2.layer.kmax_sequence_score
+..  autofunction:: paddle.v2.layer.seq_slice
     :noindex:
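For reference, the v2 layer API is invoked through module-level functions, which is exactly what `autofunction` documents. A minimal sketch of that usage, assuming the legacy paddle.v2 package is installed:

    import paddle.v2 as paddle

    paddle.init(use_gpu=False, trainer_count=1)

    # paddle.v2.layer.data and paddle.v2.layer.fc are functions, not classes:
    # calling them appends a layer to the network configuration.
    images = paddle.layer.data(
        name='pixel', type=paddle.data_type.dense_vector(784))
    hidden = paddle.layer.fc(
        input=images, size=128, act=paddle.activation.Relu())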
doc/v2/api/index_en.rst (view file @ 29bf727e)

@@ -8,4 +8,3 @@ API
     model_configs.rst
     data.rst
     run_logic.rst
-    fluid/index.rst
doc/v2/build_and_install/pip_install_cn.rst (view file @ 29bf727e)

@@ -60,6 +60,7 @@ paddlepaddle-gpu==0.11.0 使用CUDA 7.5和cuDNN 5编译的0.11.0版
(the existing wheel-table rows for "cpu_noavx_openblas", "cuda8.0_cudnn5_avx_mkl", and "cuda8.0_cudnn7_avx_mkl" are unchanged context; a new row is appended for CUDA 9.0 builds:)
+    "cuda9.0_cudnn7_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda90cudnn7avxMkl/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda90cudnn7avxMkl/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__"
 
 .. _pip_dependency:
doc/v2/build_and_install/pip_install_en.rst (view file @ 29bf727e)

@@ -63,6 +63,7 @@ If the links below shows up the login form, just click "Log in as guest" to star
(the same wheel table as the Chinese page; the existing rows for "cpu_noavx_openblas", "cuda8.0_cudnn5_avx_mkl", and "cuda8.0_cudnn7_avx_mkl" are unchanged context, and the same CUDA 9.0 row is appended:)
+    "cuda9.0_cudnn7_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda90cudnn7avxMkl/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda90cudnn7avxMkl/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__"
 
 .. _pip_dependency:
paddle/fluid/operators/conv_mkldnn_op.cc (view file @ 29bf727e)

@@ -18,6 +18,17 @@
 namespace paddle {
 namespace operators {
 
+using conv_bwd_data = mkldnn::convolution_backward_data;
+using conv_bwd_weights = mkldnn::convolution_backward_weights;
+using conv_fwd = mkldnn::convolution_forward;
+using framework::DataLayout;
+using mkldnn::memory;
+using mkldnn::primitive;
+using mkldnn::reorder;
+using mkldnn::stream;
+using platform::to_void_cast;
+using platform::GetMKLDNNFormat;
+
 template <typename T>
 class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
  public:

@@ -25,6 +36,10 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
     PADDLE_ENFORCE(paddle::platform::is_cpu_place(ctx.GetPlace()),
                    "It must use CPUPlace.");
 
+    // Get unique name for index
+    const std::string key = ctx.op().Output("Output");
+    const std::string key_conv_pd = key + "@conv_pd";
+
     auto& dev_ctx =
         ctx.template device_context<paddle::platform::MKLDNNDeviceContext>();
     const auto& mkldnn_engine = dev_ctx.GetEngine();

@@ -33,10 +48,12 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
     auto* filter = ctx.Input<Tensor>("Filter");
     auto* output = ctx.Output<Tensor>("Output");
 
-    // Get an unique name from "argument" name of "Output" variable
-    // This name will be used as key when saving info into device context
-    const std::string key = ctx.op().Output("Output");
-    const std::string key_conv_pd = key + "@conv_pd";
+    PADDLE_ENFORCE(input->layout() == DataLayout::kMKLDNN &&
+                       input->format() != memory::format::format_undef,
+                   "Wrong layout/format set for Input tensor");
+    PADDLE_ENFORCE(filter->layout() == DataLayout::kMKLDNN &&
+                       filter->format() != memory::format::format_undef,
+                   "Wrong layout/format set for Filter tensor");
 
     std::vector<int> strides = ctx.Attr<std::vector<int>>("strides");
     std::vector<int> paddings = ctx.Attr<std::vector<int>>("paddings");

@@ -63,60 +80,86 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
         paddle::framework::vectorize2int(filter->dims());
     std::vector<int> dst_tz = paddle::framework::vectorize2int(output->dims());
 
-    // TODO(pzelazko-intel): support more formats
-    auto src_md = platform::MKLDNNMemDesc(
-        src_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::nchw);
-    auto weights_md = platform::MKLDNNMemDesc(
-        weights_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::oihw);
-    auto dst_md = platform::MKLDNNMemDesc(
-        dst_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::nchw);
-
-    auto src_memory = mkldnn::memory(
-        {src_md, mkldnn_engine},
-        reinterpret_cast<void*>(const_cast<T*>(input_data)));
-    auto weights_memory = mkldnn::memory(
-        {weights_md, mkldnn_engine},
-        reinterpret_cast<void*>(const_cast<T*>(filter_data)));
-    auto dst_memory = mkldnn::memory({dst_md, mkldnn_engine}, output_data);
-
-    std::shared_ptr<mkldnn::convolution_forward::primitive_desc> conv_pd =
-        ConvFwdPrimitiveDesc(src_md, weights_md, dst_md, strides, paddings,
-                             mkldnn_engine);
-
-    // save conv_pd into global device context to be referred in backward path
-    dev_ctx.SetBlob(key_conv_pd, conv_pd);
+    // create mkldnn memory from input tensors (data/weights)
+    auto user_src_memory =
+        memory({{{src_tz}, memory::data_type::f32, input->format()}, mkldnn_engine},
+               to_void_cast(input_data));
+    auto user_weights_memory =
+        memory({{{weights_tz}, memory::data_type::f32, filter->format()}, mkldnn_engine},
+               to_void_cast(filter_data));
+
+    /* create memory descriptor for convolution without specified format
+     * ('any') which lets a primitive (convolution in this case) choose
+     * the memory format preferred for best performance
+     */
+    auto src_md = platform::MKLDNNMemDesc(src_tz, memory::data_type::f32,
+                                          memory::format::any);
+    auto weights_md = platform::MKLDNNMemDesc(weights_tz, memory::data_type::f32,
+                                              memory::format::any);
+    auto dst_md = platform::MKLDNNMemDesc(dst_tz, memory::data_type::f32,
+                                          memory::format::any);
+
+    // create a conv primitive descriptor and save it for usage in backward
+    std::shared_ptr<conv_fwd::primitive_desc> conv_pd = ConvFwdPrimitiveDesc(
+        src_md, weights_md, dst_md, strides, paddings, mkldnn_engine);
+
+    // create reorder primitive if the input format is not the preferred one
+    auto src_memory = user_src_memory;
+    primitive reorder_src;
+    bool is_src_reordered = false;
+    if (memory::primitive_desc(conv_pd->src_primitive_desc()) !=
+        user_src_memory.get_primitive_desc()) {
+      src_memory = memory(conv_pd->src_primitive_desc());
+      reorder_src = reorder(user_src_memory, src_memory);
+      is_src_reordered = true;
+    }
+    auto weights_memory = user_weights_memory;
+    primitive reorder_weights;
+    bool is_weights_reordered = false;
+    if (memory::primitive_desc(conv_pd->weights_primitive_desc()) !=
+        user_weights_memory.get_primitive_desc()) {
+      weights_memory = memory(conv_pd->weights_primitive_desc());
+      reorder_weights = reorder(user_weights_memory, weights_memory);
+      is_weights_reordered = true;
+    }
+
+    // create memory primitive for conv dst
+    auto dst_memory = memory(conv_pd->dst_primitive_desc(), output_data);
 
     // create convolution op primitive
-    auto conv_prim = mkldnn::convolution_forward(*conv_pd, src_memory,
-                                                 weights_memory, dst_memory);
+    auto conv_prim = conv_fwd(*conv_pd, src_memory, weights_memory, dst_memory);
 
     // push primitive to stream and wait until it's executed
-    std::vector<mkldnn::primitive> pipeline{conv_prim};
-    mkldnn::stream(mkldnn::stream::kind::eager).submit(pipeline).wait();
+    std::vector<primitive> pipeline;
+    if (is_src_reordered) pipeline.push_back(reorder_src);
+    if (is_weights_reordered) pipeline.push_back(reorder_weights);
+    pipeline.push_back(conv_prim);
+    stream(stream::kind::eager).submit(pipeline).wait();
+
+    // Save conv_pd/src_memory/weights_memory for backward pass
+    dev_ctx.SetBlob(key_conv_pd, conv_pd);
 
     output->set_layout(DataLayout::kMKLDNN);
     output->set_format(GetMKLDNNFormat(dst_memory));
   }
 
  private:
-  std::unique_ptr<mkldnn::convolution_forward::primitive_desc>
-  ConvFwdPrimitiveDesc(const mkldnn::memory::desc& src,
-                       const mkldnn::memory::desc& weights,
-                       const mkldnn::memory::desc& dst,
-                       const std::vector<int>& strides,
-                       const std::vector<int>& paddings,
-                       const mkldnn::engine& engine) const {
-    mkldnn::memory::dims stride_dims = {strides[0], strides[1]};
-    mkldnn::memory::dims padding_dims = {paddings[0], paddings[1]};
-
-    auto conv_desc = mkldnn::convolution_forward::desc(
-        mkldnn::prop_kind::forward, mkldnn::convolution_direct, src, weights,
-        dst, stride_dims, padding_dims, padding_dims,
-        mkldnn::padding_kind::zero);
-
-    auto p_conv_pd =
-        new mkldnn::convolution_forward::primitive_desc(conv_desc, engine);
-    return std::unique_ptr<mkldnn::convolution_forward::primitive_desc>(
-        p_conv_pd);
+  std::unique_ptr<conv_fwd::primitive_desc> ConvFwdPrimitiveDesc(
+      const memory::desc& src, const memory::desc& weights,
+      const memory::desc& dst, const std::vector<int>& strides,
+      const std::vector<int>& paddings, const mkldnn::engine& engine) const {
+    memory::dims stride_dims = {strides[0], strides[1]};
+    memory::dims padding_dims = {paddings[0], paddings[1]};
+
+    auto conv_desc =
+        conv_fwd::desc(mkldnn::prop_kind::forward, mkldnn::convolution_direct,
                       src, weights, dst, stride_dims, padding_dims,
+                       padding_dims, mkldnn::padding_kind::zero);
+
+    auto p_conv_pd = new conv_fwd::primitive_desc(conv_desc, engine);
+    return std::unique_ptr<conv_fwd::primitive_desc>(p_conv_pd);
   }
 };

@@ -139,6 +182,19 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
     Tensor* input_grad = ctx.Output<Tensor>(framework::GradVarName("Input"));
     Tensor* filter_grad = ctx.Output<Tensor>(framework::GradVarName("Filter"));
 
+    PADDLE_ENFORCE(input->layout() == DataLayout::kMKLDNN &&
+                       input->format() != memory::format::format_undef,
+                   "Wrong layout/format set for Input tensor");
+    PADDLE_ENFORCE(filter->layout() == DataLayout::kMKLDNN &&
+                       filter->format() != memory::format::format_undef,
+                   "Wrong layout/format set for Filter tensor");
+    PADDLE_ENFORCE(output->layout() == DataLayout::kMKLDNN &&
+                       output->format() != memory::format::format_undef,
+                   "Wrong layout/format set for Output tensor");
+    PADDLE_ENFORCE(output_grad->layout() == DataLayout::kMKLDNN &&
+                       output_grad->format() != memory::format::format_undef,
+                   "Wrong layout/format set for output_grad tensor");
+
     if (!input_grad && !filter_grad) return;
 
     // Get an unique name from "argument" name of "Output" variable

@@ -167,108 +223,147 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
         paddle::framework::vectorize2int(filter->dims());
     std::vector<int> dst_tz = paddle::framework::vectorize2int(output->dims());
 
-    // TODO(pzelazko-intel): support more formats
-    auto src_md = platform::MKLDNNMemDesc(
-        src_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::nchw);
-    auto diff_src_md = platform::MKLDNNMemDesc(
-        src_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::nchw);
-    auto weights_md = platform::MKLDNNMemDesc(
-        weights_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::oihw);
-    auto diff_weights_md = platform::MKLDNNMemDesc(
-        weights_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::oihw);
-    auto diff_dst_md = platform::MKLDNNMemDesc(
-        dst_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::nchw);
-
-    // create memory
-    auto diff_dst_memory = mkldnn::memory(
-        {diff_weights_md, mkldnn_engine},
-        reinterpret_cast<void*>(const_cast<T*>(output_grad_data)));
+    // create mkldnn memory from input tensors (input/weights/output_grad)
+    auto user_src_memory =
+        memory({{{src_tz}, memory::data_type::f32, input->format()}, mkldnn_engine},
+               to_void_cast(input_data));
+    auto user_weights_memory =
+        memory({{{weights_tz}, memory::data_type::f32, filter->format()}, mkldnn_engine},
+               to_void_cast(filter_data));
+    auto user_diff_dst_memory =
+        memory({{{dst_tz}, memory::data_type::f32, output_grad->format()}, mkldnn_engine},
+               to_void_cast(output_grad_data));
+
+    /* create memory descriptor for conv backward without specified format
+     * ('any') which lets a primitive (conv backward in this case) choose
+     * the memory format preferred for best performance
+     */
+    auto src_md = platform::MKLDNNMemDesc(src_tz, memory::data_type::f32,
+                                          memory::format::any);
+    auto diff_src_md = platform::MKLDNNMemDesc(src_tz, memory::data_type::f32,
+                                               memory::format::any);
+    auto weights_md = platform::MKLDNNMemDesc(weights_tz, memory::data_type::f32,
+                                              memory::format::any);
+    auto diff_weights_md = platform::MKLDNNMemDesc(
+        weights_tz, memory::data_type::f32, memory::format::any);
+    auto diff_dst_md = platform::MKLDNNMemDesc(dst_tz, memory::data_type::f32,
+                                               memory::format::any);
 
     // Retrieve conv_pd from device context
-    auto conv_pd =
-        std::static_pointer_cast<mkldnn::convolution_forward::primitive_desc>(
-            dev_ctx.GetBlob(key_conv_pd));
+    auto conv_pd = std::static_pointer_cast<conv_fwd::primitive_desc>(
+        dev_ctx.GetBlob(key_conv_pd));
     PADDLE_ENFORCE(conv_pd != nullptr,
                    "Fail to find conv_pd in device context");
 
     // create backward conv primitive for weights
     if (filter_grad) {
-      // create primitive descriptor
-      mkldnn::convolution_backward_weights::primitive_desc
-          conv_bwd_weights_pd = ConvBwdWeightsPrimitiveDesc(
-              src_md, diff_weights_md, diff_dst_md, strides, paddings,
-              *conv_pd, mkldnn_engine);
-
-      // create memory
-      auto diff_weights_memory =
-          mkldnn::memory({diff_weights_md, mkldnn_engine},
-                         reinterpret_cast<void*>(filter_grad_data));
-      auto src_memory = mkldnn::memory(
-          {src_md, mkldnn_engine},
-          reinterpret_cast<void*>(const_cast<T*>(input_data)));
-
-      // create backward conv primitive for weights
-      auto conv_bwd_weights_prim = mkldnn::convolution_backward_weights(
-          conv_bwd_weights_pd, src_memory, diff_dst_memory,
-          diff_weights_memory);
-
-      // push primitive and execute it
-      std::vector<mkldnn::primitive> pipeline{conv_bwd_weights_prim};
-      mkldnn::stream(mkldnn::stream::kind::eager).submit(pipeline).wait();
+      // create backward convolution primitive descriptor
+      auto conv_bwd_weights_desc = conv_bwd_weights::desc(
+          mkldnn::convolution_direct, src_md, diff_weights_md, diff_dst_md,
+          strides, paddings, paddings, mkldnn::padding_kind::zero);
+      auto conv_bwd_weights_pd = conv_bwd_weights::primitive_desc(
+          conv_bwd_weights_desc, mkldnn_engine, *conv_pd);
+
+      // create reorder primitive if the input format is not the preferred one
+      auto src_memory = user_src_memory;
+      primitive reorder_src;
+      bool is_src_reordered = false;
+      if (memory::primitive_desc(conv_bwd_weights_pd.src_primitive_desc()) !=
+          user_src_memory.get_primitive_desc()) {
+        src_memory = memory(conv_bwd_weights_pd.src_primitive_desc());
+        reorder_src = reorder(user_src_memory, src_memory);
+        is_src_reordered = true;
+      }
+
+      auto diff_dst_memory_4filter = user_diff_dst_memory;
+      primitive reorder_diff_dst_4filter;
+      bool is_diff_dst_reordered_4filter = false;
+      if (memory::primitive_desc(
+              conv_bwd_weights_pd.diff_dst_primitive_desc()) !=
+          user_diff_dst_memory.get_primitive_desc()) {
+        diff_dst_memory_4filter =
+            memory(conv_bwd_weights_pd.diff_dst_primitive_desc());
+        reorder_diff_dst_4filter =
+            reorder(user_diff_dst_memory, diff_dst_memory_4filter);
+        is_diff_dst_reordered_4filter = true;
+      }
+
+      // create mkldnn memory for output (i.e. diff weights)
+      auto diff_weights_memory =
+          memory(conv_bwd_weights_pd.diff_weights_primitive_desc(),
+                 reinterpret_cast<void*>(filter_grad_data));
+
+      // create backward conv primitive for weights
+      auto conv_bwd_weights_prim =
+          conv_bwd_weights(conv_bwd_weights_pd, src_memory,
+                           diff_dst_memory_4filter, diff_weights_memory);
+
+      // push primitive and execute it
+      std::vector<primitive> pipeline;
+      if (is_src_reordered) pipeline.push_back(reorder_src);
+      if (is_diff_dst_reordered_4filter)
+        pipeline.push_back(reorder_diff_dst_4filter);
+      pipeline.push_back(conv_bwd_weights_prim);
+      stream(stream::kind::eager).submit(pipeline).wait();
 
       filter_grad->set_layout(DataLayout::kMKLDNN);
       filter_grad->set_format(GetMKLDNNFormat(diff_weights_memory));
     }
 
     if (input_grad) {
-      // create primitive descriptor
-      mkldnn::convolution_backward_data::primitive_desc conv_bwd_data_pd =
-          ConvBwdDataPrimitiveDesc(diff_src_md, weights_md, diff_dst_md,
-                                   strides, paddings, *conv_pd, mkldnn_engine);
-
-      // create memory
-      auto diff_src_memory = mkldnn::memory(
-          {diff_src_md, mkldnn_engine},
-          reinterpret_cast<void*>(const_cast<T*>(input_grad_data)));
-      auto weights_memory = mkldnn::memory(
-          {weights_md, mkldnn_engine},
-          reinterpret_cast<void*>(const_cast<T*>(filter_data)));
-
-      // create backward conv primitive for data
-      auto conv_bwd_data_prim = mkldnn::convolution_backward_data(
-          conv_bwd_data_pd, diff_dst_memory, weights_memory, diff_src_memory);
-
-      // push primitive to stream and wait until it's executed
-      std::vector<mkldnn::primitive> pipeline{conv_bwd_data_prim};
-      mkldnn::stream(mkldnn::stream::kind::eager).submit(pipeline).wait();
+      // create backward convolution primitive descriptor
+      auto conv_bwd_data_desc = conv_bwd_data::desc(
+          mkldnn::convolution_direct, diff_src_md, weights_md, diff_dst_md,
+          strides, paddings, paddings, mkldnn::padding_kind::zero);
+      auto conv_bwd_data_pd = conv_bwd_data::primitive_desc(
+          conv_bwd_data_desc, mkldnn_engine, *conv_pd);
+
+      // create reorder primitive if the input format is not the preferred one
+      auto weights_memory = user_weights_memory;
+      primitive reorder_weights;
+      bool is_weights_reordered = false;
+      if (memory::primitive_desc(conv_bwd_data_pd.weights_primitive_desc()) !=
+          user_weights_memory.get_primitive_desc()) {
+        weights_memory = memory(conv_bwd_data_pd.weights_primitive_desc());
+        reorder_weights = reorder(user_weights_memory, weights_memory);
+        is_weights_reordered = true;
+      }
+
+      auto diff_dst_memory_4data = user_diff_dst_memory;
+      primitive reorder_diff_dst_4data;
+      bool is_diff_dst_reordered_4data = false;
+      if (memory::primitive_desc(conv_bwd_data_pd.diff_dst_primitive_desc()) !=
+          user_diff_dst_memory.get_primitive_desc()) {
+        diff_dst_memory_4data =
+            memory(conv_bwd_data_pd.diff_dst_primitive_desc());
+        reorder_diff_dst_4data =
+            reorder(user_diff_dst_memory, diff_dst_memory_4data);
+        is_diff_dst_reordered_4data = true;
+      }
+
+      // create mkldnn memory for output (i.e. diff src)
+      auto diff_src_memory = memory(conv_bwd_data_pd.diff_src_primitive_desc(),
+                                    reinterpret_cast<void*>(input_grad_data));
+
+      // create backward conv primitive for data
+      auto conv_bwd_data_prim =
+          conv_bwd_data(conv_bwd_data_pd, diff_dst_memory_4data,
+                        weights_memory, diff_src_memory);
+
+      // push primitive and execute it
+      std::vector<primitive> pipeline;
+      if (is_weights_reordered) pipeline.push_back(reorder_weights);
+      if (is_diff_dst_reordered_4data)
+        pipeline.push_back(reorder_diff_dst_4data);
+      pipeline.push_back(conv_bwd_data_prim);
+      stream(stream::kind::eager).submit(pipeline).wait();
 
       input_grad->set_layout(DataLayout::kMKLDNN);
       input_grad->set_format(GetMKLDNNFormat(diff_src_memory));
     }
   }  // Compute()
-
- private:
-  mkldnn::convolution_backward_weights::primitive_desc
-  ConvBwdWeightsPrimitiveDesc(
-      const mkldnn::memory::desc& src,
-      const mkldnn::memory::desc& diff_weights,
-      const mkldnn::memory::desc& diff_dst, const std::vector<int>& strides,
-      const std::vector<int>& paddings,
-      const mkldnn::convolution_forward::primitive_desc& conv_pd,
-      const mkldnn::engine& engine) const {
-    auto conv_bwd_weights_desc = mkldnn::convolution_backward_weights::desc(
-        mkldnn::convolution_direct, src, diff_weights, diff_dst, strides,
-        paddings, paddings, mkldnn::padding_kind::zero);
-    return mkldnn::convolution_backward_weights::primitive_desc(
-        conv_bwd_weights_desc, engine, conv_pd);
-  }
-
-  mkldnn::convolution_backward_data::primitive_desc ConvBwdDataPrimitiveDesc(
-      const mkldnn::memory::desc& diff_src,
-      const mkldnn::memory::desc& weights,
-      const mkldnn::memory::desc& diff_dst, const std::vector<int>& strides,
-      const std::vector<int>& paddings,
-      const mkldnn::convolution_forward::primitive_desc& conv_pd,
-      const mkldnn::engine& engine) const {
-    auto conv_bwd_data_desc = mkldnn::convolution_backward_data::desc(
-        mkldnn::convolution_direct, diff_src, weights, diff_dst, strides,
-        paddings, paddings, mkldnn::padding_kind::zero);
-    return mkldnn::convolution_backward_data::primitive_desc(
-        conv_bwd_data_desc, engine, conv_pd);
-  }
 };

 }  // namespace operators
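The core of the rewritten kernels is one idea: declare memory descriptors with format 'any', let MKL-DNN resolve the format it prefers, and insert a reorder into the pipeline only when the user tensor's actual format disagrees. A language-agnostic sketch of that control flow (written in Python for brevity; the names are illustrative, not part of the Paddle or MKL-DNN APIs):

    from dataclasses import dataclass

    @dataclass
    class Memory:
        fmt: str          # e.g. 'nchw', 'oihw', or a blocked layout like 'OIhw16i16o'
        data: object = None

    def reorder(src, dst_fmt):
        # stand-in for an mkldnn::reorder primitive
        return Memory(dst_fmt, src.data)

    def build_pipeline(user_src, user_weights, preferred_src_fmt, preferred_weights_fmt):
        """Mirror the forward kernel: reorder an input only when the format
        chosen by the primitive differs from the user tensor's format."""
        pipeline = []
        src, weights = user_src, user_weights
        if user_src.fmt != preferred_src_fmt:
            src = reorder(user_src, preferred_src_fmt)
            pipeline.append('reorder_src')
        if user_weights.fmt != preferred_weights_fmt:
            weights = reorder(user_weights, preferred_weights_fmt)
            pipeline.append('reorder_weights')
        pipeline.append('conv_fwd')
        return pipeline

    # weights arrive as oihw but the primitive prefers a blocked layout,
    # so exactly one reorder is scheduled ahead of the convolution:
    print(build_pipeline(Memory('nchw'), Memory('oihw'), 'nchw', 'OIhw16i16o'))
    # ['reorder_weights', 'conv_fwd']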
paddle/fluid/operators/conv_op.cc (view file @ 29bf727e)

@@ -75,9 +75,8 @@ void ConvOp::InferShape(framework::InferShapeContext* ctx) const {
 framework::OpKernelType ConvOp::GetExpectedKernelType(
     const framework::ExecutionContext& ctx) const {
   framework::LibraryType library{framework::LibraryType::kPlain};
-  std::string data_format = ctx.Attr<std::string>("data_format");
-
   // TODO(pzelazko-intel): enable MKLDNN layout when it's ready
+  std::string data_format = ctx.Attr<std::string>("data_format");
   framework::DataLayout layout = framework::StringToDataLayout(data_format);
 
 #ifdef PADDLE_WITH_CUDA
paddle/fluid/operators/gen_nccl_id_op.cc (view file @ 29bf727e)

@@ -67,6 +67,10 @@ class GenNCCLIdOp : public framework::OperatorBase {
       client->AsyncSendVar(ep, dev_ctx, *scope, NCCL_ID_VARNAME);
     }
     client->Wait();
+    for (auto& ep : endpoint_list) {
+      client->AsyncSendBatchBarrier(ep);
+    }
+    client->Wait();
     VLOG(3) << "sending completed...";
   }
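The added loop makes the sender broadcast a batch barrier to every endpoint after the NCCL ID itself, so receivers can tell that the send phase is finished before they proceed. A toy sketch of that send/wait/barrier/wait sequencing (ToyClient is hypothetical; the real AsyncSendVar/AsyncSendBatchBarrier live in Paddle's distributed RPC client):

    class ToyClient:
        """Stand-in for the async RPC client used by gen_nccl_id_op."""
        def __init__(self):
            self.log = []

        def async_send_var(self, ep, name):
            self.log.append((ep, name))

        def async_send_batch_barrier(self, ep):
            self.log.append((ep, 'BATCH_BARRIER'))

        def wait(self):
            pass  # block until all in-flight sends complete

    endpoints = ['127.0.0.1:6170', '127.0.0.1:6171']
    client = ToyClient()
    for ep in endpoints:                   # phase 1: ship the NCCL ID
        client.async_send_var(ep, 'NCCLID')
    client.wait()
    for ep in endpoints:                   # phase 2 (the new lines): barrier
        client.async_send_batch_barrier(ep)
    client.wait()                          # now "sending completed..."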
python/paddle/batch.py (view file @ 29bf727e)

@@ -15,7 +15,7 @@
 __all__ = ['batch']
 
 
-def batch(reader, batch_size, drop_last=False):
+def batch(reader, batch_size, drop_last=True):
     """
     Create a batched reader.
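With the new default, a trailing batch smaller than batch_size is silently discarded; callers that need every sample (see the test-suite changes below) must now pass drop_last=False explicitly. A self-contained sketch of the semantics (mirroring, not importing, paddle.batch):

    def batch(reader, batch_size, drop_last=True):
        """Group items from `reader` into lists of `batch_size`;
        with drop_last=True an undersized final batch is dropped."""
        def batch_reader():
            buf = []
            for item in reader():
                buf.append(item)
                if len(buf) == batch_size:
                    yield buf
                    buf = []
            if buf and not drop_last:
                yield buf
        return batch_reader

    def nums():
        return iter(range(10))

    print(list(batch(nums, 4)()))                   # [[0..3], [4..7]]: tail dropped
    print(list(batch(nums, 4, drop_last=False)()))  # ... plus [8, 9]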
python/paddle/fluid/layers/nn.py (view file @ 29bf727e)

@@ -262,9 +262,10 @@ def embedding(input,
     return tmp
 
 
-# TODO(qijun): expose H0 and C0
 def dynamic_lstm(input,
                  size,
+                 h_0=None,
+                 c_0=None,
                  param_attr=None,
                  bias_attr=None,
                  use_peepholes=True,

@@ -325,6 +326,13 @@ def dynamic_lstm(input,
                          (T X 4D), where T is the total time steps in this
                          mini-batch, D is the hidden size.
         size(int): 4 * hidden size.
+        h_0(Variable): The initial hidden state is an optional input, default is zero.
+                       This is a tensor with shape (N x D), where N is the
+                       batch size and D is the hidden size.
+        c_0(Variable): The initial cell state is an optional input, default is zero.
+                       This is a tensor with shape (N x D), where N is the
+                       batch size. `h_0` and `c_0` can be NULL but only at the same time.
+
         param_attr(ParamAttr|None): The parameter attribute for the learnable
                                hidden-hidden weights.

@@ -388,12 +396,20 @@ def dynamic_lstm(input,
     cell = helper.create_tmp_variable(dtype)
     batch_gate = helper.create_tmp_variable(dtype)
     batch_cell_pre_act = helper.create_tmp_variable(dtype)
+    inputs = {'Input': input, 'Weight': weight, 'Bias': bias}
+    batch_size = input.shape[0]
+    if h_0:
+        assert h_0.shape == (batch_size, size), \
+            'The shape of h0 should be (batch_size, %d)' % size
+        inputs['H0'] = h_0
+    if c_0:
+        assert c_0.shape == (batch_size, size), \
+            'The shape of c0 should be (batch_size, %d)' % size
+        inputs['C0'] = c_0
 
     helper.append_op(
         type='lstm',
-        inputs={'Input': input,
-                'Weight': weight,
-                'Bias': bias},
+        inputs=inputs,
         outputs={
             'Hidden': hidden,
             'Cell': cell,

@@ -678,11 +694,13 @@ def dynamic_gru(input,
         attr=helper.param_attr, shape=[size, 3 * size], dtype=dtype)
     bias = helper.create_parameter(
         attr=helper.bias_attr, shape=[1, 3 * size], dtype=dtype, is_bias=True)
+    batch_size = input.shape[0]
     inputs = {'Input': input, 'Weight': weight, 'Bias': bias}
     if h_0 != None:
         assert h_0.shape == (
-            size, size), 'The shape of h0 should be(%d, %d)' % (size, size)
-        inputs['h0'] = h_0
+            batch_size, size), 'The shape of h0 should be(batch_size, %d)' % size
+        inputs['H0'] = h_0
 
     hidden = helper.create_tmp_variable(dtype)
     batch_gate = helper.create_tmp_variable(dtype)
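With h_0 and c_0 exposed, a caller can now seed the LSTM with a non-zero initial state; per the new docstring both are (batch_size, hidden) tensors and must be given together. A hedged usage sketch against the fluid API of this era (note that the assert added in this commit compares h_0.shape against (batch_size, size), i.e. four times the hidden width, which does not match the documented (N x D) shape; the sketch follows the docstring):

    import paddle.fluid as fluid

    hidden_dim = 512

    words = fluid.layers.data(
        name='seq', shape=[128], dtype='float32', lod_level=1)
    # dynamic_lstm expects its input (and `size`) to be 4 * hidden_dim wide
    x = fluid.layers.fc(input=words, size=hidden_dim * 4, bias_attr=False)

    # optional initial hidden/cell state: pass both or neither
    h0 = fluid.layers.data(name='h0', shape=[hidden_dim], dtype='float32')
    c0 = fluid.layers.data(name='c0', shape=[hidden_dim], dtype='float32')

    hidden, cell = fluid.layers.dynamic_lstm(
        input=x, size=hidden_dim * 4, h_0=h0, c_0=c0, use_peepholes=False)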
python/paddle/fluid/tests/book/high-level-api/image_classification/test_image_classification_resnet.py (view file @ 29bf727e)

@@ -96,10 +96,11 @@ def train(use_cuda, train_program, params_dirname):
     train_reader = paddle.batch(
         paddle.reader.shuffle(
             cifar10_small_test_set.train10(batch_size=10), buf_size=128 * 10),
-        batch_size=BATCH_SIZE)
+        batch_size=BATCH_SIZE,
+        drop_last=False)
 
     test_reader = paddle.batch(
-        paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE)
+        paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE, drop_last=False)
 
     def event_handler(event):
         if isinstance(event, fluid.EndStepEvent):
python/paddle/fluid/tests/book/high-level-api/image_classification/test_image_classification_vgg.py (view file @ 29bf727e)

@@ -73,10 +73,11 @@ def train(use_cuda, train_program, params_dirname):
     train_reader = paddle.batch(
         paddle.reader.shuffle(
             cifar10_small_test_set.train10(batch_size=10), buf_size=128 * 10),
-        batch_size=BATCH_SIZE)
+        batch_size=BATCH_SIZE,
+        drop_last=False)
 
     test_reader = paddle.batch(
-        paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE)
+        paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE, drop_last=False)
 
     def event_handler(event):
         if isinstance(event, fluid.EndStepEvent):
python/paddle/fluid/tests/book/high-level-api/understand_sentiment/test_understand_sentiment_stacked_lstm.py (view file @ 29bf727e)

@@ -87,7 +87,9 @@ def train(use_cuda, train_program, params_dirname):
     def event_handler(event):
         if isinstance(event, fluid.EndEpochEvent):
             test_reader = paddle.batch(
-                paddle.dataset.imdb.test(word_dict), batch_size=BATCH_SIZE)
+                paddle.dataset.imdb.test(word_dict),
+                batch_size=BATCH_SIZE,
+                drop_last=False)
             avg_cost, acc = trainer.test(
                 reader=test_reader, feed_order=['words', 'label'])

@@ -113,7 +115,8 @@ def train(use_cuda, train_program, params_dirname):
     train_reader = paddle.batch(
         paddle.reader.shuffle(
             paddle.dataset.imdb.train(word_dict), buf_size=25000),
-        batch_size=BATCH_SIZE)
+        batch_size=BATCH_SIZE,
+        drop_last=False)
 
     trainer.train(
         num_epochs=1,
python/paddle/fluid/tests/book_memory_optimization/test_memopt_fit_a_line.py (view file @ 29bf727e)

@@ -56,7 +56,7 @@ BATCH_SIZE = 200
 
 # fix the order of training data
 train_reader = paddle.batch(
-    paddle.dataset.uci_housing.train(), batch_size=BATCH_SIZE)
+    paddle.dataset.uci_housing.train(), batch_size=BATCH_SIZE, drop_last=False)
 
 # train_reader = paddle.batch(
 #     paddle.reader.shuffle(
python/paddle/trainer_config_helpers/attrs.py (view file @ 29bf727e)

@@ -240,14 +240,15 @@ class ExtraLayerAttribute(object):
     :type error_clipping_threshold: float
     :param drop_rate: Dropout rate. Dropout will create a mask on layer output.
                       The dropout rate is the zero rate of this mask. The
-                      details of what dropout is please refer to `here
-                      <https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf>`_.
+                      details of what dropout is please refer to `JMLRdropout
+                      <https://www.cs.toronto.edu/~hinton/absps/
+                      JMLRdropout.pdf>`_.
     :type drop_rate: float
     :param device: device ID of layer. device=-1, use CPU. device>=0, use GPU.
-                   The details allocation in parallel_nn please refer to `here
-                   <http://www.paddlepaddle.org/doc/ui/cmd_argument/
-                   use_case.html#case-2-specify-layers-in-different-devices>`_.
+                   The details allocation in parallel_nn please refer to `use_case
+                   <https://github.com/PaddlePaddle/Paddle/blob/develop/doc/v2
+                   /howto/cmd_parameter/use_case_en.md#case-2-specify-layers-in
+                   -different-devices>`_.
     :type device: int
     """
python/paddle/trainer_config_helpers/layers.py (view file @ 29bf727e)

@@ -2556,7 +2556,7 @@ def img_conv_layer(input,
     the output will be obtained by concatenating the two results.
 
     The details of grouped convolution, please refer to:
-    `ImageNet Classification with Deep Convolutional Neural Networks
+    `ImageNet Classification With Deep Convolutional Neural Networks
     <http://www.cs.toronto.edu/~kriz/imagenet_classification_with_deep_convolutional.pdf>`_
 
     The example usage is:

@@ -5678,8 +5678,8 @@ def warp_ctc_layer(input,
     <https://github.com/baidu-research/warp-ctc>`_ library, which is used in
     `Deep Speech 2: End-toEnd Speech Recognition in English and Mandarin
     <https://arxiv.org/pdf/1512.02595v1.pdf>`_, to compute Connectionist Temporal
-    Classification (CTC) loss. Besides, another `warp-ctc
-    <https://github.com/gangliao/warp-ctc>`_ repository, which is forked from
+    Classification (CTC) loss. Besides, another `warp-ctc repository
+    <https://github.com/gangliao/warp-ctc>`_ , which is forked from
     the official one, is maintained to enable more compiling options. During the
     building process, PaddlePaddle will clone the source codes, build and
     install it to :code:`third_party/install/warpctc` directory.
python/paddle/v2/minibatch.py (view file @ 29bf727e)

@@ -15,7 +15,7 @@
 __all__ = ['batch']
 
 
-def batch(reader, batch_size, drop_last=False):
+def batch(reader, batch_size, drop_last=True):
     """
     Create a batched reader.