提交 29bf727e 编写于 作者: F fengjiayi

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_add_doc

...@@ -101,7 +101,7 @@ value_printer ...@@ -101,7 +101,7 @@ value_printer
:noindex: :noindex:
Detection Detection
===== ==========
detection_map detection_map
------------- -------------
......
...@@ -11,7 +11,7 @@ Data layer ...@@ -11,7 +11,7 @@ Data layer
data data
---- ----
.. autoclass:: paddle.v2.layer.data .. autofunction:: paddle.v2.layer.data
:noindex: :noindex:
Fully Connected Layers Fully Connected Layers
...@@ -21,12 +21,12 @@ Fully Connected Layers ...@@ -21,12 +21,12 @@ Fully Connected Layers
fc fc
-- --
.. autoclass:: paddle.v2.layer.fc .. autofunction:: paddle.v2.layer.fc
:noindex: :noindex:
selective_fc selective_fc
------------ ------------
.. autoclass:: paddle.v2.layer.selective_fc .. autofunction:: paddle.v2.layer.selective_fc
:noindex: :noindex:
Conv Layers Conv Layers
...@@ -34,34 +34,34 @@ Conv Layers ...@@ -34,34 +34,34 @@ Conv Layers
conv_operator conv_operator
------------- -------------
.. autoclass:: paddle.v2.layer.conv_operator .. autofunction:: paddle.v2.layer.conv_operator
:noindex: :noindex:
conv_projection conv_projection
--------------- ---------------
.. autoclass:: paddle.v2.layer.conv_projection .. autofunction:: paddle.v2.layer.conv_projection
:noindex: :noindex:
conv_shift conv_shift
---------- ----------
.. autoclass:: paddle.v2.layer.conv_shift .. autofunction:: paddle.v2.layer.conv_shift
:noindex: :noindex:
img_conv img_conv
-------- --------
.. autoclass:: paddle.v2.layer.img_conv .. autofunction:: paddle.v2.layer.img_conv
:noindex: :noindex:
.. _api_v2.layer_context_projection: .. _api_v2.layer_context_projection:
context_projection context_projection
------------------ ------------------
.. autoclass:: paddle.v2.layer.context_projection .. autofunction:: paddle.v2.layer.context_projection
:noindex: :noindex:
row_conv row_conv
-------- --------
.. autoclass:: paddle.v2.layer.row_conv .. autofunction:: paddle.v2.layer.row_conv
:noindex: :noindex:
Image Pooling Layer Image Pooling Layer
...@@ -69,27 +69,27 @@ Image Pooling Layer ...@@ -69,27 +69,27 @@ Image Pooling Layer
img_pool img_pool
-------- --------
.. autoclass:: paddle.v2.layer.img_pool .. autofunction:: paddle.v2.layer.img_pool
:noindex: :noindex:
spp spp
--- ---
.. autoclass:: paddle.v2.layer.spp .. autofunction:: paddle.v2.layer.spp
:noindex: :noindex:
maxout maxout
------ ------
.. autoclass:: paddle.v2.layer.maxout .. autofunction:: paddle.v2.layer.maxout
:noindex: :noindex:
roi_pool roi_pool
-------- --------
.. autoclass:: paddle.v2.layer.roi_pool .. autofunction:: paddle.v2.layer.roi_pool
:noindex: :noindex:
pad pad
---- ----
.. autoclass:: paddle.v2.layer.pad .. autofunction:: paddle.v2.layer.pad
:noindex: :noindex:
Norm Layer Norm Layer
...@@ -97,27 +97,27 @@ Norm Layer ...@@ -97,27 +97,27 @@ Norm Layer
img_cmrnorm img_cmrnorm
----------- -----------
.. autoclass:: paddle.v2.layer.img_cmrnorm .. autofunction:: paddle.v2.layer.img_cmrnorm
:noindex: :noindex:
batch_norm batch_norm
---------- ----------
.. autoclass:: paddle.v2.layer.batch_norm .. autofunction:: paddle.v2.layer.batch_norm
:noindex: :noindex:
sum_to_one_norm sum_to_one_norm
--------------- ---------------
.. autoclass:: paddle.v2.layer.sum_to_one_norm .. autofunction:: paddle.v2.layer.sum_to_one_norm
:noindex: :noindex:
cross_channel_norm cross_channel_norm
------------------ ------------------
.. autoclass:: paddle.v2.layer.cross_channel_norm .. autofunction:: paddle.v2.layer.cross_channel_norm
:noindex: :noindex:
row_l2_norm row_l2_norm
----------- -----------
.. autoclass:: paddle.v2.layer.row_l2_norm .. autofunction:: paddle.v2.layer.row_l2_norm
:noindex: :noindex:
Recurrent Layers Recurrent Layers
...@@ -125,22 +125,22 @@ Recurrent Layers ...@@ -125,22 +125,22 @@ Recurrent Layers
recurrent recurrent
--------- ---------
.. autoclass:: paddle.v2.layer.recurrent .. autofunction:: paddle.v2.layer.recurrent
:noindex: :noindex:
lstmemory lstmemory
--------- ---------
.. autoclass:: paddle.v2.layer.lstmemory .. autofunction:: paddle.v2.layer.lstmemory
:noindex: :noindex:
grumemory grumemory
--------- ---------
.. autoclass:: paddle.v2.layer.grumemory .. autofunction:: paddle.v2.layer.grumemory
:noindex: :noindex:
gated_unit gated_unit
----------- -----------
.. autoclass:: paddle.v2.layer.gated_unit .. autofunction:: paddle.v2.layer.gated_unit
:noindex: :noindex:
Recurrent Layer Group Recurrent Layer Group
...@@ -148,32 +148,32 @@ Recurrent Layer Group ...@@ -148,32 +148,32 @@ Recurrent Layer Group
memory memory
------ ------
.. autoclass:: paddle.v2.layer.memory .. autofunction:: paddle.v2.layer.memory
:noindex: :noindex:
recurrent_group recurrent_group
--------------- ---------------
.. autoclass:: paddle.v2.layer.recurrent_group .. autofunction:: paddle.v2.layer.recurrent_group
:noindex: :noindex:
lstm_step lstm_step
--------- ---------
.. autoclass:: paddle.v2.layer.lstm_step .. autofunction:: paddle.v2.layer.lstm_step
:noindex: :noindex:
gru_step gru_step
-------- --------
.. autoclass:: paddle.v2.layer.gru_step .. autofunction:: paddle.v2.layer.gru_step
:noindex: :noindex:
beam_search beam_search
------------ ------------
.. autoclass:: paddle.v2.layer.beam_search .. autofunction:: paddle.v2.layer.beam_search
:noindex: :noindex:
get_output get_output
---------- ----------
.. autoclass:: paddle.v2.layer.get_output .. autofunction:: paddle.v2.layer.get_output
:noindex: :noindex:
Mixed Layer Mixed Layer
...@@ -183,54 +183,54 @@ Mixed Layer ...@@ -183,54 +183,54 @@ Mixed Layer
mixed mixed
----- -----
.. autoclass:: paddle.v2.layer.mixed .. autofunction:: paddle.v2.layer.mixed
:noindex: :noindex:
.. _api_v2.layer_embedding: .. _api_v2.layer_embedding:
embedding embedding
--------- ---------
.. autoclass:: paddle.v2.layer.embedding .. autofunction:: paddle.v2.layer.embedding
:noindex: :noindex:
scaling_projection scaling_projection
------------------ ------------------
.. autoclass:: paddle.v2.layer.scaling_projection .. autofunction:: paddle.v2.layer.scaling_projection
:noindex: :noindex:
dotmul_projection dotmul_projection
----------------- -----------------
.. autoclass:: paddle.v2.layer.dotmul_projection .. autofunction:: paddle.v2.layer.dotmul_projection
:noindex: :noindex:
dotmul_operator dotmul_operator
--------------- ---------------
.. autoclass:: paddle.v2.layer.dotmul_operator .. autofunction:: paddle.v2.layer.dotmul_operator
:noindex: :noindex:
full_matrix_projection full_matrix_projection
---------------------- ----------------------
.. autoclass:: paddle.v2.layer.full_matrix_projection .. autofunction:: paddle.v2.layer.full_matrix_projection
:noindex: :noindex:
identity_projection identity_projection
------------------- -------------------
.. autoclass:: paddle.v2.layer.identity_projection .. autofunction:: paddle.v2.layer.identity_projection
:noindex: :noindex:
slice_projection slice_projection
------------------- -------------------
.. autoclass:: paddle.v2.layer.slice_projection .. autofunction:: paddle.v2.layer.slice_projection
:noindex: :noindex:
table_projection table_projection
---------------- ----------------
.. autoclass:: paddle.v2.layer.table_projection .. autofunction:: paddle.v2.layer.table_projection
:noindex: :noindex:
trans_full_matrix_projection trans_full_matrix_projection
---------------------------- ----------------------------
.. autoclass:: paddle.v2.layer.trans_full_matrix_projection .. autofunction:: paddle.v2.layer.trans_full_matrix_projection
:noindex: :noindex:
Aggregate Layers Aggregate Layers
...@@ -245,51 +245,46 @@ AggregateLevel ...@@ -245,51 +245,46 @@ AggregateLevel
pooling pooling
------- -------
.. autoclass:: paddle.v2.layer.pooling .. autofunction:: paddle.v2.layer.pooling
:noindex: :noindex:
.. _api_v2.layer_last_seq: .. _api_v2.layer_last_seq:
last_seq last_seq
-------- --------
.. autoclass:: paddle.v2.layer.last_seq .. autofunction:: paddle.v2.layer.last_seq
:noindex: :noindex:
.. _api_v2.layer_first_seq: .. _api_v2.layer_first_seq:
first_seq first_seq
--------- ---------
.. autoclass:: paddle.v2.layer.first_seq .. autofunction:: paddle.v2.layer.first_seq
:noindex: :noindex:
sub_seq sub_seq
--------- ---------
.. autoclass:: paddle.v2.layer.sub_seq .. autofunction:: paddle.v2.layer.sub_seq
:noindex: :noindex:
concat concat
------ ------
.. autoclass:: paddle.v2.layer.concat .. autofunction:: paddle.v2.layer.concat
:noindex: :noindex:
seq_concat seq_concat
---------- ----------
.. autoclass:: paddle.v2.layer.seq_concat .. autofunction:: paddle.v2.layer.seq_concat
:noindex: :noindex:
seq_slice seq_slice
--------- ---------
.. autoclass:: paddle.v2.layer.seq_slice .. autofunction:: paddle.v2.layer.seq_slice
:noindex:
kmax_sequence_score
-------------------
.. autoclass:: paddle.v2.layer.kmax_sequence_score
:noindex: :noindex:
sub_nested_seq sub_nested_seq
-------------- --------------
.. autoclass:: paddle.v2.layer.sub_nested_seq .. autofunction:: paddle.v2.layer.sub_nested_seq
:noindex: :noindex:
Reshaping Layers Reshaping Layers
...@@ -297,7 +292,7 @@ Reshaping Layers ...@@ -297,7 +292,7 @@ Reshaping Layers
block_expand block_expand
------------ ------------
.. autoclass:: paddle.v2.layer.block_expand .. autofunction:: paddle.v2.layer.block_expand
:noindex: :noindex:
.. _api_v2.layer_expand: .. _api_v2.layer_expand:
...@@ -309,22 +304,22 @@ ExpandLevel ...@@ -309,22 +304,22 @@ ExpandLevel
expand expand
------ ------
.. autoclass:: paddle.v2.layer.expand .. autofunction:: paddle.v2.layer.expand
:noindex: :noindex:
repeat repeat
------ ------
.. autoclass:: paddle.v2.layer.repeat .. autofunction:: paddle.v2.layer.repeat
:noindex: :noindex:
rotate rotate
------ ------
.. autoclass:: paddle.v2.layer.rotate .. autofunction:: paddle.v2.layer.rotate
:noindex: :noindex:
seq_reshape seq_reshape
----------- -----------
.. autoclass:: paddle.v2.layer.seq_reshape .. autofunction:: paddle.v2.layer.seq_reshape
:noindex: :noindex:
Math Layers Math Layers
...@@ -332,94 +327,94 @@ Math Layers ...@@ -332,94 +327,94 @@ Math Layers
addto addto
----- -----
.. autoclass:: paddle.v2.layer.addto .. autofunction:: paddle.v2.layer.addto
:noindex: :noindex:
linear_comb linear_comb
----------- -----------
.. autoclass:: paddle.v2.layer.linear_comb .. autofunction:: paddle.v2.layer.linear_comb
:noindex: :noindex:
interpolation interpolation
------------- -------------
.. autoclass:: paddle.v2.layer.interpolation .. autofunction:: paddle.v2.layer.interpolation
:noindex: :noindex:
bilinear_interp bilinear_interp
--------------- ---------------
.. autoclass:: paddle.v2.layer.bilinear_interp .. autofunction:: paddle.v2.layer.bilinear_interp
:noindex: :noindex:
dropout dropout
-------- --------
.. autoclass:: paddle.v2.layer.dropout .. autofunction:: paddle.v2.layer.dropout
:noindex: :noindex:
dot_prod dot_prod
--------- ---------
.. autoclass:: paddle.v2.layer.dot_prod .. autofunction:: paddle.v2.layer.dot_prod
:noindex: :noindex:
out_prod out_prod
-------- --------
.. autoclass:: paddle.v2.layer.out_prod .. autofunction:: paddle.v2.layer.out_prod
:noindex: :noindex:
power power
----- -----
.. autoclass:: paddle.v2.layer.power .. autofunction:: paddle.v2.layer.power
:noindex: :noindex:
scaling scaling
------- -------
.. autoclass:: paddle.v2.layer.scaling .. autofunction:: paddle.v2.layer.scaling
:noindex: :noindex:
clip clip
---- ----
.. autoclass:: paddle.v2.layer.clip .. autofunction:: paddle.v2.layer.clip
:noindex: :noindex:
resize resize
------ ------
.. autoclass:: paddle.v2.layer.resize .. autofunction:: paddle.v2.layer.resize
:noindex: :noindex:
slope_intercept slope_intercept
--------------- ---------------
.. autoclass:: paddle.v2.layer.slope_intercept .. autofunction:: paddle.v2.layer.slope_intercept
:noindex: :noindex:
tensor tensor
------ ------
.. autoclass:: paddle.v2.layer.tensor .. autofunction:: paddle.v2.layer.tensor
:noindex: :noindex:
.. _api_v2.layer_cos_sim: .. _api_v2.layer_cos_sim:
cos_sim cos_sim
------- -------
.. autoclass:: paddle.v2.layer.cos_sim .. autofunction:: paddle.v2.layer.cos_sim
:noindex: :noindex:
l2_distance l2_distance
----------- -----------
.. autoclass:: paddle.v2.layer.l2_distance .. autofunction:: paddle.v2.layer.l2_distance
:noindex: :noindex:
trans trans
----- -----
.. autoclass:: paddle.v2.layer.trans .. autofunction:: paddle.v2.layer.trans
:noindex: :noindex:
scale_shift scale_shift
----------- -----------
.. autoclass:: paddle.v2.layer.scale_shift .. autofunction:: paddle.v2.layer.scale_shift
:noindex: :noindex:
factorization_machine factorization_machine
--------------------- ---------------------
.. autoclass:: paddle.v2.layer.factorization_machine .. autofunction:: paddle.v2.layer.factorization_machine
:noindex: :noindex:
Sampling Layers Sampling Layers
...@@ -427,17 +422,17 @@ Sampling Layers ...@@ -427,17 +422,17 @@ Sampling Layers
maxid maxid
----- -----
.. autoclass:: paddle.v2.layer.max_id .. autofunction:: paddle.v2.layer.max_id
:noindex: :noindex:
sampling_id sampling_id
----------- -----------
.. autoclass:: paddle.v2.layer.sampling_id .. autofunction:: paddle.v2.layer.sampling_id
:noindex: :noindex:
multiplex multiplex
--------- ---------
.. autoclass:: paddle.v2.layer.multiplex .. autofunction:: paddle.v2.layer.multiplex
:noindex: :noindex:
.. _api_v2.layer_costs: .. _api_v2.layer_costs:
...@@ -447,97 +442,97 @@ Cost Layers ...@@ -447,97 +442,97 @@ Cost Layers
cross_entropy_cost cross_entropy_cost
------------------ ------------------
.. autoclass:: paddle.v2.layer.cross_entropy_cost .. autofunction:: paddle.v2.layer.cross_entropy_cost
:noindex: :noindex:
cross_entropy_with_selfnorm_cost cross_entropy_with_selfnorm_cost
-------------------------------- --------------------------------
.. autoclass:: paddle.v2.layer.cross_entropy_with_selfnorm_cost .. autofunction:: paddle.v2.layer.cross_entropy_with_selfnorm_cost
:noindex: :noindex:
multi_binary_label_cross_entropy_cost multi_binary_label_cross_entropy_cost
------------------------------------- -------------------------------------
.. autoclass:: paddle.v2.layer.multi_binary_label_cross_entropy_cost .. autofunction:: paddle.v2.layer.multi_binary_label_cross_entropy_cost
:noindex: :noindex:
classification_cost classification_cost
------------------- -------------------
.. autoclass:: paddle.v2.layer.classification_cost .. autofunction:: paddle.v2.layer.classification_cost
:noindex: :noindex:
huber_regression_cost huber_regression_cost
------------------------- -------------------------
.. autoclass:: paddle.v2.layer.huber_regression_cost .. autofunction:: paddle.v2.layer.huber_regression_cost
:noindex: :noindex:
huber_classification_cost huber_classification_cost
------------------------- -------------------------
.. autoclass:: paddle.v2.layer.huber_classification_cost .. autofunction:: paddle.v2.layer.huber_classification_cost
:noindex: :noindex:
lambda_cost lambda_cost
----------- -----------
.. autoclass:: paddle.v2.layer.lambda_cost .. autofunction:: paddle.v2.layer.lambda_cost
:noindex: :noindex:
square_error_cost square_error_cost
----------------- -----------------
.. autoclass:: paddle.v2.layer.square_error_cost .. autofunction:: paddle.v2.layer.square_error_cost
:noindex: :noindex:
rank_cost rank_cost
--------- ---------
.. autoclass:: paddle.v2.layer.rank_cost .. autofunction:: paddle.v2.layer.rank_cost
:noindex: :noindex:
sum_cost sum_cost
--------- ---------
.. autoclass:: paddle.v2.layer.sum_cost .. autofunction:: paddle.v2.layer.sum_cost
:noindex: :noindex:
crf crf
--- ---
.. autoclass:: paddle.v2.layer.crf .. autofunction:: paddle.v2.layer.crf
:noindex: :noindex:
crf_decoding crf_decoding
------------ ------------
.. autoclass:: paddle.v2.layer.crf_decoding .. autofunction:: paddle.v2.layer.crf_decoding
:noindex: :noindex:
ctc ctc
--- ---
.. autoclass:: paddle.v2.layer.ctc .. autofunction:: paddle.v2.layer.ctc
:noindex: :noindex:
warp_ctc warp_ctc
-------- --------
.. autoclass:: paddle.v2.layer.warp_ctc .. autofunction:: paddle.v2.layer.warp_ctc
:noindex: :noindex:
nce nce
--- ---
.. autoclass:: paddle.v2.layer.nce .. autofunction:: paddle.v2.layer.nce
:noindex: :noindex:
hsigmoid hsigmoid
--------- ---------
.. autoclass:: paddle.v2.layer.hsigmoid .. autofunction:: paddle.v2.layer.hsigmoid
:noindex: :noindex:
smooth_l1_cost smooth_l1_cost
-------------- --------------
.. autoclass:: paddle.v2.layer.smooth_l1_cost .. autofunction:: paddle.v2.layer.smooth_l1_cost
:noindex: :noindex:
multibox_loss multibox_loss
-------------- --------------
.. autoclass:: paddle.v2.layer.multibox_loss .. autofunction:: paddle.v2.layer.multibox_loss
:noindex: :noindex:
detection_output detection_output
---------------- ----------------
.. autoclass:: paddle.v2.layer.detection_output .. autofunction:: paddle.v2.layer.detection_output
:noindex: :noindex:
Check Layer Check Layer
...@@ -545,7 +540,7 @@ Check Layer ...@@ -545,7 +540,7 @@ Check Layer
eos eos
--- ---
.. autoclass:: paddle.v2.layer.eos .. autofunction:: paddle.v2.layer.eos
:noindex: :noindex:
Activation Activation
...@@ -553,5 +548,5 @@ Activation ...@@ -553,5 +548,5 @@ Activation
prelu prelu
-------- --------
.. autoclass:: paddle.v2.layer.prelu .. autofunction:: paddle.v2.layer.prelu
:noindex: :noindex:
...@@ -8,4 +8,3 @@ API ...@@ -8,4 +8,3 @@ API
model_configs.rst model_configs.rst
data.rst data.rst
run_logic.rst run_logic.rst
fluid/index.rst
...@@ -60,6 +60,7 @@ paddlepaddle-gpu==0.11.0 使用CUDA 7.5和cuDNN 5编译的0.11.0版 ...@@ -60,6 +60,7 @@ paddlepaddle-gpu==0.11.0 使用CUDA 7.5和cuDNN 5编译的0.11.0版
"cpu_noavx_openblas", "`paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuNoavxOpenblas/.lastSuccessful/paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuNoavxOpenblas/.lastSuccessful/paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl>`_" "cpu_noavx_openblas", "`paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuNoavxOpenblas/.lastSuccessful/paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuNoavxOpenblas/.lastSuccessful/paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl>`_"
"cuda8.0_cudnn5_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda80cudnn5cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda80cudnn5cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__" "cuda8.0_cudnn5_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda80cudnn5cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda80cudnn5cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__"
"cuda8.0_cudnn7_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__" "cuda8.0_cudnn7_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__"
"cuda9.0_cudnn7_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda90cudnn7avxMkl/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda90cudnn7avxMkl/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__"
.. _pip_dependency: .. _pip_dependency:
......
...@@ -63,6 +63,7 @@ If the links below shows up the login form, just click "Log in as guest" to star ...@@ -63,6 +63,7 @@ If the links below shows up the login form, just click "Log in as guest" to star
"cpu_noavx_openblas", "`paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuNoavxOpenblas/.lastSuccessful/paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuNoavxOpenblas/.lastSuccessful/paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl>`__" "cpu_noavx_openblas", "`paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuNoavxOpenblas/.lastSuccessful/paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuNoavxOpenblas/.lastSuccessful/paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl>`__"
"cuda8.0_cudnn5_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda80cudnn5cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda80cudnn5cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__" "cuda8.0_cudnn5_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda80cudnn5cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda80cudnn5cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__"
"cuda8.0_cudnn7_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__" "cuda8.0_cudnn7_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__"
"cuda9.0_cudnn7_avx_mkl", "`paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda90cudnn7avxMkl/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl>`__", "`paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda90cudnn7avxMkl/.lastSuccessful/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl>`__"
.. _pip_dependency: .. _pip_dependency:
......
...@@ -18,6 +18,17 @@ ...@@ -18,6 +18,17 @@
namespace paddle { namespace paddle {
namespace operators { namespace operators {
using conv_bwd_data = mkldnn::convolution_backward_data;
using conv_bwd_weights = mkldnn::convolution_backward_weights;
using conv_fwd = mkldnn::convolution_forward;
using framework::DataLayout;
using mkldnn::memory;
using mkldnn::primitive;
using mkldnn::reorder;
using mkldnn::stream;
using platform::to_void_cast;
using platform::GetMKLDNNFormat;
template <typename T> template <typename T>
class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> { class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
public: public:
...@@ -25,6 +36,10 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> { ...@@ -25,6 +36,10 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
PADDLE_ENFORCE(paddle::platform::is_cpu_place(ctx.GetPlace()), PADDLE_ENFORCE(paddle::platform::is_cpu_place(ctx.GetPlace()),
"It must use CPUPlace."); "It must use CPUPlace.");
// Get unique name for index
const std::string key = ctx.op().Output("Output");
const std::string key_conv_pd = key + "@conv_pd";
auto& dev_ctx = auto& dev_ctx =
ctx.template device_context<paddle::platform::MKLDNNDeviceContext>(); ctx.template device_context<paddle::platform::MKLDNNDeviceContext>();
const auto& mkldnn_engine = dev_ctx.GetEngine(); const auto& mkldnn_engine = dev_ctx.GetEngine();
...@@ -33,10 +48,12 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> { ...@@ -33,10 +48,12 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
auto* filter = ctx.Input<Tensor>("Filter"); auto* filter = ctx.Input<Tensor>("Filter");
auto* output = ctx.Output<Tensor>("Output"); auto* output = ctx.Output<Tensor>("Output");
// Get an unique name from "argument" name of "Output" variable PADDLE_ENFORCE(input->layout() == DataLayout::kMKLDNN &&
// This name will be used as key when saving info into device context input->format() != memory::format::format_undef,
const std::string key = ctx.op().Output("Output"); "Wrong layout/format set for Input tensor");
const std::string key_conv_pd = key + "@conv_pd"; PADDLE_ENFORCE(filter->layout() == DataLayout::kMKLDNN &&
filter->format() != memory::format::format_undef,
"Wrong layout/format set for Filter tensor");
std::vector<int> strides = ctx.Attr<std::vector<int>>("strides"); std::vector<int> strides = ctx.Attr<std::vector<int>>("strides");
std::vector<int> paddings = ctx.Attr<std::vector<int>>("paddings"); std::vector<int> paddings = ctx.Attr<std::vector<int>>("paddings");
...@@ -63,60 +80,86 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> { ...@@ -63,60 +80,86 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
paddle::framework::vectorize2int(filter->dims()); paddle::framework::vectorize2int(filter->dims());
std::vector<int> dst_tz = paddle::framework::vectorize2int(output->dims()); std::vector<int> dst_tz = paddle::framework::vectorize2int(output->dims());
// TODO(pzelazko-intel): support more formats // create mkldnn memory from input tensors (data/weights)
auto src_md = platform::MKLDNNMemDesc( auto user_src_memory = memory(
src_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::nchw); {{{src_tz}, memory::data_type::f32, input->format()}, mkldnn_engine},
auto weights_md = to_void_cast(input_data));
platform::MKLDNNMemDesc(weights_tz, mkldnn::memory::data_type::f32, auto user_weights_memory =
mkldnn::memory::format::oihw); memory({{{weights_tz}, memory::data_type::f32, filter->format()},
auto dst_md = platform::MKLDNNMemDesc( mkldnn_engine},
dst_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::nchw); to_void_cast(filter_data));
auto src_memory = /* create memory descriptor for convolution without specified format
mkldnn::memory({src_md, mkldnn_engine}, * ('any') which lets a primitive (convolution in this case) choose
reinterpret_cast<void*>(const_cast<T*>(input_data))); * the memory format preferred for best performance
auto weights_memory = */
mkldnn::memory({weights_md, mkldnn_engine}, auto src_md = platform::MKLDNNMemDesc(src_tz, memory::data_type::f32,
reinterpret_cast<void*>(const_cast<T*>(filter_data))); memory::format::any);
auto dst_memory = mkldnn::memory({dst_md, mkldnn_engine}, output_data); auto weights_md = platform::MKLDNNMemDesc(
weights_tz, memory::data_type::f32, memory::format::any);
std::shared_ptr<mkldnn::convolution_forward::primitive_desc> conv_pd = auto dst_md = platform::MKLDNNMemDesc(dst_tz, memory::data_type::f32,
ConvFwdPrimitiveDesc(src_md, weights_md, dst_md, strides, paddings, memory::format::any);
mkldnn_engine);
// create a conv primitive descriptor and save it for usage in backward
// save conv_pd into global device context to be referred in backward path std::shared_ptr<conv_fwd::primitive_desc> conv_pd = ConvFwdPrimitiveDesc(
dev_ctx.SetBlob(key_conv_pd, conv_pd); src_md, weights_md, dst_md, strides, paddings, mkldnn_engine);
// create reorder primitive if the input format is not the preferred one
auto src_memory = user_src_memory;
primitive reorder_src;
bool is_src_reordered = false;
if (memory::primitive_desc(conv_pd->src_primitive_desc()) !=
user_src_memory.get_primitive_desc()) {
src_memory = memory(conv_pd->src_primitive_desc());
reorder_src = reorder(user_src_memory, src_memory);
is_src_reordered = true;
}
auto weights_memory = user_weights_memory;
primitive reorder_weights;
bool is_weights_reordered = false;
if (memory::primitive_desc(conv_pd->weights_primitive_desc()) !=
user_weights_memory.get_primitive_desc()) {
weights_memory = memory(conv_pd->weights_primitive_desc());
reorder_weights = reorder(user_weights_memory, weights_memory);
is_weights_reordered = true;
}
// create memory primitive for conv dst
auto dst_memory = memory(conv_pd->dst_primitive_desc(), output_data);
// create convolution op primitive // create convolution op primitive
auto conv_prim = mkldnn::convolution_forward(*conv_pd, src_memory, auto conv_prim = conv_fwd(*conv_pd, src_memory, weights_memory, dst_memory);
weights_memory, dst_memory);
// push primitive to stream and wait until it's executed // push primitive to stream and wait until it's executed
std::vector<mkldnn::primitive> pipeline{conv_prim}; std::vector<primitive> pipeline;
mkldnn::stream(mkldnn::stream::kind::eager).submit(pipeline).wait(); if (is_src_reordered) pipeline.push_back(reorder_src);
if (is_weights_reordered) pipeline.push_back(reorder_weights);
pipeline.push_back(conv_prim);
stream(stream::kind::eager).submit(pipeline).wait();
// Save conv_pd/src_memory/weights_memory for backward pass
dev_ctx.SetBlob(key_conv_pd, conv_pd);
output->set_layout(DataLayout::kMKLDNN);
output->set_format(GetMKLDNNFormat(dst_memory));
} }
private: private:
std::unique_ptr<mkldnn::convolution_forward::primitive_desc> std::unique_ptr<conv_fwd::primitive_desc> ConvFwdPrimitiveDesc(
ConvFwdPrimitiveDesc(const mkldnn::memory::desc& src, const memory::desc& src, const memory::desc& weights,
const mkldnn::memory::desc& weights, const memory::desc& dst, const std::vector<int>& strides,
const mkldnn::memory::desc& dst, const std::vector<int>& paddings, const mkldnn::engine& engine) const {
const std::vector<int>& strides, memory::dims stride_dims = {strides[0], strides[1]};
const std::vector<int>& paddings, memory::dims padding_dims = {paddings[0], paddings[1]};
const mkldnn::engine& engine) const {
mkldnn::memory::dims stride_dims = {strides[0], strides[1]}; auto conv_desc =
mkldnn::memory::dims padding_dims = {paddings[0], paddings[1]}; conv_fwd::desc(mkldnn::prop_kind::forward, mkldnn::convolution_direct,
src, weights, dst, stride_dims, padding_dims,
auto conv_desc = mkldnn::convolution_forward::desc( padding_dims, mkldnn::padding_kind::zero);
mkldnn::prop_kind::forward, mkldnn::convolution_direct, src, weights,
dst, stride_dims, padding_dims, padding_dims, auto p_conv_pd = new conv_fwd::primitive_desc(conv_desc, engine);
mkldnn::padding_kind::zero);
return std::unique_ptr<conv_fwd::primitive_desc>(p_conv_pd);
auto p_conv_pd =
new mkldnn::convolution_forward::primitive_desc(conv_desc, engine);
return std::unique_ptr<mkldnn::convolution_forward::primitive_desc>(
p_conv_pd);
} }
}; };
...@@ -139,6 +182,19 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> { ...@@ -139,6 +182,19 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
Tensor* input_grad = ctx.Output<Tensor>(framework::GradVarName("Input")); Tensor* input_grad = ctx.Output<Tensor>(framework::GradVarName("Input"));
Tensor* filter_grad = ctx.Output<Tensor>(framework::GradVarName("Filter")); Tensor* filter_grad = ctx.Output<Tensor>(framework::GradVarName("Filter"));
PADDLE_ENFORCE(input->layout() == DataLayout::kMKLDNN &&
input->format() != memory::format::format_undef,
"Wrong layout/format set for Input tensor");
PADDLE_ENFORCE(filter->layout() == DataLayout::kMKLDNN &&
filter->format() != memory::format::format_undef,
"Wrong layout/format set for Filter tensor");
PADDLE_ENFORCE(output->layout() == DataLayout::kMKLDNN &&
output->format() != memory::format::format_undef,
"Wrong layout/format set for Output tensor");
PADDLE_ENFORCE(output_grad->layout() == DataLayout::kMKLDNN &&
output_grad->format() != memory::format::format_undef,
"Wrong layout/format set for output_grad tensor");
if (!input_grad && !filter_grad) return; if (!input_grad && !filter_grad) return;
// Get an unique name from "argument" name of "Output" variable // Get an unique name from "argument" name of "Output" variable
...@@ -167,108 +223,147 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> { ...@@ -167,108 +223,147 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
paddle::framework::vectorize2int(filter->dims()); paddle::framework::vectorize2int(filter->dims());
std::vector<int> dst_tz = paddle::framework::vectorize2int(output->dims()); std::vector<int> dst_tz = paddle::framework::vectorize2int(output->dims());
// TODO(pzelazko-intel): support more formats // create mkldnn memory from input tensors (input/weights/output_grad)
auto src_md = platform::MKLDNNMemDesc( auto user_src_memory = memory(
src_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::nchw); {{{src_tz}, memory::data_type::f32, input->format()}, mkldnn_engine},
auto diff_src_md = platform::MKLDNNMemDesc( to_void_cast(input_data));
src_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::nchw); auto user_weights_memory =
auto weights_md = memory({{{weights_tz}, memory::data_type::f32, filter->format()},
platform::MKLDNNMemDesc(weights_tz, mkldnn::memory::data_type::f32, mkldnn_engine},
mkldnn::memory::format::oihw); to_void_cast(filter_data));
auto diff_weights_md = auto user_diff_dst_memory =
platform::MKLDNNMemDesc(weights_tz, mkldnn::memory::data_type::f32, memory({{{dst_tz}, memory::data_type::f32, output_grad->format()},
mkldnn::memory::format::oihw); mkldnn_engine},
auto diff_dst_md = platform::MKLDNNMemDesc( to_void_cast(output_grad_data));
dst_tz, mkldnn::memory::data_type::f32, mkldnn::memory::format::nchw);
/* create memory descriptor for conv backward without specified format
// create memory * ('any') which lets a primitive (conv backward in this case) choose
auto diff_dst_memory = mkldnn::memory( * the memory format preferred for best performance
{diff_weights_md, mkldnn_engine}, */
reinterpret_cast<void*>(const_cast<T*>(output_grad_data))); auto src_md = platform::MKLDNNMemDesc(src_tz, memory::data_type::f32,
memory::format::any);
auto diff_src_md = platform::MKLDNNMemDesc(src_tz, memory::data_type::f32,
memory::format::any);
auto weights_md = platform::MKLDNNMemDesc(
weights_tz, memory::data_type::f32, memory::format::any);
auto diff_weights_md = platform::MKLDNNMemDesc(
weights_tz, memory::data_type::f32, memory::format::any);
auto diff_dst_md = platform::MKLDNNMemDesc(dst_tz, memory::data_type::f32,
memory::format::any);
// Retrieve conv_pd from device context // Retrieve conv_pd from device context
auto conv_pd = auto conv_pd = std::static_pointer_cast<conv_fwd::primitive_desc>(
std::static_pointer_cast<mkldnn::convolution_forward::primitive_desc>( dev_ctx.GetBlob(key_conv_pd));
dev_ctx.GetBlob(key_conv_pd));
PADDLE_ENFORCE(conv_pd != nullptr, PADDLE_ENFORCE(conv_pd != nullptr,
"Fail to find conv_pd in device context"); "Fail to find conv_pd in device context");
// create backward conv primitive for weights // create backward conv primitive for weights
if (filter_grad) { if (filter_grad) {
// create primitive descriptor // create backward convolution primitive descriptor
mkldnn::convolution_backward_weights::primitive_desc conv_bwd_weights_pd = auto conv_bwd_weights_desc = conv_bwd_weights::desc(
ConvBwdWeightsPrimitiveDesc(src_md, diff_weights_md, diff_dst_md, mkldnn::convolution_direct, src_md, diff_weights_md, diff_dst_md,
strides, paddings, *conv_pd, strides, paddings, paddings, mkldnn::padding_kind::zero);
mkldnn_engine); auto conv_bwd_weights_pd = conv_bwd_weights::primitive_desc(
conv_bwd_weights_desc, mkldnn_engine, *conv_pd);
// create memory
// create reorder primitive if the input format is not the preferred one
auto src_memory = user_src_memory;
primitive reorder_src;
bool is_src_reordered = false;
if (memory::primitive_desc(conv_bwd_weights_pd.src_primitive_desc()) !=
user_src_memory.get_primitive_desc()) {
src_memory = memory(conv_bwd_weights_pd.src_primitive_desc());
reorder_src = reorder(user_src_memory, src_memory);
is_src_reordered = true;
}
auto diff_dst_memory_4filter = user_diff_dst_memory;
primitive reorder_diff_dst_4filter;
bool is_diff_dst_reordered_4filter = false;
if (memory::primitive_desc(
conv_bwd_weights_pd.diff_dst_primitive_desc()) !=
user_diff_dst_memory.get_primitive_desc()) {
diff_dst_memory_4filter =
memory(conv_bwd_weights_pd.diff_dst_primitive_desc());
reorder_diff_dst_4filter =
reorder(user_diff_dst_memory, diff_dst_memory_4filter);
is_diff_dst_reordered_4filter = true;
}
// create mkldnn memory for output (i.e. diff weights)
auto diff_weights_memory = auto diff_weights_memory =
mkldnn::memory({diff_weights_md, mkldnn_engine}, memory(conv_bwd_weights_pd.diff_weights_primitive_desc(),
reinterpret_cast<void*>(filter_grad_data)); reinterpret_cast<void*>(filter_grad_data));
auto src_memory =
mkldnn::memory({src_md, mkldnn_engine},
reinterpret_cast<void*>(const_cast<T*>(input_data)));
// create backward conv primitive for weights // create backward conv primitive for weights
auto conv_bwd_weights_prim = mkldnn::convolution_backward_weights( auto conv_bwd_weights_prim =
conv_bwd_weights_pd, src_memory, diff_dst_memory, conv_bwd_weights(conv_bwd_weights_pd, src_memory,
diff_weights_memory); diff_dst_memory_4filter, diff_weights_memory);
// push primitive and execute it // push primitive and execute it
std::vector<mkldnn::primitive> pipeline{conv_bwd_weights_prim}; std::vector<primitive> pipeline;
mkldnn::stream(mkldnn::stream::kind::eager).submit(pipeline).wait(); if (is_src_reordered) pipeline.push_back(reorder_src);
if (is_diff_dst_reordered_4filter)
pipeline.push_back(reorder_diff_dst_4filter);
pipeline.push_back(conv_bwd_weights_prim);
stream(stream::kind::eager).submit(pipeline).wait();
filter_grad->set_layout(DataLayout::kMKLDNN);
filter_grad->set_format(GetMKLDNNFormat(diff_weights_memory));
} }
if (input_grad) { if (input_grad) {
// create primitive descriptor // create backward convolution primitive descriptor
mkldnn::convolution_backward_data::primitive_desc conv_bwd_data_pd = auto conv_bwd_data_desc = conv_bwd_data::desc(
ConvBwdDataPrimitiveDesc(diff_src_md, weights_md, diff_dst_md, mkldnn::convolution_direct, diff_src_md, weights_md, diff_dst_md,
strides, paddings, *conv_pd, mkldnn_engine); strides, paddings, paddings, mkldnn::padding_kind::zero);
auto conv_bwd_data_pd = conv_bwd_data::primitive_desc(
// create memory conv_bwd_data_desc, mkldnn_engine, *conv_pd);
auto diff_src_memory = mkldnn::memory(
{diff_src_md, mkldnn_engine}, // create reorder primitive if the input format is not the preferred one
reinterpret_cast<void*>(const_cast<T*>(input_grad_data))); auto weights_memory = user_weights_memory;
auto weights_memory = primitive reorder_weights;
mkldnn::memory({weights_md, mkldnn_engine}, bool is_weights_reordered = false;
reinterpret_cast<void*>(const_cast<T*>(filter_data))); if (memory::primitive_desc(conv_bwd_data_pd.weights_primitive_desc()) !=
user_weights_memory.get_primitive_desc()) {
weights_memory = memory(conv_bwd_data_pd.weights_primitive_desc());
reorder_weights = reorder(user_weights_memory, weights_memory);
is_weights_reordered = true;
}
auto diff_dst_memory_4data = user_diff_dst_memory;
primitive reorder_diff_dst_4data;
bool is_diff_dst_reordered_4data = false;
if (memory::primitive_desc(conv_bwd_data_pd.diff_dst_primitive_desc()) !=
user_diff_dst_memory.get_primitive_desc()) {
diff_dst_memory_4data =
memory(conv_bwd_data_pd.diff_dst_primitive_desc());
reorder_diff_dst_4data =
reorder(user_diff_dst_memory, diff_dst_memory_4data);
is_diff_dst_reordered_4data = true;
}
// create mkldnn memory for output (i.e. diff src)
auto diff_src_memory = memory(conv_bwd_data_pd.diff_src_primitive_desc(),
reinterpret_cast<void*>(input_grad_data));
// create backward conv primitive for data // create backward conv primitive for data
auto conv_bwd_data_prim = mkldnn::convolution_backward_data( auto conv_bwd_data_prim =
conv_bwd_data_pd, diff_dst_memory, weights_memory, diff_src_memory); conv_bwd_data(conv_bwd_data_pd, diff_dst_memory_4data, weights_memory,
diff_src_memory);
// push primitive to stream and wait until it's executed // push primitive and execute it
std::vector<mkldnn::primitive> pipeline{conv_bwd_data_prim}; std::vector<primitive> pipeline;
mkldnn::stream(mkldnn::stream::kind::eager).submit(pipeline).wait(); if (is_weights_reordered) pipeline.push_back(reorder_weights);
if (is_diff_dst_reordered_4data)
pipeline.push_back(reorder_diff_dst_4data);
pipeline.push_back(conv_bwd_data_prim);
stream(stream::kind::eager).submit(pipeline).wait();
input_grad->set_layout(DataLayout::kMKLDNN);
input_grad->set_format(GetMKLDNNFormat(diff_src_memory));
} }
} // Compute() } // Compute()
private:
mkldnn::convolution_backward_weights::primitive_desc
ConvBwdWeightsPrimitiveDesc(
const mkldnn::memory::desc& src, const mkldnn::memory::desc& diff_weights,
const mkldnn::memory::desc& diff_dst, const std::vector<int>& strides,
const std::vector<int>& paddings,
const mkldnn::convolution_forward::primitive_desc& conv_pd,
const mkldnn::engine& engine) const {
auto conv_bwd_weights_desc = mkldnn::convolution_backward_weights::desc(
mkldnn::convolution_direct, src, diff_weights, diff_dst, strides,
paddings, paddings, mkldnn::padding_kind::zero);
return mkldnn::convolution_backward_weights::primitive_desc(
conv_bwd_weights_desc, engine, conv_pd);
}
mkldnn::convolution_backward_data::primitive_desc ConvBwdDataPrimitiveDesc(
const mkldnn::memory::desc& diff_src, const mkldnn::memory::desc& weights,
const mkldnn::memory::desc& diff_dst, const std::vector<int>& strides,
const std::vector<int>& paddings,
const mkldnn::convolution_forward::primitive_desc& conv_pd,
const mkldnn::engine& engine) const {
auto conv_bwd_data_desc = mkldnn::convolution_backward_data::desc(
mkldnn::convolution_direct, diff_src, weights, diff_dst, strides,
paddings, paddings, mkldnn::padding_kind::zero);
return mkldnn::convolution_backward_data::primitive_desc(conv_bwd_data_desc,
engine, conv_pd);
}
}; };
} // namespace operators } // namespace operators
......
...@@ -75,9 +75,8 @@ void ConvOp::InferShape(framework::InferShapeContext* ctx) const { ...@@ -75,9 +75,8 @@ void ConvOp::InferShape(framework::InferShapeContext* ctx) const {
framework::OpKernelType ConvOp::GetExpectedKernelType( framework::OpKernelType ConvOp::GetExpectedKernelType(
const framework::ExecutionContext& ctx) const { const framework::ExecutionContext& ctx) const {
framework::LibraryType library{framework::LibraryType::kPlain}; framework::LibraryType library{framework::LibraryType::kPlain};
std::string data_format = ctx.Attr<std::string>("data_format");
// TODO(pzelazko-intel): enable MKLDNN layout when it's ready // TODO(pzelazko-intel): enable MKLDNN layout when it's ready
std::string data_format = ctx.Attr<std::string>("data_format");
framework::DataLayout layout = framework::StringToDataLayout(data_format); framework::DataLayout layout = framework::StringToDataLayout(data_format);
#ifdef PADDLE_WITH_CUDA #ifdef PADDLE_WITH_CUDA
......
...@@ -67,6 +67,10 @@ class GenNCCLIdOp : public framework::OperatorBase { ...@@ -67,6 +67,10 @@ class GenNCCLIdOp : public framework::OperatorBase {
client->AsyncSendVar(ep, dev_ctx, *scope, NCCL_ID_VARNAME); client->AsyncSendVar(ep, dev_ctx, *scope, NCCL_ID_VARNAME);
} }
client->Wait(); client->Wait();
for (auto& ep : endpoint_list) {
client->AsyncSendBatchBarrier(ep);
}
client->Wait();
VLOG(3) << "sending completed..."; VLOG(3) << "sending completed...";
} }
......
...@@ -15,7 +15,7 @@ ...@@ -15,7 +15,7 @@
__all__ = ['batch'] __all__ = ['batch']
def batch(reader, batch_size, drop_last=False): def batch(reader, batch_size, drop_last=True):
""" """
Create a batched reader. Create a batched reader.
......
...@@ -262,9 +262,10 @@ def embedding(input, ...@@ -262,9 +262,10 @@ def embedding(input,
return tmp return tmp
# TODO(qijun): expose H0 and C0
def dynamic_lstm(input, def dynamic_lstm(input,
size, size,
h_0=None,
c_0=None,
param_attr=None, param_attr=None,
bias_attr=None, bias_attr=None,
use_peepholes=True, use_peepholes=True,
...@@ -325,6 +326,13 @@ def dynamic_lstm(input, ...@@ -325,6 +326,13 @@ def dynamic_lstm(input,
(T X 4D), where T is the total time steps in this (T X 4D), where T is the total time steps in this
mini-batch, D is the hidden size. mini-batch, D is the hidden size.
size(int): 4 * hidden size. size(int): 4 * hidden size.
h_0(Variable): The initial hidden state is an optional input, default is zero.
This is a tensor with shape (N x D), where N is the
batch size and D is the hidden size.
c_0(Variable): The initial cell state is an optional input, default is zero.
This is a tensor with shape (N x D), where N is the
batch size. `h_0` and `c_0` can be NULL but only at the same time.
param_attr(ParamAttr|None): The parameter attribute for the learnable param_attr(ParamAttr|None): The parameter attribute for the learnable
hidden-hidden weights. hidden-hidden weights.
...@@ -388,12 +396,20 @@ def dynamic_lstm(input, ...@@ -388,12 +396,20 @@ def dynamic_lstm(input,
cell = helper.create_tmp_variable(dtype) cell = helper.create_tmp_variable(dtype)
batch_gate = helper.create_tmp_variable(dtype) batch_gate = helper.create_tmp_variable(dtype)
batch_cell_pre_act = helper.create_tmp_variable(dtype) batch_cell_pre_act = helper.create_tmp_variable(dtype)
inputs = {'Input': input, 'Weight': weight, 'Bias': bias}
batch_size = input.shape[0]
if h_0:
assert h_0.shape == (batch_size, size), \
'The shape of h0 should be (batch_size, %d)' % size
inputs['H0'] = h_0
if c_0:
assert c_0.shape == (batch_size, size), \
'The shape of c0 should be (batch_size, %d)' % size
inputs['C0'] = c_0
helper.append_op( helper.append_op(
type='lstm', type='lstm',
inputs={'Input': input, inputs=inputs,
'Weight': weight,
'Bias': bias},
outputs={ outputs={
'Hidden': hidden, 'Hidden': hidden,
'Cell': cell, 'Cell': cell,
...@@ -678,11 +694,13 @@ def dynamic_gru(input, ...@@ -678,11 +694,13 @@ def dynamic_gru(input,
attr=helper.param_attr, shape=[size, 3 * size], dtype=dtype) attr=helper.param_attr, shape=[size, 3 * size], dtype=dtype)
bias = helper.create_parameter( bias = helper.create_parameter(
attr=helper.bias_attr, shape=[1, 3 * size], dtype=dtype, is_bias=True) attr=helper.bias_attr, shape=[1, 3 * size], dtype=dtype, is_bias=True)
batch_size = input.shape[0]
inputs = {'Input': input, 'Weight': weight, 'Bias': bias} inputs = {'Input': input, 'Weight': weight, 'Bias': bias}
if h_0 != None: if h_0 != None:
assert h_0.shape == ( assert h_0.shape == (
size, size), 'The shape of h0 should be(%d, %d)' % (size, size) batch_size, size
inputs['h0'] = h_0 ), 'The shape of h0 should be(batch_size, %d)' % size
inputs['H0'] = h_0
hidden = helper.create_tmp_variable(dtype) hidden = helper.create_tmp_variable(dtype)
batch_gate = helper.create_tmp_variable(dtype) batch_gate = helper.create_tmp_variable(dtype)
......
...@@ -96,10 +96,11 @@ def train(use_cuda, train_program, params_dirname): ...@@ -96,10 +96,11 @@ def train(use_cuda, train_program, params_dirname):
train_reader = paddle.batch( train_reader = paddle.batch(
paddle.reader.shuffle( paddle.reader.shuffle(
cifar10_small_test_set.train10(batch_size=10), buf_size=128 * 10), cifar10_small_test_set.train10(batch_size=10), buf_size=128 * 10),
batch_size=BATCH_SIZE) batch_size=BATCH_SIZE,
drop_last=False)
test_reader = paddle.batch( test_reader = paddle.batch(
paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE) paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE, drop_last=False)
def event_handler(event): def event_handler(event):
if isinstance(event, fluid.EndStepEvent): if isinstance(event, fluid.EndStepEvent):
......
...@@ -73,10 +73,11 @@ def train(use_cuda, train_program, params_dirname): ...@@ -73,10 +73,11 @@ def train(use_cuda, train_program, params_dirname):
train_reader = paddle.batch( train_reader = paddle.batch(
paddle.reader.shuffle( paddle.reader.shuffle(
cifar10_small_test_set.train10(batch_size=10), buf_size=128 * 10), cifar10_small_test_set.train10(batch_size=10), buf_size=128 * 10),
batch_size=BATCH_SIZE) batch_size=BATCH_SIZE,
drop_last=False)
test_reader = paddle.batch( test_reader = paddle.batch(
paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE) paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE, drop_last=False)
def event_handler(event): def event_handler(event):
if isinstance(event, fluid.EndStepEvent): if isinstance(event, fluid.EndStepEvent):
......
...@@ -87,7 +87,9 @@ def train(use_cuda, train_program, params_dirname): ...@@ -87,7 +87,9 @@ def train(use_cuda, train_program, params_dirname):
def event_handler(event): def event_handler(event):
if isinstance(event, fluid.EndEpochEvent): if isinstance(event, fluid.EndEpochEvent):
test_reader = paddle.batch( test_reader = paddle.batch(
paddle.dataset.imdb.test(word_dict), batch_size=BATCH_SIZE) paddle.dataset.imdb.test(word_dict),
batch_size=BATCH_SIZE,
drop_last=False)
avg_cost, acc = trainer.test( avg_cost, acc = trainer.test(
reader=test_reader, feed_order=['words', 'label']) reader=test_reader, feed_order=['words', 'label'])
...@@ -113,7 +115,8 @@ def train(use_cuda, train_program, params_dirname): ...@@ -113,7 +115,8 @@ def train(use_cuda, train_program, params_dirname):
train_reader = paddle.batch( train_reader = paddle.batch(
paddle.reader.shuffle( paddle.reader.shuffle(
paddle.dataset.imdb.train(word_dict), buf_size=25000), paddle.dataset.imdb.train(word_dict), buf_size=25000),
batch_size=BATCH_SIZE) batch_size=BATCH_SIZE,
drop_last=False)
trainer.train( trainer.train(
num_epochs=1, num_epochs=1,
......
...@@ -56,7 +56,7 @@ BATCH_SIZE = 200 ...@@ -56,7 +56,7 @@ BATCH_SIZE = 200
# fix the order of training data # fix the order of training data
train_reader = paddle.batch( train_reader = paddle.batch(
paddle.dataset.uci_housing.train(), batch_size=BATCH_SIZE) paddle.dataset.uci_housing.train(), batch_size=BATCH_SIZE, drop_last=False)
# train_reader = paddle.batch( # train_reader = paddle.batch(
# paddle.reader.shuffle( # paddle.reader.shuffle(
......
...@@ -240,14 +240,15 @@ class ExtraLayerAttribute(object): ...@@ -240,14 +240,15 @@ class ExtraLayerAttribute(object):
:type error_clipping_threshold: float :type error_clipping_threshold: float
:param drop_rate: Dropout rate. Dropout will create a mask on layer output. :param drop_rate: Dropout rate. Dropout will create a mask on layer output.
The dropout rate is the zero rate of this mask. The The dropout rate is the zero rate of this mask. The
details of what dropout is please refer to `here details of what dropout is please refer to `JMLRdropout
<https://www.cs.toronto.edu/~hinton/absps/ <https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
JMLRdropout.pdf>`_. >`_.
:type drop_rate: float :type drop_rate: float
:param device: device ID of layer. device=-1, use CPU. device>=0, use GPU. :param device: device ID of layer. device=-1, use CPU. device>=0, use GPU.
The details allocation in parallel_nn please refer to `here The details allocation in parallel_nn please refer to `use_case
<http://www.paddlepaddle.org/doc/ui/cmd_argument/ <https://github.com/PaddlePaddle/Paddle/blob/develop/doc/v2
use_case.html#case-2-specify-layers-in-different-devices>`_. /howto/cmd_parameter/use_case_en.md#case-2-specify-layers-in
-different-devices>`_.
:type device: int :type device: int
""" """
......
...@@ -2556,7 +2556,7 @@ def img_conv_layer(input, ...@@ -2556,7 +2556,7 @@ def img_conv_layer(input,
the output will be obtained by concatenating the two results. the output will be obtained by concatenating the two results.
The details of grouped convolution, please refer to: The details of grouped convolution, please refer to:
`ImageNet Classification with Deep Convolutional Neural Networks `ImageNet Classification With Deep Convolutional Neural Networks
<http://www.cs.toronto.edu/~kriz/imagenet_classification_with_deep_convolutional.pdf>`_ <http://www.cs.toronto.edu/~kriz/imagenet_classification_with_deep_convolutional.pdf>`_
The example usage is: The example usage is:
...@@ -5678,8 +5678,8 @@ def warp_ctc_layer(input, ...@@ -5678,8 +5678,8 @@ def warp_ctc_layer(input,
<https://github.com/baidu-research/warp-ctc>`_ library, which is used in <https://github.com/baidu-research/warp-ctc>`_ library, which is used in
`Deep Speech 2: End-toEnd Speech Recognition in English and Mandarin `Deep Speech 2: End-toEnd Speech Recognition in English and Mandarin
<https://arxiv.org/pdf/1512.02595v1.pdf>`_, to compute Connectionist Temporal <https://arxiv.org/pdf/1512.02595v1.pdf>`_, to compute Connectionist Temporal
Classification (CTC) loss. Besides, another `warp-ctc Classification (CTC) loss. Besides, another `warp-ctc repository
<https://github.com/gangliao/warp-ctc>`_ repository, which is forked from <https://github.com/gangliao/warp-ctc>`_ , which is forked from
the official one, is maintained to enable more compiling options. During the the official one, is maintained to enable more compiling options. During the
building process, PaddlePaddle will clone the source codes, build and building process, PaddlePaddle will clone the source codes, build and
install it to :code:`third_party/install/warpctc` directory. install it to :code:`third_party/install/warpctc` directory.
......
...@@ -15,7 +15,7 @@ ...@@ -15,7 +15,7 @@
__all__ = ['batch'] __all__ = ['batch']
def batch(reader, batch_size, drop_last=False): def batch(reader, batch_size, drop_last=True):
""" """
Create a batched reader. Create a batched reader.
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册