- 21 10月, 2018 4 次提交
-
-
由 Tomasz Patejko 提交于
MKLDNN conv + elementwise_add fusion: skip connection attribute renamed. Comments about patterns added. test=develop
-
由 Tomasz Patejko 提交于
-
由 Tomasz Patejko 提交于
MKLDNN conv + elementwise_add fusion: output and elemwise param share data in conv primitive. Output is properly allocated
-
由 Tomasz Patejko 提交于
-
- 14 9月, 2018 1 次提交
-
-
由 Michał Gallus 提交于
-
- 11 9月, 2018 1 次提交
-
-
由 Michal Gallus 提交于
-
- 10 9月, 2018 1 次提交
-
-
由 Krzysztof Binias 提交于
-
- 21 8月, 2018 1 次提交
-
-
由 Michał Gallus 提交于
* Fuse Convolution and Eltwise Add into Conv+Bias * Reduce bias branching at conv_mkldnn_op * Add MKLDNN build checks for Conv Bias * Conv-bias: check if bias input exist befor assignment * Conv-bias: Remove Bias dim check from infershape It was causing conv3d test to crash upon\ncalling HasInput(Bias)
-
- 11 6月, 2018 2 次提交
-
-
由 dzhwinter 提交于
* "add inplace attribute" * "register inplace attribute" * "change se-next model for memory-reuse" * "fix typo" * repick * fix merge conflict * "fix stupid error"
-
由 mozga-intel 提交于
-
- 07 6月, 2018 1 次提交
-
-
由 mozga-intel 提交于
* Add MKLDNN layout support in Paddle Add MKLDNN layout in Paddle so that MKLDNN friendly memory layout can be used in MKLDNN enabled OP kernel. Before this commit, NCHW is hardcode to be used in all MKLDNN op kernels. As a result, non-optimized execution path is selected in MKLDNN primitive which bring worse performance. Besides framework change, three MKLDNN OP kernels were updated for using new MKLDNN layout. They are conv/pool2d/batch_norm. Other MKLDNN OP kernels need be also updated in similar way to achieve best performance. * Add MKLDNN layout support in activation OP * Don't populate layout from input to output when kMKLDNN in * Refine pool mkldnn op kernel * MKLDNN layout * Remove the inferitance from tensor file * MKLDNN layout: refactoring * Remove additional #define to register new operator * Prepare mkldnn tests to work with layout
-
- 08 5月, 2018 1 次提交
-
-
由 Yu Yang 提交于
Do not use ctor * Reduce line of codes. * We can use virtual function for Maker now. * The implementation does not care what maker holds, it is easier to refactor later.
-
- 19 4月, 2018 1 次提交
-
-
由 Yang Yang(Tony) 提交于
* script to add semicolon * fix typo
-
- 17 4月, 2018 1 次提交
-
-
由 Yang Yang 提交于
-
- 04 4月, 2018 1 次提交
-
-
由 Yi Wang 提交于
-
- 16 3月, 2018 2 次提交
-
-
由 Liu Yiqun 提交于
-
由 Kexin Zhao 提交于
-
- 07 3月, 2018 1 次提交
-
-
由 pzelazko-intel 提交于
* MKLDNN conv2 OP kernel added * TODOs added * mkldnn conv2d OP refactor * CanCUDNNBeUsed and CanMKLDNNBeUsed moved
-
- 28 2月, 2018 1 次提交
-
-
由 chengduoZH 提交于
-
- 27 2月, 2018 1 次提交
-
-
由 chengduoZH 提交于
-
- 16 2月, 2018 2 次提交
- 12 2月, 2018 1 次提交
-
-
由 qingqing01 提交于
-
- 10 2月, 2018 2 次提交
- 02 2月, 2018 1 次提交
-
-
由 xzl 提交于
-
- 23 1月, 2018 1 次提交
-
-
由 xzl 提交于
-
- 22 1月, 2018 1 次提交
-
-
由 zlx 提交于
-
- 17 1月, 2018 3 次提交
-
-
由 chengduoZH 提交于
-
由 chengduoZH 提交于
-
由 chengduoZH 提交于
-
- 15 1月, 2018 2 次提交
-
-
由 chengduoZH 提交于
-
由 chengduoZH 提交于
-
- 14 1月, 2018 1 次提交
-
-
由 dzhwinter 提交于
* "unified operators" * "add CUDNN register" * "add use cudnn attribute" * "add attribute" * "test conv tranpose op" * "remove duplicated attr" * "fix op test" * "add attribute to set cudnn" * "add more log" * "need layout op register support" * "add more log" * "change GetExpectedKernelType " * "fix Get attr in conv_op" * "fix CI" * "fix tests" * "removed kernel priority fallback" * "fix CI" * "fix stack pointer bug" * "refine buggy interface" * "add const cast to save life" * "fix get_output_with_grad" * "fix op test with dataformat" * ""fix pooling * "fix pooling test" * "fix CI" * "fix with_gpu error" * "add transform needed functional check" * "fix unpack list error" * "comment out parallel.do temporary" * "fix CI" * "fix compile doc error" * "make threshold larger"
-
- 09 1月, 2018 1 次提交
-
-
由 Yiqun Liu 提交于
* Add Seq2BatchFunctor, which will be used in WarpCTCOp. * Implement WrapCTCFunctor and WrapCTCKernel. * Add unittest of warpctc_op. * Modify the check_output inferface in python unittest framework to allow check a subset of outputs. * Use absolute offset lod in warpctc_op and related functors. * Refine the comments of warpctc_op. * The new python unittest supports checking a subset of the outputs, so revoke the previous change. * Rename the transform from LoDTensor to Tensor with shape [max_sequence_length, num_sequences, sequence_width] to PaddingSequenceFunctor. * Update to the newest codes. * Rename the PaddingSequenceFunctor to PaddingLoDTensorFunctor and remove the computation of dimensions out of the functos.
-
- 04 1月, 2018 1 次提交
-
-
由 Yang Yu 提交于
-
- 27 12月, 2017 1 次提交
-
-
由 fengjiayi 提交于
-
- 26 12月, 2017 1 次提交
-
-
由 Luo Tao 提交于
-
- 20 12月, 2017 1 次提交
-
-
由 Yu Yang 提交于
* Move framework.proto to proto namespace * Fix compile * Fix compile * Fix Compile
-
- 12 12月, 2017 1 次提交
-
-
由 QI JUN 提交于
There are mainly following fixes: - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place` - remove `eigen_device` interface in base class `DeviceContext` - remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext` - remove unused `platform::EigenDeviceConverter` - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL` - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
-