- 03 Jul, 2018 1 commit
-
-
Committed by Xin Pan
-
- 30 Jun, 2018 2 commits
-
-
Committed by Yan Chunwei
-
Committed by gongweibao
-
- 28 Jun, 2018 1 commit
-
-
Committed by mozga-intel
-
- 23 Jun, 2018 1 commit
-
-
Committed by Yi Wang
* Make paddle no longer depend on boost
* Update enforce.h
-
- 22 Jun, 2018 1 commit
-
-
Committed by chengduo
-
- 21 Jun, 2018 5 commits
-
-
Committed by Jacek Czaja
  - Added a hash function inside the MKLDNN softmax op, used as a handle for storing primitives in a context
  - Style fixes to softmax mkldnn op
  - Fixes after review
  - Coding style
  - Fix to style
  - style fixes
  - style fix
  - style fixes
  - Fix to code style check
  - Rephrasing a comment
  fix to broken merge
  Fixes to rebase
  Conflicts:
    benchmark/fluid/models/machine_translation.py
    cmake/external/mkldnn.cmake
    paddle/fluid/operators/softmax_mkldnn_op.cc
  - Bumped revision of MKL-DNN up to have softmax backward primitive
  - Added choosing MKLDNN softmax grad operator
  - First reuse of softmax backward
  - Reinvented reusing for softmax
  - Fix to crash in reinvented reuse
  - Clang format fixes
  - Clang format fixes
  - Improved softmax mkldnn reuse mechanism
  - clang format fixes
  - Fix to broken merge
  - Fix
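The first item of the commit message above describes using a hash as a handle for primitives stored in a context. The general idea can be sketched as follows. This is a minimal illustration only, not the code from the commit: `PrimitiveCache`, `softmax_key`, and the key format are hypothetical names chosen for the example, and a plain dictionary stands in for the device context.

```python
from typing import Callable, Dict, Tuple


class PrimitiveCache:
    """Hypothetical stand-in for a context that stores primitives by key."""

    def __init__(self) -> None:
        self._store: Dict[str, object] = {}
        self.builds = 0  # counts how many times a primitive was actually built

    def get_or_create(self, key: str, factory: Callable[[], object]) -> object:
        # Build the primitive only on the first request for this key.
        if key not in self._store:
            self._store[key] = factory()
            self.builds += 1
        return self._store[key]


def softmax_key(dims: Tuple[int, ...], suffix: str) -> str:
    # Assumed key format: tensor dimensions joined by "-", plus an op-specific suffix.
    return "-".join(str(d) for d in dims) + "@" + suffix


cache = PrimitiveCache()
key = softmax_key((32, 1000), "softmax_fwd")
first = cache.get_or_create(key, lambda: {"dims": (32, 1000)})
second = cache.get_or_create(key, lambda: {"dims": (32, 1000)})
```

In this sketch the second lookup returns the object created by the first, so the comparatively expensive construction step runs once per distinct key.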
-
Committed by tensor-tang
This reverts commit 4d8e8ee2, reversing changes made to d6a9f005.
-
Committed by tensor-tang
-
Committed by tensor-tang
-
Committed by chengduoZH
-
- 20 Jun, 2018 2 commits
-
-
Committed by tensor-tang
-
Committed by tensor-tang
-
- 19 Jun, 2018 2 commits
-
-
Committed by mozga-intel
-
Committed by tensor-tang
-
- 16 Jun, 2018 1 commit
-
-
Committed by tensor-tang
-
- 14 Jun, 2018 2 commits
-
-
Committed by Qiyang Min
* 1. Create a buddy allocator in each place before NcclBcast of the variables
  2. Check the memory usage of ALL GPUs rather than only the first one
* 1. Make NCCLGroupGuard guard only the ncclBcast part, which avoids ncclGroupEnd blocking exception throwing
  2. NOTE the usage of NCCLGroupGuard
* Remove the memory usage check of GPUs
* Fix code style
-
Committed by Xin Pan
The CUPTI samples only use cuptiFlush. I can't find any place that calls cuptiFinalize, and this API can fail with not_implemented on some CUDA installations.
-
- 13 Jun, 2018 1 commit
-
-
Committed by qiaolongfei
-
- 12 Jun, 2018 1 commit
-
-
Committed by tensor-tang
-
- 11 Jun, 2018 1 commit
-
-
Committed by yuyang18
-
- 08 Jun, 2018 2 commits
-
-
Committed by guochaorong
-
Committed by guochaorong
-
- 07 Jun, 2018 1 commit
-
-
Committed by mozga-intel
* Add MKLDNN layout support in Paddle
  Add MKLDNN layout in Paddle so that an MKLDNN-friendly memory layout can be used in MKLDNN-enabled OP kernels. Before this commit, NCHW was hardcoded in all MKLDNN op kernels. As a result, a non-optimized execution path was selected in the MKLDNN primitive, which brought worse performance. Besides the framework change, three MKLDNN OP kernels were updated to use the new MKLDNN layout: conv, pool2d, and batch_norm. Other MKLDNN OP kernels need to be updated in a similar way to achieve the best performance.
* Add MKLDNN layout support in activation OP
* Don't populate layout from input to output when kMKLDNN in
* Refine pool mkldnn op kernel
* MKLDNN layout
* Remove the inheritance from tensor file
* MKLDNN layout: refactoring
* Remove additional #define to register new operator
* Prepare mkldnn tests to work with layout
-
- 06 Jun, 2018 5 commits
-
-
Committed by qingqing01
* Enable assertions in CUDA.
* Fix PADDLE_ASSERT.
-
Committed by dzhwinter
-
Committed by dzhwinter
-
Committed by dzhwinter
-
Committed by dzhwinter
* "fix deterministic"
* "fix ci"
* "fix init"
-
- 01 Jun, 2018 4 commits
-
-
Committed by yuyang18
-
Committed by yuyang18
-
Committed by yuyang18
-
Committed by gongweibao
-
- 31 May, 2018 1 commit
-
-
Committed by Xin Pan
Sometimes dev_ctx is not available when RecordEvent is used.
-
- 30 May, 2018 2 commits
- 23 May, 2018 1 commit
-
-
Committed by Xin Pan
-
- 22 May, 2018 1 commit
-
-
Committed by Xin Pan
Experiment on vgg flower, 2 trainers, 1 ps. More trainers could give more speedup.
After: Pass = 0, Iters = 327, Speed = (7.52) img/s
Before: Pass = 0, Iters = 385, Speed = (6.77) img/s
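The two throughput figures in the message above imply a relative speedup, which can be checked with a little arithmetic. A minimal sketch: the variable names are illustrative, and the numbers are the ones quoted in the commit message.

```python
# Throughput figures quoted in the commit message (images per second).
before_img_per_s = 6.77
after_img_per_s = 7.52

# Relative speedup and the equivalent percentage gain.
speedup = after_img_per_s / before_img_per_s
gain_percent = (speedup - 1.0) * 100.0

print(f"speedup: {speedup:.3f}x ({gain_percent:.1f}% more images per second)")
```

For this single run that works out to a throughput gain of roughly 11%.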
-
- 21 May, 2018 2 commits
-
-
Committed by Krzysztof Binias
-
Committed by dzhwinter
-