- 14 Jun, 2018 2 commits
-
-
Committed by Qiyang Min
* 1. Create a buddy allocator in each place before the NcclBcast of the variables. 2. Check the memory usage of ALL GPUs rather than only the first one.
* 1. Make NCCLGroupGuard guard only the ncclBcast part, which avoids ncclGroupEnd blocking exception throwing. 2. NOTE the usage of NCCLGroupGuard.
* Remove the memory usage check of GPUs
* Fix code style
-
Committed by Xin Pan
In the CUPTI samples, only cuptiFlush is used. I can't find any place that calls cuptiFinalize, and this API can error out as not_implemented in some CUDA installations.
-
- 13 Jun, 2018 1 commit
-
-
Committed by qiaolongfei
-
- 12 Jun, 2018 1 commit
-
-
Committed by tensor-tang
-
- 11 Jun, 2018 1 commit
-
-
Committed by yuyang18
-
- 08 Jun, 2018 2 commits
-
-
Committed by guochaorong
-
Committed by guochaorong
-
- 07 Jun, 2018 1 commit
-
-
Committed by mozga-intel
* Add MKLDNN layout support in Paddle
Add an MKLDNN layout in Paddle so that an MKLDNN-friendly memory layout can be used in MKLDNN-enabled OP kernels. Before this commit, NCHW was hardcoded in all MKLDNN OP kernels. As a result, a non-optimized execution path was selected in the MKLDNN primitives, which led to worse performance. Besides the framework change, three MKLDNN OP kernels were updated to use the new MKLDNN layout: conv, pool2d, and batch_norm. Other MKLDNN OP kernels need to be updated in a similar way to achieve the best performance.
* Add MKLDNN layout support in activation OP
* Don't populate layout from input to output when kMKLDNN in
* Refine pool mkldnn op kernel
* MKLDNN layout
* Remove the inheritance from tensor file
* MKLDNN layout: refactoring
* Remove additional #define to register new operator
* Prepare mkldnn tests to work with layout
-
- 06 Jun, 2018 5 commits
-
-
Committed by qingqing01
* Enable assertions in CUDA.
* Fix PADDLE_ASSERT.
-
Committed by dzhwinter
-
Committed by dzhwinter
-
Committed by dzhwinter
-
Committed by dzhwinter
* "fix deterministic"
* "fix ci"
* "fix init"
-
- 01 Jun, 2018 4 commits
-
-
Committed by yuyang18
-
Committed by yuyang18
-
Committed by yuyang18
-
Committed by gongweibao
-
- 31 May, 2018 1 commit
-
-
Committed by Xin Pan
Sometimes dev_ctx is not available when RecordEvent is used.
-
- 30 May, 2018 2 commits
- 23 May, 2018 1 commit
-
-
Committed by Xin Pan
-
- 22 May, 2018 1 commit
-
-
Committed by Xin Pan
Experiment on VGG flower, 2 trainers, 1 ps. More trainers could give more speedup.
After: Pass = 0, Iters = 327, Speed = (7.52) img/s
Before: Pass = 0, Iters = 385, Speed = (6.77) img/s
-
- 21 May, 2018 2 commits
-
-
Committed by Krzysztof Binias
-
Committed by dzhwinter
-
- 17 May, 2018 1 commit
-
-
Committed by Jacek Czaja
- Finished draft of reusing operators in pooling
- Added use of gethash in PoolGrad
- Removed diagnostic
- Added reuse of primitives in pool mkldnn grad
- Added diagnostic
- Removed diagnostic
- Added dependency on mkldnn data type for pooling mkldnn
- Added determination of mkldnn memory data type based on the template type of the op
- Compilation warning fix
- Coding style fixes
-
- 15 May, 2018 2 commits
- 14 May, 2018 2 commits
-
-
Committed by yuyang18
-
Committed by typhoonzero
-
- 11 May, 2018 1 commit
-
-
Committed by typhoonzero
-
- 09 May, 2018 1 commit
-
-
Committed by fengjiayi
-
- 08 May, 2018 1 commit
-
-
Committed by chengduoZH
-
- 07 May, 2018 1 commit
-
-
Committed by typhoonzero
-
- 05 May, 2018 1 commit
-
-
Committed by typhoonzero
-
- 04 May, 2018 3 commits
-
-
Committed by chengduoZH
-
Committed by typhoonzero
-
Committed by Xin Pan
-
- 03 May, 2018 3 commits
-
-
Committed by Xin Pan
-
Committed by chengduoZH
-
Committed by Yiqun Liu
* Fix the bug that occurs when an input variable of an op is dispensable.
* Add HasInputs/Outputs interfaces to OperatorBase.
* Remove the unreferenced header file.
-