Commits · c90e64e796e2a304e534d82242935b4cbab2d6b9 · 机器未来 / Paddle

21 Jun, 2018 2 commits
- T
  
  Revert "MKLDNN layout: Support for sum operator" · 90780e22
  Committed by tensor-tang on Jun 21, 2018
  
  90780e22
- C
  
  Add No Mutex · c99fca5f
  Committed by chengduoZH on Jun 21, 2018
  
  c99fca5f
19 Jun, 2018 2 commits
- M
  
  MKLDNN layout: the code-review changes · 6512be59
  Committed by mozga-intel on Jun 15, 2018
  
  6512be59
- T
  
  update the default cpu memory with MKLDNN · 9a25f289
  Committed by tensor-tang on Jun 19, 2018
  
  9a25f289
16 Jun, 2018 1 commit
- T
  
  refine the initial cpu memory flag for mkldnn · a8c2ff31
  Committed by tensor-tang on Jun 16, 2018
  
  a8c2ff31
14 Jun, 2018 2 commits

Fix NCCLBcast hang up bug in Parallel Executor (#11377) · 046bb5c8

Committed by Qiyang Min on Jun 13, 2018

* 1. Create a buddy allocator in each place before NcclBcast of the variables
2. Check the memory usage of ALL GPUs rather than only the first one

* 1. Make NCCLGroupGuard guard only the ncclBcast part, which avoids ncclGroupEnd blocking exception throwing
2. NOTE the usage of NCCLGroupGuard

* Remove the memory usage check of gpus

* Fix code style

046bb5c8
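
The guard-scope change described in this commit can be illustrated with a small mock. This is only a sketch under stated assumptions: `GroupGuard`, `Validate`, `WideScope`, and `NarrowScope` are hypothetical names, the guard merely records events instead of calling NCCL, and it is not Paddle's actual `NCCLGroupGuard`. It shows that when the throwing code sits inside the guarded scope, the guard's destructor (the "group end" step) runs during stack unwinding before the error can be reported, whereas a narrower scope lets the error surface without touching the guard at all.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Event trace used only by this illustration.
std::vector<std::string> g_trace;

// Hypothetical RAII guard: records "group_start" / "group_end"
// where a real guard would open and close an NCCL group.
class GroupGuard {
 public:
  GroupGuard() { g_trace.push_back("group_start"); }
  ~GroupGuard() { g_trace.push_back("group_end"); }
};

// Hypothetical precondition check that may throw.
void Validate(bool ok) {
  if (!ok) throw std::runtime_error("precondition failed");
}

// Wide scope: the check runs inside the guard, so "group_end"
// executes during unwinding before the error is reported.
std::vector<std::string> WideScope(bool ok) {
  g_trace.clear();
  try {
    GroupGuard guard;
    Validate(ok);
    g_trace.push_back("bcast");
  } catch (const std::runtime_error&) {
    g_trace.push_back("error_reported");
  }
  return g_trace;
}

// Narrow scope: only the broadcast step is guarded, so a failed
// check is reported without ever entering the guard.
std::vector<std::string> NarrowScope(bool ok) {
  g_trace.clear();
  try {
    Validate(ok);
    {
      GroupGuard guard;
      g_trace.push_back("bcast");
    }
  } catch (const std::runtime_error&) {
    g_trace.push_back("error_reported");
  }
  return g_trace;
}
```

Calling `WideScope(false)` and `NarrowScope(false)` and comparing the returned traces shows the difference; on the success path both variants record the same three events.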

Remove cuptiFinalize. · d2afd210

Committed by Xin Pan on Jun 14, 2018

In the CUPTI samples, only cuptiFlush is used.
I can't find any place that calls cuptiFinalize, and
this API can error out as not_implemented in some
CUDA installations.

d2afd210

13 Jun, 2018 1 commit
- Q
  
  fix build on mac · 9ebbfa6b
  Committed by qiaolongfei on Jun 13, 2018
  
  9ebbfa6b
12 Jun, 2018 1 commit
- T
  
  add initial memory flag in MB for infer · 056dd404
  Committed by tensor-tang on Jun 12, 2018
  
  056dd404
11 Jun, 2018 1 commit
- Y
  
  Add lock to record_event. · a1254a86
  Committed by yuyang18 on Jun 11, 2018
  
  a1254a86
08 Jun, 2018 2 commits
- G
  
  Update device_tracer.cc · 310598f9
  Committed by guochaorong on Jun 08, 2018
  
  310598f9
- G
  
  fix some bugs introduced by unfreed memory · 0fec9469
  Committed by guochaorong on Jun 08, 2018
  
  0fec9469
07 Jun, 2018 1 commit

Mkldnn layout (#11040) · 3ff9ba0e

Committed by mozga-intel on Jun 07, 2018

* Add MKLDNN layout support in Paddle

Add MKLDNN layout in Paddle so that an MKLDNN-friendly memory layout
can be used in MKLDNN-enabled OP kernels. Before this commit, NCHW
was hardcoded in all MKLDNN op kernels. As a result, a
non-optimized execution path was selected in the MKLDNN primitives, which
resulted in worse performance.
Besides the framework change, three MKLDNN OP kernels were updated
to use the new MKLDNN layout: conv, pool2d, and batch_norm.
Other MKLDNN OP kernels also need to be updated in a similar way to
achieve the best performance.

* Add MKLDNN layout support in activation OP

* Don't populate layout from input to output when kMKLDNN in

* Refine pool mkldnn op kernel

* MKLDNN layout

* Remove the inheritance from tensor file

* MKLDNN layout: refactoring

* Remove additional #define to register new operator

* Prepare mkldnn tests to work with layout

3ff9ba0e

06 Jun, 2018 5 commits
- Q
  Fix PADDLE_ASSERT. (#10981) · e0a32074
  Committed by qingqing01 on Jun 06, 2018
```
* Enable assertions in CUDA.

* Fix PADDLE_ASSERT.
```
  e0a32074
- D
  
  "fix" · 2b9ef7e2
  Committed by dzhwinter on Jun 05, 2018
  
  2b9ef7e2
- D
  
  "fix compiled in manylinux" · 75d8e8ca
  Committed by dzhwinter on Jun 05, 2018
  
  75d8e8ca
- D
  
  "done" · 4777aec9
  Committed by dzhwinter on Jun 05, 2018
  
  4777aec9
- D
  Feature/deterministic (#11205) · 7971d4a3
  Committed by dzhwinter on Jun 06, 2018
```
* "fix deterministic"

* "fix ci"

* "fix init"
```
  7971d4a3
01 Jun, 2018 4 commits
- Y
  
  Static DSO handle · 53dab95b
  Committed by yuyang18 on Jun 01, 2018
  
  53dab95b
- Y
  
  Use static for dlsym · c5115950
  Committed by yuyang18 on Jun 01, 2018
  
  c5115950
- Y
  
  Remove lock in device context · 7cf8b656
  Committed by yuyang18 on May 31, 2018
  
  7cf8b656
- G
  
  Move sync_mode device ctx from grpc server (#10881) · 4fb7cc7f
  Committed by gongweibao on May 31, 2018
  
  4fb7cc7f
31 May, 2018 1 commit
- X
  allow profiler and timeline to work when dev_ctx is nullptr. · 75ea577f
  Committed by Xin Pan on May 31, 2018
```
Sometimes dev_ctx is not available when RecordEvent.
```
  75ea577f
30 May, 2018 2 commits
- X
  
  clean up · f14e579c
  Committed by Xin Pan on May 30, 2018
  
  f14e579c
- X
  
  better profiler and benchmark · 3cb63956
  Committed by Xin Pan on May 30, 2018
  
  3cb63956
23 May, 2018 1 commit
- X
  
  follow comments · 08e4970e
  Committed by Xin Pan on May 23, 2018
  
  08e4970e
22 May, 2018 1 commit

multi-thread handlerequest · b4dd4c04

Committed by Xin Pan on May 21, 2018

    Experiment on vgg flower with 2 trainers and 1 ps.
    More trainers could yield a larger speedup.

    After:
    Pass = 0, Iters = 327, Speed = (7.52) img/s
    Before:
    Pass = 0, Iters = 385, Speed = (6.77) img/s

b4dd4c04

21 May, 2018 2 commits
- K
  
  Add backward · 0aa01929
  Committed by Krzysztof Binias on May 17, 2018
  
  0aa01929
- D
  
  "fix compile" (#10657) · 0e4467ee
  Committed by dzhwinter on May 21, 2018
  
  0e4467ee
17 May, 2018 1 commit

- Draft of reuse of pooling mkldnn operator · 5f133305

Committed by Jacek Czaja on May 14, 2018

- Finished draft of pooling reusing of operators

- Using gethash in PoolGrad added

- Removed diagnostic

- Added pool mkldnn grad reusing of primitives

- Added diagnostic

- Removed diagnostic

- added dependency to mkldnn data type for pooling mkldnn

- Added mkldnn memory data type determining based on template type of op

- Compilation warning fix

- coding style fixes

5f133305

15 May, 2018 2 commits

Fix a profiler race condition · 94c0a64d

Committed by Xin Pan on May 14, 2018

In a multi-threaded setting, EnableProfiler can
be called after a RecordEvent is constructed. In this
case, the RecordEvent constructor will not initialize anything,
but the RecordEvent destructor will still do work, since EnableProfiler
was called in the meantime.
This PR fixes it.

94c0a64d
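
The race described in this commit message can be reproduced deterministically in a small mock. This is a sketch under stated assumptions, not Paddle's profiler code: `RacyRecordEvent`, `SafeRecordEvent`, `EnableMidScope`, and the global flag are hypothetical stand-ins, and flipping the flag inside the scope simulates another thread calling EnableProfiler. The "safe" variant shows one common way to close the gap (remembering at construction whether the event was started), which is not necessarily how the PR itself fixed it.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical global profiler state for this illustration.
std::atomic<bool> g_profiler_enabled{false};
std::mutex g_log_mutex;
std::vector<std::string> g_log;

void Log(const std::string& entry) {
  std::lock_guard<std::mutex> lock(g_log_mutex);
  g_log.push_back(entry);
}

// Racy variant: constructor and destructor each consult the global
// flag, so they can disagree if the flag flips in between.
class RacyRecordEvent {
 public:
  explicit RacyRecordEvent(const std::string& name) : name_(name) {
    if (!g_profiler_enabled) return;
    Log("push " + name_);
  }
  ~RacyRecordEvent() {
    if (!g_profiler_enabled) return;
    Log("pop " + name_);  // may pop an event that was never pushed
  }

 private:
  std::string name_;
};

// Safe variant: the destructor only undoes what the constructor did.
class SafeRecordEvent {
 public:
  explicit SafeRecordEvent(const std::string& name)
      : name_(name), active_(g_profiler_enabled) {
    if (active_) Log("push " + name_);
  }
  ~SafeRecordEvent() {
    if (active_) Log("pop " + name_);
  }

 private:
  std::string name_;
  bool active_;  // snapshot of the flag taken at construction
};

// Constructs an event while profiling is off, then turns profiling
// on before the event is destroyed, and returns what was logged.
template <typename Event>
std::vector<std::string> EnableMidScope() {
  g_profiler_enabled = false;
  g_log.clear();
  {
    Event event("op");
    g_profiler_enabled = true;  // simulates a concurrent EnableProfiler
  }
  return g_log;
}
```

With the racy variant, `EnableMidScope<RacyRecordEvent>()` logs a lone `pop op` with no matching push; the safe variant logs nothing for an event that began before profiling was enabled.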

Polish cmake · dc6ce071
Committed by yuyang18 on May 15, 2018

dc6ce071

14 May, 2018 2 commits
- Y
  
  Add build strategy · 08295f98
  Committed by yuyang18 on May 14, 2018
  
  08295f98
- T
  
  update by comments · 7b0c0273
  Committed by typhoonzero on May 14, 2018
  
  7b0c0273
11 May, 2018 1 commit
- T
  
  follow comments · f5840d89
  Committed by typhoonzero on May 11, 2018
  
  f5840d89
09 May, 2018 1 commit
- F
  
  fix a compile error (#10488) · 2bff03bc
  Committed by fengjiayi on May 09, 2018
  
  2bff03bc
08 May, 2018 1 commit
- C
  
  add sync · 345737d0
  Committed by chengduoZH on May 08, 2018
  
  345737d0
07 May, 2018 1 commit
- T
  
  workable version · 17009d06
  Committed by typhoonzero on May 07, 2018
  
  17009d06
05 May, 2018 1 commit
- T
  
  testing · 3667578e
  Committed by typhoonzero on May 05, 2018
  
  3667578e
04 May, 2018 1 commit
- C
  
  wrap_shfl_x_sync · d36af62c
  Committed by chengduoZH on May 03, 2018
  
  d36af62c

机器未来 / Paddle is in sync with its fork source project