Commits · d2afd2102155e535c009f75a679a744cd1aff905 · Crayon鑫 / Paddle

14 Jun, 2018 · 1 commit

Authored by Xin Pan on Jun 14, 2018

In the CUPTI samples, only cuptiFlush is used.
I can't find anywhere that calls cuptiFinalize, and
this API can fail with not_implemented on some
CUDA installations.

d2afd210

13 Jun, 2018 · 1 commit
- fix build on mac · 9ebbfa6b
  Authored by qiaolongfei on Jun 13, 2018

  9ebbfa6b
12 Jun, 2018 · 1 commit
- add initial memory flag in MB for infer · 056dd404
  Authored by tensor-tang on Jun 12, 2018

  056dd404
11 Jun, 2018 · 1 commit
- Add lock to record_event. · a1254a86
  Authored by yuyang18 on Jun 11, 2018

  a1254a86
08 Jun, 2018 · 2 commits
- Update device_tracer.cc · 310598f9
  Authored by guochaorong on Jun 08, 2018

  310598f9
- fix some bugs introduced by unfreed memory · 0fec9469
  Authored by guochaorong on Jun 08, 2018

  0fec9469
07 Jun, 2018 · 1 commit

Mkldnn layout (#11040) · 3ff9ba0e

Authored by mozga-intel on Jun 07, 2018

* Add MKLDNN layout support in Paddle

Add MKLDNN layout in Paddle so that an MKLDNN-friendly memory layout
can be used in MKLDNN-enabled OP kernels. Before this commit, NCHW
was hardcoded in all MKLDNN OP kernels. As a result, a
non-optimized execution path was selected in the MKLDNN primitives, which
led to worse performance.
Besides the framework change, three MKLDNN OP kernels were updated
to use the new MKLDNN layout: conv, pool2d and batch_norm.
The other MKLDNN OP kernels also need to be updated in a similar way to
achieve the best performance.

* Add MKLDNN layout support in activation OP

* Don't populate layout from input to output when kMKLDNN in

* Refine pool mkldnn op kernel

* MKLDNN layout

* Remove the inheritance from tensor file

* MKLDNN layout: refactoring

* Remove additional #define to register new operator

* Prepare mkldnn tests to work with layout

3ff9ba0e

06 Jun, 2018 · 5 commits
- Fix PADDLE_ASSERT. (#10981) · e0a32074
  Authored by qingqing01 on Jun 06, 2018
  ```
  * Enable assertions in CUDA.

  * Fix PADDLE_ASSERT.
  ```
  e0a32074
- "fix" · 2b9ef7e2
  Authored by dzhwinter on Jun 05, 2018

  2b9ef7e2
- "fix compiled in manylinux" · 75d8e8ca
  Authored by dzhwinter on Jun 05, 2018

  75d8e8ca
- "done" · 4777aec9
  Authored by dzhwinter on Jun 05, 2018

  4777aec9
- Feature/deterministic (#11205) · 7971d4a3
  Authored by dzhwinter on Jun 06, 2018
  ```
  * "fix deterministic"

  * "fix ci"

  * "fix init"
  ```
  7971d4a3
01 Jun, 2018 · 4 commits
- Static DSO handle · 53dab95b
  Authored by yuyang18 on Jun 01, 2018

  53dab95b
- Use static for dlsym · c5115950
  Authored by yuyang18 on Jun 01, 2018

  c5115950
- Remove lock in device context · 7cf8b656
  Authored by yuyang18 on May 31, 2018

  7cf8b656
- Move sync_mode device ctx from grpc server (#10881) · 4fb7cc7f
  Authored by gongweibao on May 31, 2018

  4fb7cc7f
31 May, 2018 · 1 commit
- allow profiler and timeline to work when dev_ctx is nullptr. · 75ea577f
  Authored by Xin Pan on May 31, 2018
  ```
  Sometimes dev_ctx is not available when a RecordEvent is constructed.
  ```
  75ea577f
30 May, 2018 · 2 commits
- clean up · f14e579c
  Authored by Xin Pan on May 30, 2018

  f14e579c
- better profiler and benchmark · 3cb63956
  Authored by Xin Pan on May 30, 2018

  3cb63956
23 May, 2018 · 1 commit
- follow comments · 08e4970e
  Authored by Xin Pan on May 23, 2018

  08e4970e
22 May, 2018 · 1 commit

multi-thread handlerequest · b4dd4c04

Authored by Xin Pan on May 21, 2018

    Experiment on VGG with the flowers dataset, 2 trainers, 1 parameter server.
    More trainers could yield a larger speedup.

    After:
    Pass = 0, Iters = 327, Speed = (7.52) img/s
    Before:
    Pass = 0, Iters = 385, Speed = (6.77) img/s

b4dd4c04
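A quick check of the gain implied by the two Pass 0 speeds above (a single measurement, so only indicative):

$$
\text{speedup} = \frac{7.52\ \text{img/s}}{6.77\ \text{img/s}} \approx 1.11
$$

That is, roughly 11% more images per second with multi-threaded request handling in this setup.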

21 May, 2018 · 2 commits
- Add backward · 0aa01929
  Authored by Krzysztof Binias on May 17, 2018

  0aa01929
- "fix compile" (#10657) · 0e4467ee
  Authored by dzhwinter on May 21, 2018

  0e4467ee
17 May, 2018 · 1 commit

- Draft of reuse of pooling mkldnn operator · 5f133305

Authored by Jacek Czaja on May 14, 2018

- Finished draft of pooling operator reuse

- Added use of gethash in PoolGrad

- Removed diagnostic

- Added primitive reuse to pool mkldnn grad

- Added diagnostic

- Removed diagnostic

- Added dependency on mkldnn data type for pooling mkldnn

- Added determination of the mkldnn memory data type based on the op's template type

- Compilation warning fix

- Coding style fixes

5f133305

15 May, 2018 · 2 commits

Fix a profiler race condition · 94c0a64d

Authored by Xin Pan on May 14, 2018

In a multi-threaded setting, EnableProfiler can
be called after a RecordEvent has been constructed. In that
case, the RecordEvent constructor initializes nothing,
but the RecordEvent destructor still acts because EnableProfiler
has since been called.
This PR fixes that.

94c0a64d

Polish cmake · dc6ce071
Authored by yuyang18 on May 15, 2018

dc6ce071

14 May, 2018 · 2 commits
- Add build strategy · 08295f98
  Authored by yuyang18 on May 14, 2018

  08295f98
- update by comments · 7b0c0273
  Authored by typhoonzero on May 14, 2018

  7b0c0273
11 May, 2018 · 1 commit
- follow comments · f5840d89
  Authored by typhoonzero on May 11, 2018

  f5840d89
09 May, 2018 · 1 commit
- fix a compile error (#10488) · 2bff03bc
  Authored by fengjiayi on May 09, 2018

  2bff03bc
08 May, 2018 · 1 commit
- add sync · 345737d0
  Authored by chengduoZH on May 08, 2018

  345737d0
07 May, 2018 · 1 commit
- workable version · 17009d06
  Authored by typhoonzero on May 07, 2018

  17009d06
05 May, 2018 · 1 commit
- testing · 3667578e
  Authored by typhoonzero on May 05, 2018

  3667578e
04 May, 2018 · 3 commits
- wrap_shfl_x_sync · d36af62c
  Authored by chengduoZH on May 03, 2018

  d36af62c
- complete code · d9320dcd
  Authored by typhoonzero on May 04, 2018

  d9320dcd
- clean up · 5a9f17f0
  Authored by Xin Pan on May 04, 2018

  5a9f17f0
03 May, 2018 · 4 commits
- Add timeline support for distributed training · 76d8b14b
  Authored by Xin Pan on May 03, 2018

  76d8b14b
- fix __shfl · e97c1a8c
  Authored by chengduoZH on May 03, 2018

  e97c1a8c
- Fix the bug when a input variable of op is dispensable. (#10268) · 6084af47
  Authored by Yiqun Liu on May 03, 2018
  ```
  * Fix the bug when an input variable of op is dispensable.

  * Add HasInputs/Outputs interfaces to OperatorBase.

  * Remove the unreferenced header file.
  ```
  6084af47
- Fix __shfl_down_sync_ of cross_entropy (#10345) · 4fbde42c
  Authored by chengduo on May 03, 2018
  ```
  * fix __shfl_down_sync_ of cross_entropy

  * use reduceSum

  * "fix ci"
  ```
  4fbde42c

Crayon鑫 / Paddle is up to date with the upstream project it was forked from
