Commits · 60647c9aa48296d746b89ee68198f7f36db7b154 · BaiXuePrincess / Paddle

22 Jun, 2018 · 1 commit
- enhance ParallelExecutor stable (#11637) · da556ed6
  committed by chengduo on Jun 22, 2018
21 Jun, 2018 · 4 commits

committed by Jacek Czaja on May 08, 2018

- Added a hash function inside the MKLDNN softmax op, used as a handle for storing primitives in a context

- Style fixes to softmax mkldnn op

- Fixes after review

- Coding style

- Fix to style

- style fixes

- style fix

- style fixes

- Fix to code style check

- Rephrasing a comment

fix to broken merge

Fixes to rebase

Conflicts:
	benchmark/fluid/models/machine_translation.py
	cmake/external/mkldnn.cmake
	paddle/fluid/operators/softmax_mkldnn_op.cc

- Bumped revision of MKL-DNN up to have softmax backward primitive

- Added choosing MKLDNN softmax grad operator

- First reuse of softmax backward

- Reinvented reusing for softmax

- Fix to crash in reinvented reuse

- Clang format fixes

- Clang format fixes

- Improved softmax mkldnn reuse mechanism

- clang format fixes

- Fix to broken merge

- Fix

98f3ad3b
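
The first bullet of this commit describes keying cached primitives by a hash so they can be stored in a context and reused instead of being recreated on every run. A minimal C++ sketch of that idea, assuming a plain string key built from the input dimensions plus an op-specific suffix; `MakeKey`, `PrimitiveCache`, and `FakePrimitive` are hypothetical names, and no real MKL-DNN types are used:

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a cached MKL-DNN primitive (e.g. a softmax
// forward or backward primitive); no real MKL-DNN types are used here.
struct FakePrimitive {
  std::vector<int> dims;
};

// Builds the lookup key ("hash") from the input dims and an op-specific
// suffix, so forward and backward primitives for the same shape get
// separate entries.
std::string MakeKey(const std::vector<int>& dims, const std::string& suffix) {
  std::ostringstream key;
  for (int d : dims) key << d << '-';
  key << suffix;
  return key.str();
}

// Stand-in for the key/value storage a device context would provide.
class PrimitiveCache {
 public:
  std::shared_ptr<FakePrimitive> GetOrCreate(const std::vector<int>& dims,
                                             const std::string& suffix) {
    const std::string key = MakeKey(dims, suffix);
    auto it = blobs_.find(key);
    if (it != blobs_.end()) return it->second;  // reuse the stored primitive
    auto prim = std::make_shared<FakePrimitive>(FakePrimitive{dims});
    blobs_.emplace(key, prim);
    ++created_;
    return prim;
  }
  int created() const { return created_; }

 private:
  std::unordered_map<std::string, std::shared_ptr<FakePrimitive>> blobs_;
  int created_ = 0;
};
```

Calling `GetOrCreate({2, 10}, "softmax_fwd")` twice returns the same object and creates it only once; a different shape or suffix creates a new entry.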

Revert "Merge pull request #11628 from PaddlePaddle/revert-11102-mozga-intel/Sum_mkldnn_layout" · d5fb8fa7
committed by tensor-tang on Jun 21, 2018
```
This reverts commit 4d8e8ee2, reversing
changes made to d6a9f005.
```

Revert "MKLDNN layout: Support for sum operator" · 90780e22
committed by tensor-tang on Jun 21, 2018

Add No Mutex · c99fca5f
committed by chengduoZH on Jun 21, 2018

19 Jun, 2018 · 2 commits
- MKLDNN layout: the code-review changes · 6512be59
  committed by mozga-intel on Jun 15, 2018
- update the default cpu memory with MKLDNN · 9a25f289
  committed by tensor-tang on Jun 19, 2018
16 Jun, 2018 · 1 commit
- refine the initial cpu memory flag for mkldnn · a8c2ff31
  committed by tensor-tang on Jun 16, 2018
14 Jun, 2018 · 2 commits

Fix NCCLBcast hang up bug in Parallel Executor (#11377) · 046bb5c8

committed by Qiyang Min on Jun 13, 2018

* 1. Create a buddy allocator on each place before NcclBcast broadcasts the variables
2. Check the memory usage of ALL GPUs rather than only the first one

* 1. Make NCCLGroupGuard guard only the ncclBcast part, so that ncclGroupEnd does not block the exception from being thrown
2. NOTE the usage of NCCLGroupGuard

* Remove the memory usage check of gpus

* Fix code style
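
The second bullet narrows the scope of NCCLGroupGuard so that it covers only the ncclBcast calls. A minimal sketch of that RAII scoping, assuming a hypothetical `GroupGuard` that only logs where a real guard would call ncclGroupStart/ncclGroupEnd; checks that may throw run before the guard exists, so no exception is raised while a group is open:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Call log used to observe ordering; a real guard would call
// ncclGroupStart()/ncclGroupEnd() instead of logging.
std::vector<std::string> g_log;

// Hypothetical RAII guard: opens a group on construction, closes it on destruction.
class GroupGuard {
 public:
  GroupGuard() { g_log.push_back("group_start"); }
  ~GroupGuard() { g_log.push_back("group_end"); }
  GroupGuard(const GroupGuard&) = delete;
  GroupGuard& operator=(const GroupGuard&) = delete;
};

// Hypothetical stand-in for one per-device broadcast call.
void Broadcast(int place) { g_log.push_back("bcast:" + std::to_string(place)); }

void BcastAll(const std::vector<int>& places, bool inputs_ok) {
  // Checks that may throw run before the group is opened...
  if (!inputs_ok) throw std::runtime_error("invalid input");
  // ...and the guard's scope covers only the broadcast calls.
  GroupGuard guard;
  for (int p : places) Broadcast(p);
}
```

If the check fails, the exception leaves `BcastAll` before any group is opened, so no group-end call runs during stack unwinding.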


Remove cuptiFinalize. · d2afd210

由 Xin Pan 提交于 6月 14, 2018

In the CUPTI samples, only cuptiFlush is used.
I can't find any place that calls cuptiFinalize, and
this API can error out as not_implemented in some
CUDA installations.


13 Jun, 2018 · 1 commit
- fix build on mac · 9ebbfa6b
  committed by qiaolongfei on Jun 13, 2018
12 Jun, 2018 · 1 commit
- add initial memory flag in MB for infer · 056dd404
  committed by tensor-tang on Jun 12, 2018
11 Jun, 2018 · 1 commit
- Add lock to record_event. · a1254a86
  committed by yuyang18 on Jun 11, 2018
08 Jun, 2018 · 2 commits
- Update device_tracer.cc · 310598f9
  committed by guochaorong on Jun 08, 2018
- fix some bugs introduced by unfreed memory · 0fec9469
  committed by guochaorong on Jun 08, 2018
07 Jun, 2018 · 1 commit

Mkldnn layout (#11040) · 3ff9ba0e

committed by mozga-intel on Jun 07, 2018

* Add MKLDNN layout support in Paddle

Add MKLDNN layout in Paddle so that MKLDNN-friendly memory layouts
can be used in MKLDNN-enabled OP kernels. Before this commit, NCHW
was hardcoded in all MKLDNN op kernels. As a result, a
non-optimized execution path was selected in the MKLDNN primitives, which
brought worse performance.
Besides the framework change, three MKLDNN OP kernels were updated
to use the new MKLDNN layout: conv, pool2d, and batch_norm.
Other MKLDNN OP kernels also need to be updated in a similar way to
achieve the best performance.

* Add MKLDNN layout support in activation OP

* Don't populate layout from input to output when kMKLDNN in

* Refine pool mkldnn op kernel

* MKLDNN layout

* Remove the inheritance from tensor file

* MKLDNN layout: refactoring

* Remove additional #define to register new operator

* Prepare mkldnn tests to work with layout
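
To illustrate what a non-NCHW, MKLDNN-friendly layout can look like, the sketch below assumes the channel-blocked nChw8c format (one of the blocked formats MKL-DNN supports; the commit does not say which formats Paddle selects) and compares its element offset with plain NCHW. The function names are hypothetical. The usual motivation for blocking is that the 8 channels of one spatial position sit contiguously, so a vector unit can load them in one go:

```cpp
#include <cassert>
#include <cstddef>

// Offset of element (n, c, h, w) in a plain NCHW buffer.
std::size_t OffsetNCHW(std::size_t n, std::size_t c, std::size_t h,
                       std::size_t w, std::size_t C, std::size_t H,
                       std::size_t W) {
  return ((n * C + c) * H + h) * W + w;
}

// Offset in a channel-blocked nChw8c buffer: channels are grouped into
// blocks of 8, and the 8 channels of a block are stored contiguously for
// each (h, w). Assumes C is a multiple of 8 (real libraries pad otherwise).
std::size_t OffsetNChw8c(std::size_t n, std::size_t c, std::size_t h,
                         std::size_t w, std::size_t C, std::size_t H,
                         std::size_t W) {
  const std::size_t kBlock = 8;
  const std::size_t blocks = C / kBlock;
  return (((n * blocks + c / kBlock) * H + h) * W + w) * kBlock + c % kBlock;
}
```

With C = 16, H = 2, W = 3, channel 1 at (0, 0) is 6 elements away from channel 0 in NCHW but adjacent to it in nChw8c.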


06 Jun, 2018 · 5 commits
- Fix PADDLE_ASSERT. (#10981) · e0a32074
  committed by qingqing01 on Jun 06, 2018
```
* Enable assertions in CUDA.

* Fix PADDLE_ASSERT.
```
- "fix" · 2b9ef7e2
  committed by dzhwinter on Jun 05, 2018
- "fix compiled in manylinux" · 75d8e8ca
  committed by dzhwinter on Jun 05, 2018
- "done" · 4777aec9
  committed by dzhwinter on Jun 05, 2018
- Feature/deterministic (#11205) · 7971d4a3
  committed by dzhwinter on Jun 06, 2018
```
* "fix deterministic"

* "fix ci"

* "fix init"
```
01 Jun, 2018 · 4 commits
- Static DSO handle · 53dab95b
  committed by yuyang18 on Jun 01, 2018
- Use static for dlsym · c5115950
  committed by yuyang18 on Jun 01, 2018
- Remove lock in device context · 7cf8b656
  committed by yuyang18 on May 31, 2018
- Move sync_mode device ctx from grpc server (#10881) · 4fb7cc7f
  committed by gongweibao on May 31, 2018
31 May, 2018 · 1 commit
- allow profiler and timeline to work when dev_ctx is nullptr. · 75ea577f
  committed by Xin Pan on May 31, 2018
```
Sometimes dev_ctx is not available when RecordEvent is used.
```
30 May, 2018 · 2 commits
- clean up · f14e579c
  committed by Xin Pan on May 30, 2018
- better profiler and benchmark · 3cb63956
  committed by Xin Pan on May 30, 2018
23 May, 2018 · 1 commit
- follow comments · 08e4970e
  committed by Xin Pan on May 23, 2018
22 May, 2018 · 1 commit

multi-thread handlerequest · b4dd4c04

committed by Xin Pan on May 21, 2018

    Experiment on VGG (flowers dataset), 2 trainers, 1 pserver.
    More trainers could give more speedup.

    After:
    Pass = 0, Iters = 327, Speed = (7.52) img/s
    Before:
    Pass = 0, Iters = 385, Speed = (6.77) img/s
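
Computed only from the two reported speeds, this single run corresponds to roughly an 11% throughput gain:

```latex
\frac{7.52\ \text{img/s}}{6.77\ \text{img/s}} \approx 1.11
```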


21 May, 2018 · 2 commits
- Add backward · 0aa01929
  committed by Krzysztof Binias on May 17, 2018
- "fix compile" (#10657) · 0e4467ee
  committed by dzhwinter on May 21, 2018
17 May, 2018 · 1 commit

- Draft of reuse of pooling mkldnn operator · 5f133305

committed by Jacek Czaja on May 14, 2018

- Finished draft of pooling reusing of operators

- Using gethash in PoolGrad added

- Removed diagnostic

- Added pool mkldnn grad reusing of primitives

- Added diagnostic

- Removed diagnostic

- added dependency to mkldnn data type for pooling mkldnn

- Added mkldnn memory data type determining based on template type of op

- Compilation warning fix

- coding style fixes


15 May, 2018 · 2 commits

Fix a profiler race condition · 94c0a64d

committed by Xin Pan on May 14, 2018

In a multi-threaded setting, EnableProfiler can
be called after a RecordEvent is constructed. In this
case, the RecordEvent constructor does not initialize anything,
but the RecordEvent destructor still does work because EnableProfiler
has since been called.
This PR fixes it.
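
One way to avoid the mismatch described above is to have the constructor record whether profiling was enabled and let the destructor act only on that saved decision. A minimal C++ sketch under that assumption; `RecordEventSketch`, `g_profiler_enabled`, and `g_events` are hypothetical names, not Paddle's actual RecordEvent implementation, and the event list itself is not thread-safe:

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Global profiler switch that another thread may flip at any time.
std::atomic<bool> g_profiler_enabled{false};
// Recorded "push"/"pop" markers (single-threaded bookkeeping only).
std::vector<std::string> g_events;

// Hypothetical event scope: the constructor decides once whether this event
// is profiled, and the destructor acts only if the constructor did,
// regardless of what the global flag says by then.
class RecordEventSketch {
 public:
  explicit RecordEventSketch(std::string name)
      : name_(std::move(name)), active_(g_profiler_enabled.load()) {
    if (active_) g_events.push_back("push:" + name_);
  }
  ~RecordEventSketch() {
    if (active_) g_events.push_back("pop:" + name_);
  }

 private:
  std::string name_;
  bool active_;
};
```

An event constructed before the profiler is enabled therefore never emits an unmatched "pop".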


Polish cmake · dc6ce071
committed by yuyang18 on May 15, 2018

14 May, 2018 · 2 commits
- Add build strategy · 08295f98
  committed by yuyang18 on May 14, 2018
- update by comments · 7b0c0273
  committed by typhoonzero on May 14, 2018
11 May, 2018 · 1 commit
- follow comments · f5840d89
  committed by typhoonzero on May 11, 2018
09 May, 2018 · 1 commit
- fix a compile error (#10488) · 2bff03bc
  committed by fengjiayi on May 09, 2018
08 May, 2018 · 1 commit
- add sync · 345737d0
  committed by chengduoZH on May 08, 2018

BaiXuePrincess / Paddle is in sync with its fork source project
