Commits · f4634d76d719810da4b4d1bfe9549ab814dfc58a · PaddlePaddle / Paddle

26 Feb, 2019 7 commits

Optimize the CUDA implementation of sequence_expand op by reduce the times of... · f4634d76

Committed by Yiqun Liu on Feb 26, 2019

Optimize the CUDA implementation of sequence_expand op by reduce the times of copying lod data from CPU to GPU. (#15493)

* Optimize the CUDA implementation of sequence_expand op by reduce the times of copying lod data from CPU to GPU.
test=develop

* Refine the op benchmark to support setting lod in config.
test=develop

f4634d76

Merge pull request #15923 from Sand3r-/mgallus/conv-residual-ut · 60546b78
Committed by Tao Luo on Feb 26, 2019
```
Add Conv Residual Connection UT for Projection
```
60546b78

This PR improve performance of prior_box op about 1.25x faster on CPU. (#15909) · 630c1e83

Committed by guomingz on Feb 26, 2019

* This PR improve performance of prior_box op about 1.25x faster on CPU.

* Test Env:SKX 8180 with fake data on 28 threads(bs=1).
* The below table shows the ~25% improvement which generated by [eval_tp_fake_data.py](https://github.com/PaddlePaddle/Paddle/issues/15618#issuecomment-464613976).

| Type |Event | Calls |   Total     |  Min.    |   Max.      |  Ave.      |  Ratio.|
| ---------------- | ------------------ | ---- | ------- | -------- | -------- | ------------ | -------- |
| w/ optimization  | thread0::prior_box | 6000 | 921.201 | 0.110572 | 0.383402 | **0.153533** | 0.084585 |
| w/o optimization | thread0::prior_box | 6000 | 1151.85 | 0.102276 | 0.426702 | **0.191976** | 0.103337 |

test=develop

* Fix the style issue.

test=develop

630c1e83

Merge pull request #15914 from Sand3r-/mgallus/mkldnn-sum-code-reuse · 9c05421c
Committed by Tao Luo on Feb 26, 2019
```
Refactor MKL-DNN Sum to use reference version on fallback
```
9c05421c

Add alloc_continuous_space_op (#15900) · 7ca8553d

Committed by chengduo on Feb 25, 2019

* add alloc_continuous_space_op
test=develop

* Polish code
test=develop

* follow comment
test=develop

7ca8553d

Merge pull request #15904 from dzhwinter/fix/disable_temp · 131e4a3b
Committed by dzhwinter on Feb 26, 2019
```
fix nightly build
```
131e4a3b
Merge pull request #15916 from wopeizl/win/fixevent1 · 2192c464
Committed by wopeizl on Feb 26, 2019
```
fix build issue for cudaEvent_t
```
2192c464

25 Feb, 2019 19 commits
- Add Conv Residual Connection UT for Projection · 6a2bc9a2
  Committed by Michal Gallus on Feb 25, 2019
```
test=develop
```
  6a2bc9a2
- Improve code reuse at MKL-DNN sum · 6ebe9877
  Committed by Michal Gallus on Feb 25, 2019
```
test=develop
```
  6ebe9877
- Merge pull request #15855 from dzhwinter/fix/nightly_test · 660e4106
  Committed by dzhwinter on Feb 25, 2019
```
accelerate memory optimize process
```
  660e4106
- test=develop · c6472579
  Committed by peizhilin on Feb 25, 2019
  
  c6472579
- fix build issue for cudaEvent_t · b5d6e38b
  Committed by peizhilin on Feb 25, 2019
```
test=develop
```
  b5d6e38b
- Merge pull request #15831 from velconia/imperative_engine · 4bd28b30
  Committed by Qiyang Min on Feb 25, 2019
```
Imperative training network to the end
```
  4bd28b30
- Merge pull request #15425 from panyx0718/api · a6e3cd5e
  Committed by Xin Pan on Feb 25, 2019
```
Pass graph to parallel executor instead of program
```
  a6e3cd5e
- Merge pull request #15905 from wopeizl/win/fix_eigen · 3ccd8964
  Committed by wopeizl on Feb 25, 2019
```
fix build issue on windows for sample prop op
```
  3ccd8964
- Remove unnecessary dependence for profiler (#15899) · 8e904d32
  Committed by chengduo on Feb 25, 2019
```
* refile profiler
test=develop

* follow comment
test=develop
```
  8e904d32
- Refine doc of uniform_random and fix dtype (#15873) · d8128930
  Committed by qingqing01 on Feb 25, 2019
```
* Refine doc of uniform_random and fix dtype
* Update default value in the arguments
```
  d8128930
- Merge pull request #15844 from panyx0718/infer · 44e7fcdd
  Committed by Xin Pan on Feb 25, 2019
```
add per kernel config and remove const_cast.
```
  44e7fcdd
- fix default value. test=develop · a71f2fbe
  Committed by dzhwinter on Feb 25, 2019
  
  a71f2fbe
- [MKL-DNN] MKL-DNN specific Tensor modification (#15429) · dec9cf53
  Committed by Jacek Czaja on Feb 25, 2019
```
* - Implemented draft of primitive desc keeping in Tensor

test=develop

- TransposeMKLDNNHandler::AcquireSrcMemory was reimplemented

- Added nchw and nc formats setting for sake of compatibility

Fixed unit tests

- Workaround to problem with 5D data in conv

- Added 3D and 1D MKL-DNN formats for name handles for tensor

test=develop

- Fix to UTs

test=develop

- Conv fp32 op was updated

Cosmetic fixes

test=develop

- tensor mkldnn cosmetics

test=develop

- Moved most of mkl-dnn specific code from Tensor to mkl-dnn utils

* - Lint fixes

test=develop

* - setting prim dec in Tensor , sets also layout to kMKLDNN

test=develop

* - Moved creation of prim desc totally out of Tensor

test=develop

* - Cosmetic fixes after review

test=develop
```
  dec9cf53
- Merge pull request #15897 from heavengate/fix_pool3d_doc · e41db6e1
  Committed by Kaipeng Deng on Feb 25, 2019
```
fix pool3d doc. test=develop
```
  e41db6e1
- follow comments · 8b1672fe
  Committed by Xin Pan on Feb 25, 2019
```
test=develop
```
  8b1672fe
- Polish code · e9fdf909
  Committed by minqiyang on Feb 25, 2019
```
test=develop
```
  e9fdf909
- polish · 5dd281f7
  Committed by Xin Pan on Feb 25, 2019
```
test=develop
```
  5dd281f7
- fix build issue on windows for sample prop op · 6ccdb1b9
  Committed by peizhilin on Feb 25, 2019
```
test=develop
```
  6ccdb1b9
- fix default value. test=develop · 25782419
  Committed by dzhwinter on Feb 25, 2019
  
  25782419
24 Feb, 2019 7 commits
- use kernel size in global_pooling. test=develop · 373cfb0c
  Committed by dengkaipeng on Feb 24, 2019
  
  373cfb0c
- add python example. test=develop · de50854e
  Committed by dengkaipeng on Feb 24, 2019
  
  de50854e
- fix spell mistakes. test=develop · 60305196
  Committed by dengkaipeng on Feb 24, 2019
  
  60305196
- use comment in pool3d. test=develop · 26825d99
  Committed by dengkaipeng on Feb 24, 2019
  
  26825d99
- fix pool3d doc. test=develop · aecc9741
  Committed by dengkaipeng on Feb 24, 2019
  
  aecc9741
- add memset CUPTI && test=develop (#15868) · c6bd434f
  Committed by Dun on Feb 24, 2019
  
  c6bd434f
- Merge pull request #15893 from xuezhong/add_sample_logits_op · f857e079
  Committed by xuezhong on Feb 24, 2019
```
fix bug for sampled softmax
```
  f857e079
23 Feb, 2019 7 commits
- Merge pull request #15840 from jacquesqiao/revert-15684-revert-15661-fix-cpu-broadcast · ec8e8782
  Committed by 乔龙飞 Qiao Longfei on Feb 23, 2019
```
fix cpu broadcast
```
  ec8e8782
- Polish code · a15a3fc3
  Committed by minqiyang on Feb 23, 2019
```
test=develop
```
  a15a3fc3
- Merge pull request #15882 from sfraczek/unique_ptr_dereference · 8a7efc78
  Committed by Tao Luo on Feb 23, 2019
```
Change *(smart_ptr.get()) -> *smart_ptr
```
  8a7efc78
- use soft label for sampled softmax · a5acb37e
  Committed by xuezhong on Feb 23, 2019
```
test=develop
```
  a5acb37e
- [Don't merge now]update_readme_to_1.3 (#15837) · 5b06ec25
  Committed by Cheerego on Feb 23, 2019
```
* [Don't merge now]update_readme_to_1.3

* fix sth

test=develop

* update reademe_cn

test=develop

* fix en

test=develop
```
  5b06ec25
- Merge pull request #15609 from xuezhong/add_sample_logits_op · 1dad36f6
  Committed by xuezhong on Feb 23, 2019
```
add sample_logits  and sampled_softmax_with_cross_entropy op
```
  1dad36f6
- refine code test=develop · 2b7931d5
  Committed by Qiao Longfei on Feb 23, 2019
  
  2b7931d5

PaddlePaddle / Paddle synced successfully about 1 year ago
