Commits · 1dad8ceaabfb7d46d229a67ce54846d583c071de · 机器未来 / Paddle

20 Nov, 2020 · 2 commits

Fix gpu memory allocation bug. (#28703) · 1dad8cea
Committed by gongweibao on Nov 20, 2020

adjust kunlun header file (#28536) · 30ef3815
Committed by QingshuChen on Nov 20, 2020

* adjust kunlun header file
* test=kunlun

* update kunlun unittest
* test=kunlun

* update xpu unittest
* test=kunlun

* update xpu unittest
* test=kunlun

* update xpu unittest
* test=kunlun

17 Nov, 2020 · 2 commits

[oneDNN] Layer norm bf16 kernel (#28619) · 6d8d3d4c
Committed by Jacek Czaja on Nov 17, 2020

bug fix, test=develop (#28674) · 80d20246
Committed by lilong12 on Nov 17, 2020

13 Nov, 2020 · 1 commit

fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks (#28547) · 849467b5
Committed by Zhou Wei on Nov 13, 2020

04 Nov, 2020 · 1 commit

show cpp stack when catch signal (#28415) · 23439b16
Committed by Chen Weihang on Nov 04, 2020

03 Nov, 2020 · 4 commits

Optimize ernie model inference performance in TensorRT, support variable-length input (#28367) · ea851796
Committed by Shang Zhizhou on Nov 03, 2020

* fp16 result ok
* change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS
* auto detect special slice op converter for ernie with trt oss
* ernie oss only support fp16
* fix special_slice_plugin serialize bug
* matmul in tensorrt ok
* ernie unittest ok
* add matmul tensorrt unittest
* remove demo code

[oneDNN] sum op refactor (#28318) · 84cc61b2
Committed by Jacek Czaja on Nov 03, 2020

Paddle support compile on sw (#27858) · 09fd2b2a
Committed by Wilber on Nov 03, 2020

Add rnn_op (#28197) · 9a600df3
Committed by Guo Sheng on Nov 03, 2020

* Add rnn_op.
  test=develop
* Fix rnn_op grad maker's drop_empty_grad.
  test=develop

02 Nov, 2020 · 2 commits

refine the gpu config for performance optimization (#28291) · 0f4b6247
Committed by wangchaochaohu on Nov 02, 2020

Retry CUDA Initialization to Fix Random Failure, test=develop (#28323) · acc11c2a
Committed by Huihuang Zheng on Nov 02, 2020

This PR is a follow-up to #28213. In that PR we tried to decrease GPU usage, but the CI still failed randomly, so I added retry logic for the initialization of nccl and cusolver. If the initialization fails, it is retried to avoid the random failure.
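The retry idea described in #28323 can be illustrated with a minimal Python sketch. This is not the code from the PR, whose implementation is not shown on this page: `retry_init`, `make_flaky_init`, `max_attempts`, and `delay_s` are hypothetical names, and the fake initializer merely stands in for a flaky nccl or cusolver initialization.

```python
import time


def retry_init(init_fn, max_attempts=3, delay_s=0.0):
    """Call init_fn until it succeeds or max_attempts is reached.

    init_fn is a hypothetical stand-in for a flaky initializer; it is
    expected to raise RuntimeError on failure and return a handle on success.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return init_fn()
        except RuntimeError as err:
            last_error = err
            # Wait before the next attempt, but not after the final one.
            if attempt < max_attempts:
                time.sleep(delay_s)
    raise RuntimeError(
        f"initialization failed after {max_attempts} attempts"
    ) from last_error


def make_flaky_init(failures):
    """Build a fake initializer that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def init():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("transient initialization failure")
        return "handle"

    return init, state
```

For example, `retry_init(make_flaky_init(2)[0], max_attempts=3)` succeeds on the third call, while an initializer that keeps failing surfaces a `RuntimeError` chained to the last underlying error. Bounding the number of attempts keeps a genuinely broken environment from hanging the job.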

30 Oct, 2020 · 1 commit

hide some logs of p2p (#28307) · 18c86fb2
Committed by Leo Chen on Oct 30, 2020

28 Oct, 2020 · 1 commit

[oneDNN ] conv2d fwd&bwd optimization (#27871) · c11d9b30
Committed by Jacek Czaja on Oct 28, 2020

27 Oct, 2020 · 1 commit

Enrich the python error types of paddle & polish format (#28124) · 813b2ade
Committed by Chen Weihang on Oct 27, 2020

* add multiple exception type
* define all exception & polish compile pystack
* mapping paddle error to python exception
* polish static mode error format
* fix failed unittests
* fix dytostatic test_error
* fix check_nan_inf failed
* add unittest for coverage
* revert some code try to solve compile error
* refactor enforce & error change
* polish code & add unittest

23 Oct, 2020 · 1 commit

Add compile limit for PADDLE_ENFORCE without error message (#28221) · 2babd6ff
Committed by Chen Weihang on Oct 23, 2020

* add compile limit for paddle enforce
* polish elementwise_op_function.cu.h
* fix failed unittest
* fix windows compile failed
* detail polish
* revert no type constructor

21 Oct, 2020 · 1 commit

fix dynamic_loader more safe and error message on windows (#28117) · 5d700021
Committed by Zhou Wei on Oct 21, 2020

20 Oct, 2020 · 1 commit

refine gpu kernel config for Paddle (#28085) · 463c72c2
Committed by wangchaochaohu on Oct 20, 2020

19 Oct, 2020 · 1 commit

reduce trt warning message (#28011) · a0b2f936
Committed by Pei Yang on Oct 19, 2020

16 Oct, 2020 · 1 commit

[oneDNN] Conv dilation support (#27914) · 7cb4a8b8
Committed by lidanqing on Oct 16, 2020

* conv dilated mkldnn support: forward and backward pass
* add mkldnn conv_transpose dilation UT
  test=develop
* remove unnecessary PADDLE_ENFORCE
* add int8 and bf16 dilated conv UT
* update according to reviews

14 Oct, 2020 · 1 commit

tune backward filter algorithm for float16 (#27529) · d5cc144c
Committed by Zhang Ting on Oct 14, 2020

* use exhaustive_search for float16
* tune algo only when dtype is float16

12 Oct, 2020 · 2 commits

[oneDNN] adaptive pool support (#27747) · 55e63763
Committed by Jacek Czaja on Oct 12, 2020

add musl option (#27798) · 6335e6a0
Committed by chen.zhiyu on Oct 12, 2020

01 Oct, 2020 · 1 commit

Fix to issue #25537 (#27546) · b9fda2ff
Committed by Jacek Czaja on Oct 01, 2020

* candidate fix to issue #25537
  test=develop
* UT for transpose NHWC
  test=develop

30 Sep, 2020 · 1 commit

Add avx512 core instructions check (#27732) · 0cd4907e
Committed by joanna.wozna.intel on Sep 30, 2020

* Add avx instructions check
* Small fix
* Change function name
* Change uint to unsigned int

29 Sep, 2020 · 2 commits

test=develop, optimize geo communicator (#26857) · cc780b19
Committed by 123malin on Sep 29, 2020

* test=develop, optimize geo communicator

Initialize gloo for low level collective apis (#27672) · bbc2add7
Committed by lilong12 on Sep 29, 2020

* add gloo initializer, test=develop

28 Sep, 2020 · 4 commits

Add support for mkldnn ops types selection with FLAGS in dygraph (#27482) · 0ecf441a
Committed by arlesniak on Sep 28, 2020

* Add support for mkldnn ops types selection with FLAGS in dygraph
* use regex to match DNNL verbose
* python3 encoding fix

Revert "Initialize gloo for low level collective apis (#27356)", test=document_fix (#27665) · 36c04102
Committed by lilong12 on Sep 28, 2020

add ncclSend and ncclRecv (#27621) · 5218b7af
Committed by lilong12 on Sep 28, 2020

* include ncclRecv and ncclSend, test=develop

Initialize gloo for low level collective apis (#27356) · fa73e4a2
Committed by lilong12 on Sep 28, 2020

* add gloo initializer, test=develop

27 Sep, 2020 · 2 commits

add support to float64 input of warpctc op. (#27399) · 1501a80f
Committed by Li Fuchen on Sep 27, 2020

* add float64 input to ctc_loss
* modified error message of warpctc
* update repo and tag of warpctc
* add test for warpctc with float64 input
* modified warpctc.cmake to make sure build always
* resolved sample code bug of warpctc
* add core.ops in warpctc dygraph
* fix a bug of test

support elementwise add, activation, matmul on Baidu Kunlun (#27143) · 6b727e08
Committed by QingshuChen on Sep 27, 2020

* support elementwise add, activation, matmul on Baidu Kunlun
* test=kunlun

* minor
* test=kunlun

* reconstruct the xpu directory
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

26 Sep, 2020 · 1 commit

fix cpplint error for the autmic max/min · a85592bc
Committed by Zhong Hui on Sep 26, 2020

25 Sep, 2020 · 1 commit

fix cuda atomic for ARCH<350 for the automic_max · 597345d1
Committed by Zhong Hui on Sep 25, 2020

24 Sep, 2020 · 3 commits

fix tensorrt 6 build error. test=develop (#27511) · 8f7bb52b
Committed by Shibo Tao on Sep 24, 2020

* fix tensorrt 6 build error. test=develop
* fix. test=develop
* bug fix
* test=develop

use iwyu clean include (#27267) · df43905f
Committed by wanghuancoder on Sep 24, 2020

* use iwyu clean include, test=develop, test=win
* compilation error, test=develop
* fix compilation error2, test=develop
* fix compilation error3, test=develop
* fix compilation error4, test=develop
* fix compilation error5, test=develop
* fix compilation error6, test=develop
* fix compilation error7, test=develop
* fix compilation error8, test=develop
* fix compilation error8, test=develop
* fix compilation error10, test=develop
* fix compilation error11, test=develop

Add GPU Kernels of Segment Ops, support, sum, max, min, mean · 4a9d21de
Committed by Zhong Hui on Sep 24, 2020

23 Sep, 2020 · 2 commits

[bug fix]:Memory increases after adapting the cudnn version to cudnn8 (#27436) · c17f9cf2
Committed by Shang Zhizhou on Sep 23, 2020

* [bug fix]:Memory increases after adapting the cudnn version to 8
* [bug fix]cudnnGetConvolutionForwardAlgorithm not defined

Polish some lost invalid error message (#27445) · 76506447
Committed by Chen Weihang on Sep 23, 2020

* polish some lost error msg
* add some math file to white list
* polish detail based reviewer comment

机器未来 / Paddle is in sync with the upstream project it was forked from
