Commits · c9e874fc8e40352d581e6c80e0c1bb573f1cd834 · Crayon鑫 / Paddle

23 Dec, 2020 1 commit
- [oneDNN] Unit test for checking oneDNN caching (#29606) · c9e874fc
  Committed by Jacek Czaja on Dec 23, 2020
21 Dec, 2020 1 commit
- Add Retry Logic to CublasHandleHolder · 1cbb282d
  Committed by Huihuang Zheng on Dec 21, 2020
```
Add Retry Logic to CublasHandleHolder to avoid random unittest failure.
```
19 Dec, 2020 1 commit
- [oneDNN] Reimplemented elementwise_add grad (#29747) · 07790ba1
  Committed by Jacek Czaja on Dec 19, 2020
```
* - Reimplemented elementwise_add grad

- lint

* - fix after review

* - Fix to fix after review
```
18 Dec, 2020 1 commit
- Polish code in gpu_launch_config.h (#29730) · 17c8e3ad
  Committed by Aurelius84 on Dec 18, 2020
17 Dec, 2020 3 commits

Windows generate pdb and dump, for debug (#29628) · 0c59ad2a
Committed by wanghuancoder on Dec 17, 2020
```
* Windows generate pdb and dump, for debug

* fix code style, test=develop

* modify cmakelist
```

Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617) · 4c4d4ba5

Committed by Huihuang Zheng on Dec 17, 2020

Modify CublasHandleHolder to use PADDLE_RETRY_CUDA_SUCCESS instead of PADDLE_ENFORCE_CUDA_SUCCESS to fix a random unittest failure. The unittest log showed a CUDA allocation error in this file, which may be due to insufficient GPU resources. We fixed a similar failure in the past, so we applied PADDLE_RETRY_CUDA_SUCCESS here.

Added missing format of oneDNN (#29670) · 9eff1a67
Committed by Jacek Czaja on Dec 17, 2020

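16 Dec, 2020 2 commits

[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337) · f13c3a9c
Committed by liuyuhui on Dec 16, 2020

Add ROCm platform support code (#29342) · 76738504

Committed by Y_Xuan on Dec 16, 2020

* Add ROCm platform support code

* Fix some issues

* Resolve some ambiguities and add comments

* Fix code formatting

* Code changes after resolving conflicts

* Modify operators.cmake

* Fix formatting

* Fix errors

* Unify interfaces

* Update dates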

15 Dec, 2020 1 commit
- Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732) · efea540c
  Committed by AshburnLee on Dec 15, 2020
14 Dec, 2020 2 commits
- Added verbose oneDNN lib version (#29378) · 62d44836
  Committed by arlesniak on Dec 14, 2020
- [oneDNN] Making ThreadID info in caching key optional (#29272) · f6cca625
  Committed by Jacek Czaja on Dec 14, 2020
11 Dec, 2020 1 commit
- add xpu ops for training transformer in kunlun (#29539) · 760d015c
  Committed by taixiurong on Dec 11, 2020
```
* 1.fix matmul bug 2. add one hot

* add xpu error msg
```
09 Dec, 2020 1 commit

Fix Unit Test: Add Sleep Time for CUDA Retry (#29442) · a1909aff

Committed by Huihuang Zheng on Dec 09, 2020

Add a sleep time to the CUDA retry, similar to our GPU retry logic. This is an attempt to avoid random GPU allocation failures at initialization in unit tests.
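
The retry-with-sleep idea in the commit message above can be illustrated with a small, language-neutral sketch. This is not Paddle's `PADDLE_RETRY_CUDA_SUCCESS` macro (which is C++ and whose internals are not shown on this page); `retry_call`, `FlakyHandleFactory`, the retry count, and the sleep interval are all hypothetical names and values chosen for illustration, and `RuntimeError` merely stands in for a failed CUDA status.

```python
import time


def retry_call(fn, retries=3, sleep_seconds=0.01):
    """Call fn(); on failure, sleep briefly and try again, up to `retries` attempts.

    Hypothetical helper: RuntimeError stands in for a failed CUDA/cuBLAS status.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fn()
        except RuntimeError as err:
            last_error = err
            # Sleep only if another attempt will follow.
            if attempt + 1 < retries:
                time.sleep(sleep_seconds)
    # Every attempt failed: surface the last error to the caller.
    raise last_error


class FlakyHandleFactory:
    """Hypothetical resource that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def create(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError("simulated allocation failure")
        return "handle"
```

For example, `retry_call(FlakyHandleFactory(2).create, retries=3)` succeeds on the third attempt, whereas a single enforced call would have failed on the first. The sleep gives a transient condition, such as another test briefly holding GPU memory, a chance to clear before the next attempt.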

08 Dec, 2020 1 commit

added internal and external reorders to profiler (#29443) · 57a4f16d

Committed by jakpiase on Dec 08, 2020

* added external reorder to profiler

* added external and internal reorders to profiler

* added internal and external reorder to profiler

* added formatting to int/ext reorder commit

* removed unnecessary comment

07 Dec, 2020 1 commit
- fix rnn_op bug in cudnn_version>= 8 (#29406) · 1dd7b97b
  Committed by Jack Zhou on Dec 07, 2020
04 Dec, 2020 4 commits

Make transpose, trace, kron, reshape, sum op support complex type (#29321) · 879e913b

Committed by chentianyu03 on Dec 04, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

* kron, reshape, transpose support complex types

* sum and trace op support complex types

* add test case of sum and trace op

* fix the bug of imag part of complex not initialized

* format file

* format code style

* kron support type promotion; modify test cases

fix expand/uniform_random && concat/transpose to new api on xpu (#29280) · 074065e5
Committed by 卖鱼的哲学 on Dec 04, 2020
```
* fix expand && concat/transpose to new api

* update uniform_random_op

* update xpu_header
```

update, test=develop (#29331) · 1decf4ad
Committed by lilong12 on Dec 04, 2020

Support type promote for basic math ops (quantum required) (#29265) · 9ad800eb

Committed by Chen Weihang on Dec 04, 2020

* basic impl of type promote

* add comment & another testcase

* fix complex bugs & support python op promote type

* fix failed unittests & polish code

* add unittest for coverage

* change to only promote complex type

* polish code details

* polish several comments

01 Dec, 2020 2 commits

update kunlun conv2d/softmax/elementwise implementation (#29229) · 64f29fbb

Committed by QingshuChen on Dec 01, 2020

* update conv2d & softmax to new xpu api
* test=kunlun

* remove useless comments
* test=kunlun

* remote softmax xpu op
* test=kunlun

* update kunlun softmax
* test=kunlun

* update xpu unittest
* test=kunlun

* fix elementwise_grad bug for kunlun
* test=kunlun

add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

27 Nov, 2020 5 commits

Support dynamic graph distributed (#28997) · e2d01eb6

Committed by ShenLiang on Nov 27, 2020

* add reducer

* refine event for memorycopy

* add concat&split for allreduce

* apply concat & split for fuse tensor

* fix nccl dep

* fix the unittest, compile problem and ddp initialize problem

* fix unittest for mac & add some comments & solve the repeated param in sublayers

* fix unittest for windows & fix document

fix CUDA 11 error on windows (#29101) · e668cb07
Committed by Zhou Wei on Nov 27, 2020

Fixes mkldnn dygraph learning rate scheduler crashes (#28988) · bc902044
Committed by arlesniak on Nov 27, 2020

detect tensorRT plugin fp16 in runtime (#27933) · b9e76a01

Committed by Shang Zhizhou on Nov 27, 2020

* remove -DSUPPORTS_CUDA_FP16 in cuda.cmake

* compile with cuda9

* add some unittest

* notest;test=coverage

* add unittest for trt plugin swish && split

* update ernie unittest

* fix some error message

* remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter

* fix compile error when CUDA_ARCH_NAME < Pascal

* fix compile error

* update unittest timeout

* compile with cuda9

* update error msg

* fix code style

* add some comments

* add define IF_CUDA_ARCH_SUPPORT_FP16

* rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED

fix typo of flag name (#29154) · fd3fcb05
Committed by Leo Chen on Nov 27, 2020

26 Nov, 2020 1 commit
- Polish CUDA Information stdout (#29109) · 7ae3cb55
  Committed by Aurelius84 on Nov 26, 2020
25 Nov, 2020 2 commits
- Hide the C++ stack by default and add hints (#29042) · fea0e294
  Committed by Chen Weihang on Nov 25, 2020
```
* default not show cpp stack & add hint

* fix failed unittest

* fix failed unittests
```
- remove eigen threadpool for the speed up · b2c8a007
  Committed by wawltor on Nov 25, 2020
```
remove eigen threadpool for the speed up
```
23 Nov, 2020 2 commits
- extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758) · bd1d6d3b
  Committed by Jacek Czaja on Nov 23, 2020
- change avg pooling and global pooling to trt layer in dynamic shape mode (#28702) · 994673bf
  Committed by Pei Yang on Nov 23, 2020
```
* change avg pooling and global pooling to trt layer

* add support for static shape global pooling

* modify trt errmsg
```
20 Nov, 2020 2 commits

Fix gpu memory allocation bug. (#28703) · 1dad8cea
Committed by gongweibao on Nov 20, 2020

adjust kunlun header file (#28536) · 30ef3815

Committed by QingshuChen on Nov 20, 2020

* adjust kunlun header file
* test=kunlun

* update kunlun unittest
* test=kunlun

* update xpu unittest
* test=kunlun

* update xpu unittest
* test=kunlun

* update xpu unittest
* test=kunlun

17 Nov, 2020 2 commits
- [oneDNN] Layer norm bf16 kernel (#28619) · 6d8d3d4c
  Committed by Jacek Czaja on Nov 17, 2020
- bug fix, test=develop (#28674) · 80d20246
  Committed by lilong12 on Nov 17, 2020
13 Nov, 2020 1 commit
- fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks (#28547) · 849467b5
  Committed by Zhou Wei on Nov 13, 2020
04 Nov, 2020 1 commit
- show cpp stack when catch signal (#28415) · 23439b16
  Committed by Chen Weihang on Nov 04, 2020
03 Nov, 2020 2 commits

Optimize ernie model inference performance in TensorRT and support variable-length input (#28367) · ea851796

Committed by Shang Zhizhou on Nov 03, 2020

* fp16 result ok

* change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS

* auto detect special slice op converter for ernie with trt oss

* ernie oss only support fp16

* fix special_slice_plugin serialize bug

* matmul in tensorrt ok

* ernie unittest ok

* add matmul tensorrt unittest

* remove demo code

[oneDNN] sum op refactor (#28318) · 84cc61b2
Committed by Jacek Czaja on Nov 03, 2020

Crayon鑫 / Paddle is in sync with the fork's source project