Commits · 6cfa59de1b57b7aad84ad87c6256c22bb4c5aed2 · 机器未来 / Paddle

17 Dec 2020 · 1 commit
  
Added missing format of oneDNN (#29670) · 9eff1a67
Committed by Jacek Czaja on Dec 17, 2020
  
16 Dec 2020 · 2 commits

[Kunlun] PR1: Support one Kunlun card training in parallel executor (#29337) · f13c3a9c
Committed by liuyuhui on Dec 16, 2020

Committed by Y_Xuan on Dec 16, 2020

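* Add ROCm platform support code

* Fix some issues

* Fix some ambiguities and add comments

* Fix code formatting

* Code changes after resolving conflicts

* Modify operators.cmake

* Fix formatting

* Fix errors

* Unify interfaces

* Update dates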

76738504

15 Dec 2020 · 1 commit
  
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732) · efea540c
Committed by AshburnLee on Dec 15, 2020
  
14 Dec 2020 · 2 commits
  
Added verbose oneDNN lib version (#29378) · 62d44836
Committed by arlesniak on Dec 14, 2020
  
  
[oneDNN] Making ThreadID info in caching key optional (#29272) · f6cca625
Committed by Jacek Czaja on Dec 14, 2020
  
11 Dec 2020 · 1 commit

add xpu ops for training transformer in kunlun (#29539) · 760d015c
Committed by taixiurong on Dec 11, 2020

* 1. fix matmul bug; 2. add one hot

* add xpu error msg

9 Dec 2020 · 1 commit

Fix Unit Test: Add Sleep Time for CUDA Retry (#29442) · a1909aff

Committed by Huihuang Zheng on Dec 9, 2020

Add sleep time for CUDA retry, similar to our GPU retry logic. This is an attempt to avoid random failures of the initial GPU allocation in unit tests.

8 Dec 2020 · 1 commit

added internal and external reorders to profiler (#29443) · 57a4f16d

Committed by jakpiase on Dec 8, 2020

* added external reorder to profiler

* added external and internal reorders to profiler

* added internal and external reorder to profiler

* added formatting to int/ext reorder commit

* removed unnecessary comment

7 Dec 2020 · 1 commit
  
fix rnn_op bug in cudnn_version >= 8 (#29406) · 1dd7b97b
Committed by Jack Zhou on Dec 7, 2020
  
4 Dec 2020 · 4 commits

Make transpose, trace, kron, reshape, sum op support complex type (#29321) · 879e913b

Committed by chentianyu03 on Dec 4, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

* kron, reshape, transpose support complex types

* sum and trace op support complex types

* add test case of sum and trace op

* fix the bug of imag part of complex not initialized

* format file

* format code style

* kron support type promotion; modify test cases

fix expand/uniform_random && concat/transpose to new api on xpu (#29280) · 074065e5
Committed by 卖鱼的哲学 on Dec 4, 2020

* fix expand && concat/transpose to new api

* update uniform_random_op

* update xpu_header

update, test=develop (#29331) · 1decf4ad
Committed by lilong12 on Dec 4, 2020

Support type promote for basic math ops (quantum required) (#29265) · 9ad800eb

Committed by Chen Weihang on Dec 4, 2020

* basic impl of type promote

* add comment & another testcase

* fix complex bugs & support python op promote type

* fix failed unittests & polish code

* add unittest for coverage

* change to only promote complex type

* polish code details

* polish several comments

1 Dec 2020 · 2 commits

update kunlun conv2d/softmax/elementwise implementation (#29229) · 64f29fbb

Committed by QingshuChen on Dec 1, 2020

* update conv2d & softmax to new xpu api
  test=kunlun

* remove useless comments
  test=kunlun

* remote softmax xpu op
  test=kunlun

* update kunlun softmax
  test=kunlun

* update xpu unittest
  test=kunlun

* fix elementwise_grad bug for kunlun
  test=kunlun

add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 1, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

27 Nov 2020 · 5 commits

Support dynamic graph distributed (#28997) · e2d01eb6

Committed by ShenLiang on Nov 27, 2020

* add reducer

* refine event for memory copy

* add concat&split for allreduce

* apply concat & split for fuse tensor

* fix nccl dep

* fix the unittest, compile problem and ddp initialize problem

* fix unittest for Mac & add some comments & solve the repeated param in sublayers

* fix unittest for Windows & fix document

fix CUDA 11 error on windows (#29101) · e668cb07
Committed by Zhou Wei on Nov 27, 2020

Fixes mkldnn dygraph learning rate scheduler crashes (#28988) · bc902044
Committed by arlesniak on Nov 27, 2020

detect tensorRT plugin fp16 in runtime (#27933) · b9e76a01

Committed by Shang Zhizhou on Nov 27, 2020

* remove -DSUPPORTS_CUDA_FP16 in cuda.cmake

* compile with cuda9

* add some unittest

* notest;test=coverage

* add unittest for trt plugin swish && split

* update ernie unittest

* fix some error message

* remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter

* fix compile error when CUDA_ARCH_NAME < Pascal

* fix compile error

* update unittest timeout

* compile with cuda9

* update error msg

* fix code style

* add some comments

* add define IF_CUDA_ARCH_SUPPORT_FP16

* rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED

fix typo of flag name (#29154) · fd3fcb05
Committed by Leo Chen on Nov 27, 2020

26 Nov 2020 · 1 commit
  
Polish CUDA Information stdout (#29109) · 7ae3cb55
Committed by Aurelius84 on Nov 26, 2020
  
25 Nov 2020 · 2 commits

Hide the C++ stack by default and add hints (#29042) · fea0e294
Committed by Chen Weihang on Nov 25, 2020

* default not show cpp stack & add hint

* fix failed unittest

* fix failed unittests

remove eigen threadpool for the speed up · b2c8a007
Committed by wawltor on Nov 25, 2020

23 Nov 2020 · 2 commits
  
extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758) · bd1d6d3b
Committed by Jacek Czaja on Nov 23, 2020
  
change avg pooling and global pooling to trt layer in dynamic shape mode (#28702) · 994673bf
Committed by Pei Yang on Nov 23, 2020

* change avg pooling and global pooling to trt layer

* add support for static shape global pooling

* modify trt errmsg

20 Nov 2020 · 2 commits

Fix gpu memory allocation bug. (#28703) · 1dad8cea
Committed by gongweibao on Nov 20, 2020

adjust kunlun header file (#28536) · 30ef3815

Committed by QingshuChen on Nov 20, 2020

* adjust kunlun header file
  test=kunlun

* update kunlun unittest
  test=kunlun

* update xpu unittest
  test=kunlun

* update xpu unittest
  test=kunlun

* update xpu unittest
  test=kunlun

17 Nov 2020 · 2 commits
  
[oneDNN] Layer norm bf16 kernel (#28619) · 6d8d3d4c
Committed by Jacek Czaja on Nov 17, 2020
  
  
bug fix, test=develop (#28674) · 80d20246
Committed by lilong12 on Nov 17, 2020
  
13 Nov 2020 · 1 commit
  
fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks (#28547) · 849467b5
Committed by Zhou Wei on Nov 13, 2020
  
4 Nov 2020 · 1 commit
  
show cpp stack when catch signal (#28415) · 23439b16
Committed by Chen Weihang on Nov 4, 2020
  
3 Nov 2020 · 4 commits

Optimize ERNIE model inference performance in TensorRT and support variable-length input (#28367) · ea851796

Committed by Shang Zhizhou on Nov 3, 2020

* fp16 result ok

* change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS

* auto detect special slice op converter for ernie with trt oss

* ernie oss only support fp16

* fix special_slice_plugin serialize bug

* matmul in tensorrt ok

* ernie unittest ok

* add matmul tensorrt unittest

* remove demo code

[oneDNN] sum op refactor (#28318) · 84cc61b2
Committed by Jacek Czaja on Nov 3, 2020

Paddle support compile on sw (#27858) · 09fd2b2a
Committed by Wilber on Nov 3, 2020

Add rnn_op (#28197) · 9a600df3

Committed by Guo Sheng on Nov 3, 2020

* Add rnn_op.
test=develop

* Fix rnn_op grad maker's drop_empty_grad.
test=develop

2 Nov 2020 · 2 commits

refine the gpu config for performance optimization (#28291) · 0f4b6247
Committed by wangchaochaohu on Nov 2, 2020

Retry CUDA Initialization to Fix Random Failure, test=develop (#28323) · acc11c2a

Committed by Huihuang Zheng on Nov 2, 2020

This PR is a follow-up to #28213. In that PR we tried to decrease GPU usage; however, the CI still failed randomly. So I added retry logic for the initialization of NCCL and cuSOLVER: if initialization fails, it is retried to avoid the random failure.

30 Oct 2020 · 1 commit
  
hide some logs of p2p (#28307) · 18c86fb2
Committed by Leo Chen on Oct 30, 2020
  
28 Oct 2020 · 1 commit
  
[oneDNN] conv2d fwd&bwd optimization (#27871) · c11d9b30
Committed by Jacek Czaja on Oct 28, 2020
  

机器未来 / Paddle is in sync with the upstream project it was forked from.
