Commits · 5d604a6b3050343efe5b62149ebcb06354e2b051 · BaiXuePrincess / Paddle

27 Jan, 2021 1 commit
- Disabling oneDNN inplace pass (#30588) (#30710) · 5d604a6b
  Authored by Wojciech Uss on Jan 27, 2021
```
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
```
  5d604a6b
20 Jan, 2021 3 commits
- [cherry-pick]Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732) (#30612) · fd9d6fda
  Authored by AshburnLee on Jan 20, 2021
```
* Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)

* Fixed an error

* Fixed an error
```
  fd9d6fda
- Add tf32 switch for cuDNN (#29192) (#30574) · 138a71b7
  Authored by AshburnLee on Jan 20, 2021
```
This PR is cherry-picked from PR: #29192
Function: adds a TF32 switch for cuDNN. TF32 is on by default and is turned off when the user sets the switch to False.
```
  138a71b7
- fix compile error on sw and mips (#30584) · 619869bd
  Authored by Wilber on Jan 20, 2021
  
  619869bd
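The two TF32 entries above (fd9d6fda, 138a71b7) toggle a reduced-precision math mode. As a rough illustration of what such a switch trades away, the sketch below emulates TF32's 10-bit mantissa by truncating a float32 value in pure Python. The helper name `to_tf32_truncated` is hypothetical, truncation is a simplifying assumption (the rounding used by real hardware may differ), and no Paddle, cuBLAS, or cuDNN API is involved.

```python
import struct


def to_tf32_truncated(x: float) -> float:
    """Emulate TF32 storage: keep only the top 10 of float32's 23 mantissa bits.

    Assumption: plain truncation, chosen for simplicity.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Clear the 13 least significant mantissa bits (23 - 10 = 13).
    bits &= 0xFFFFE000
    # Reinterpret the masked bits as a float32 again.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


exact = 1.0 + 2.0 ** -11          # representable in float32
reduced = to_tf32_truncated(exact)  # the 2**-11 step is below TF32 resolution
```

A step of `2**-10` survives the conversion while a step of `2**-11` is lost, which is one reason a user might want to turn the mode off for precision-sensitive work.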
19 Jan, 2021 1 commit
- [Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317) (#30535) · 420fdbb2
  Authored by liuyuhui on Jan 19, 2021
  
  420fdbb2
14 Jan, 2021 1 commit

optimize memcpy perf for kunlun (#30291) (#30382) · 9de42be2

Authored by QingshuChen on Jan 14, 2021

* optimize memcpy perf for kunlun (#30291)

* optimize memcpy perf for kunlun

* remove useless unitest for kunlun mean

* minor

* fix bug that cann't find mkldnn(kunlun) (#30394)

9de42be2

13 Jan, 2021 1 commit
- [Cherry-pick] Remove c++ stacktrace open hint #30341 · 428c884f
  Authored by Chen Weihang on Jan 13, 2021
```
[Cherry-pick] Remove c++ stacktrace open hint, cherry-pick of #30325
```
  428c884f
11 Jan, 2021 1 commit

[cherry pick] Add detailed error message for curandStatus_t, cublasStatus_t,... · 04cc659c

Authored by WeiXin on Jan 11, 2021

[cherry pick] Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161) (#30280)

Adds detailed error messages for curandStatus_t, cublasStatus_t, and cusolverStatus_t.
Original PR: #30161

04cc659c

29 Dec, 2020 5 commits

[Kunlun] 2.0 cherry-pick:Support for Baidu Kunlun XPU multi card training (#29713) · 847aa172

Authored by liuyuhui on Dec 29, 2020

* [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)

* [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)

* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29926)

* add bkcl.so in whl for kunlun (#29947)

* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29961)
Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>

847aa172

[Cherry-pick] Complex network execute support (#29905) · 91ebc460

Authored by Chen Weihang on Dec 29, 2020

* [Complex] Add support for complex grad accumulated (#29889)

* add support for complex grad accumulated

* add unittest for coverage

* update test dtype

* remove useless blank line

* [Complex] Handle complex to real after type promotion (#29855)

* try to add fwd op input dtypes

* refactor base impl

* return tmp_ins after dygraph prepare data

* fix typo found in debug

* polish comment & add complex net test

* revert detail change

* fix unittest failed

* add complex kernel condition control

* fix xpu test failed & polish comment

* polish details by review comments

* Complex op test (#29753)

* delete no need to calculate inputs in dygraph op_test

* delete no need to calculate inputs in dygraph op_test

* change grad elementwise_mul for complex types (#29757)

* add conj op for complex types

* add conj for complex types

* add more test case

* add conj_op test

* modify conj api and impl

* add complex type for fill_constant_op xpu

* add setConstant for complex type

* remove complex conj test file

* user define grad for test_conj_op

* add test case for static mode of conj api

* modify conj doc

* change input args name to x

* remove useless codes

* conj support real types

* add conj test case for real number

* delete no need to calculate inputs in dygraph op_test

* delete no need to calculate inputs in dygraph op_test

* modify grad of mul for complex types

* fix the grads of inputs args order not match bug

* change the grad of div when complex types (#29804)

* change the grad of div when complex types

* fix the grads of inputs args order not match bug
Co-authored-by: chentianyu03 <chentianyu03@baidu.com>

91ebc460
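Several items in the commit above change the gradients of `elementwise_mul` and `div` for complex types and add a `conj` op. The source does not spell out the formulas, so the following is only a sketch of one common convention, in which the gradient of `z = x * y` with respect to `x` is `dout * conj(y)`. The function names are hypothetical, and the finite-difference helper is included purely as a sanity check for a real-valued loss.

```python
def mul_grad(x: complex, y: complex, dout: complex):
    """Gradients of z = x * y under the conjugate convention (an assumption)."""
    return dout * y.conjugate(), dout * x.conjugate()


def numeric_grad(loss, x: complex, eps: float = 1e-6) -> complex:
    """Central differences: d(loss)/d(Re x) + 1j * d(loss)/d(Im x)."""
    d_re = (loss(x + eps) - loss(x - eps)) / (2 * eps)
    d_im = (loss(x + 1j * eps) - loss(x - 1j * eps)) / (2 * eps)
    return complex(d_re, d_im)


x, y = 1 + 2j, 3 - 1j
z = x * y
# For loss = |z|**2 the upstream gradient in this convention is 2 * z.
dout = 2 * z
dx, dy = mul_grad(x, y, dout)
# Numerical check of dx against the same loss expressed as a function of x.
dx_numeric = numeric_grad(lambda v: abs(v * y) ** 2, x)
```

With real inputs the conjugates are no-ops and the rule reduces to the usual product rule, which fits the "conj support real types" item in the list.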

[cherry-pick] #26920 , #22924 (#29948) · bea300dd
Authored by 石晓伟 on Dec 29, 2020

bea300dd

Support mips (#29943) · 5a8d43bb
Authored by Wilber on Dec 29, 2020

5a8d43bb

[Inference] FLAGS_call_statck is turned on default when ON_INFER=ON (#29800) · fae406ae
Authored by Wilber on Dec 29, 2020
```
* [Inference] FLAGS_call_statck is turned on default when ON_INFER=ON

* cherry-pick 29828
```
fae406ae

28 Dec, 2020 1 commit

[Cherry-pick] Cherry-pick of PR#29579 and PR#29617 (#29904) · 63939597

Authored by Huihuang Zheng on Dec 28, 2020

* [Dy2stat] Enable jit.save to Save Without Running (#29579)

Enable jit.save to Save Without Running.

* Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617)

Modify CublasHandleHolder to use PADDLE_RETRY_CUDA_SUCCESS instead of PADDLE_ENFORCE_CUDA_SUCCESS to fix a random unittest failure. The unittest log showed a CUDA allocation error in this file, which may be due to insufficient GPU resources. We fixed a similar failure in the past, so we applied PADDLE_RETRY_CUDA_SUCCESS here.

63939597
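The commit body above swaps an enforce-style check for a retry-style one. The source does not show how PADDLE_RETRY_CUDA_SUCCESS is implemented, so the following is only a generic sketch of the retry idea in Python: call an operation that may fail transiently, try again a bounded number of times, and re-raise the last error if every attempt fails. The name `retry_call` and its parameters are hypothetical.

```python
import time


def retry_call(fn, attempts: int = 3, delay_s: float = 0.0,
               retriable=(RuntimeError,)):
    """Call fn(); on a retriable error, retry up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except retriable as err:
            last_error = err
            # Wait before the next attempt, but not after the final one.
            if attempt + 1 < attempts:
                time.sleep(delay_s)
    raise last_error


# Usage: an operation that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_allocation():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient allocation failure")
    return "handle"


handle = retry_call(flaky_allocation, attempts=3)
```

An enforce-style check would have failed on the first error, while the retry-style call tolerates a bounded number of transient failures.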

21 Dec, 2020 1 commit
- [oneDNN] Making ThreadID info in caching key optional (#29272) (#29598) · 2352a8af
  Authored by Jacek Czaja on Dec 21, 2020
  
  2352a8af
17 Dec, 2020 1 commit

[bug fix] Added verbose oneDNN lib version (#29671) · ef04d3d3

Authored by arlesniak on Dec 17, 2020

 fix #27935 (comment) by QA @OliverLPH (Could you add some MKLDNN-related print log when use FLAGS_use_mkldnn?)

ef04d3d3

15 Dec, 2020 1 commit

cherry-pick kunlun PR: 29458, 29539 (#29583) · 03ddf690

Authored by QingshuChen on Dec 15, 2020

* support mobilenet for kunlun (#29458)

* add xpu ops for training transformer in kunlun (#29539)

* 1.fix matmul bug 2. add one hot

* add xpu error msg
Co-authored-by: procr <procrboo@gmail.com>
Co-authored-by: taixiurong <taixiurong@126.com>

03ddf690

08 Dec, 2020 1 commit

[2.0 rc1/cherrypick] cherry-pick kunlun PR:29234/29229/29293/29367/29280/29448 (#29466) · 6bfc5721

Authored by liuyuhui on Dec 08, 2020

* add deformable_conv op on xpu (#29234)

* rebase develop

* update deformable_conv op on xpu

* update deformable_conv op on xpu

* update kunlun conv2d/softmax/elementwise implemetation (#29229)

* update conv2d & softmax to new xpu api
* test=kunlun

* remove useless comments
* test=kunlun

* remote softmax xpu op
* test=kunlun

* update kunlun softmax
* test=kunlun

* update xpu unitest
* test=kunlun

* fix elementwise_grad bug for kunlun
*test=kunlun

* support global pooling for kunlun (#29293)

* test=kunlun

* update reduce_sum op on xpu (#29367)

* update reduce_sum op on xpu

* update reduce_sum op on xpu

* support running on xpu

* fix expand/uniform_random && concat/transpose to new api on xpu (#29280)

* fix expand && concat/transpose to new api

* update uniform_random_op

* update xpu_header

* 1. fix elementwise ops'bug 2. fix softmax_with_cross_entropy_op 3. add biliner_interp_op (#29448)
Co-authored-by: root <root@bjhw-sys-rpm0223.bjhw.baidu.com>
Co-authored-by: 卖鱼的哲学 <tangzhiyi11@users.noreply.github.com>
Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>
Co-authored-by: taixiurong <taixiurong@126.com>
Co-authored-by: root <root@bjhw-sys-rpm0223.bjhw.baidu.com>

6bfc5721

05 Dec, 2020 1 commit

Release/2.0 rc1 (#29388) · fbb6cd70

Authored by chentianyu03 on Dec 05, 2020

* fix random failed of complex matmul

* Make transpose, trace, kron, reshape, sum op support complex type (#29321)

* add complex64 and complex128 type; add +-*/@ and slice opreator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

* kron, reshape, transpose support complex types

* sum and trace op support complex types

* add test case of sum and trace op

* fix the bug of imag part of complex not initialized

* format file

* format code style

* kron support type promotion; modify test cases

fbb6cd70

04 Dec, 2020 2 commits

update, test=develop (#29331) (#29370) · 11980774
Authored by lilong12 on Dec 04, 2020

11980774

Support type promote for basic math ops (quantum required) (#29265) (#29354) · 0e7539e7

Authored by Chen Weihang on Dec 04, 2020

* basic impl of type promote

* add comment & another testcase

* fix complex bugs & support python op promote type

* fix failed unittests & polish code

* add unittest for coverage

* change to only promote complex type

* polish code details

* polish several comments

0e7539e7

01 Dec, 2020 1 commit

add complex64 and complex128 type; add +-*/@ and slice opreator for c… (#29199) · 8f45d142

Authored by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice opreator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

8f45d142

27 Nov, 2020 5 commits

Support dynamic graph distributed (#28997) · e2d01eb6

Authored by ShenLiang on Nov 27, 2020

* add reducer

* refine envent for memorycopy

* add concat&split for allreduce

* apply concat & split for fuse tensor

* fix nccl dep

* fix the untest, compile problem and ddp initialize problem

* fix untest for mac & add some comments & solve the repeated param in sublayers

* fix untest for windows & fix document

e2d01eb6
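The "add concat&split for allreduce" and "apply concat & split for fuse tensor" items above point at a common pattern: flatten several gradient tensors into one buffer, reduce that buffer once, then split it back into the original shapes. The sketch below illustrates the pattern with plain Python lists. Every name in it is hypothetical, the "allreduce" is a local stand-in that sums across simulated ranks, and it is not meant to reflect how the reducer in this commit is written.

```python
def fuse(tensors):
    """Concatenate 1-D tensors into one flat buffer and record each length."""
    flat = [value for tensor in tensors for value in tensor]
    sizes = [len(tensor) for tensor in tensors]
    return flat, sizes


def unfuse(flat, sizes):
    """Split a flat buffer back into tensors of the recorded lengths."""
    tensors, start = [], 0
    for size in sizes:
        tensors.append(flat[start:start + size])
        start += size
    return tensors


def allreduce_sum(buffers_per_rank):
    """Stand-in for a collective: element-wise sum over all ranks' buffers."""
    return [sum(values) for values in zip(*buffers_per_rank)]


# Two simulated ranks, each holding gradients for the same two parameters.
rank0_grads = [[1.0, 2.0], [3.0]]
rank1_grads = [[10.0, 20.0], [30.0]]

fused = [fuse(grads) for grads in (rank0_grads, rank1_grads)]
reduced_flat = allreduce_sum([flat for flat, _ in fused])
reduced_grads = unfuse(reduced_flat, fused[0][1])
```

Fusing means one collective call covers many small tensors, which is the usual motivation for this kind of concat and split step.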

fix CUDA 11 error on windows (#29101) · e668cb07
Authored by Zhou Wei on Nov 27, 2020

e668cb07

Fixes mkldnn dygraph learning rate scheduler crashes (#28988) · bc902044
Authored by arlesniak on Nov 27, 2020

bc902044

detect tensorRT plugin fp16 in runtime (#27933) · b9e76a01

Authored by Shang Zhizhou on Nov 27, 2020

* remove -DSUPPORTS_CUDA_FP16 in cuda.cmake

* comile with cuda9

* add some unittest

* notest;test=coverage

* add unittest for trt plugin swish && split

* update ernie unittest

* fix some error message

* remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter

* fix comile errror when CUDA_ARCH_NAME < Pascal"

* fix comile error

* update unittest timeout

* compile with cuda9

* update error msg

* fix code style

* add some comments

* add define IF_CUDA_ARCH_SUPPORT_FP16

* rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED

b9e76a01

fix typo of flag name (#29154) · fd3fcb05
Authored by Leo Chen on Nov 27, 2020

fd3fcb05

26 Nov, 2020 1 commit
- Polish CUDA Information stdout (#29109) · 7ae3cb55
  Authored by Aurelius84 on Nov 26, 2020
  
  7ae3cb55
25 Nov, 2020 2 commits
- Hide the C++ stack by default and add hints (#29042) · fea0e294
  Authored by Chen Weihang on Nov 25, 2020
```
* default not show cpp statck & add hint

* fix failed unittest

* fix failed unittests
```
  fea0e294
- remove eigen threadpool for the speed up · b2c8a007
  Authored by wawltor on Nov 25, 2020
```
remove eigen threadpool for the speed up
```
  b2c8a007
23 Nov, 2020 2 commits
- extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758) · bd1d6d3b
  Authored by Jacek Czaja on Nov 23, 2020
  
  bd1d6d3b
- change avg pooling and global pooling to trt layer in dynamic shape mode (#28702) · 994673bf
  Authored by Pei Yang on Nov 23, 2020
```
* change avg pooling and global pooling to trt layer

* add support for static shape global pooling

* modify trt errmsg
```
  994673bf
20 Nov, 2020 2 commits

Fix gpu memory allocation bug. (#28703) · 1dad8cea
Authored by gongweibao on Nov 20, 2020

1dad8cea

adjust kunlun header file (#28536) · 30ef3815

Authored by QingshuChen on Nov 20, 2020

* adjust kunlun header file
*test=kunlun

* update kunlun unittest
*test=kunlun

* update xpu unitest
* test = kunlun

* update xpu unittest
* test=kunlun

* update xpu unitest
* test=kunlun

30ef3815

17 Nov, 2020 2 commits
- [oneDNN] Layer norm bf16 kernel (#28619) · 6d8d3d4c
  Authored by Jacek Czaja on Nov 17, 2020
  
  6d8d3d4c
- bug fix, test=develop (#28674) · 80d20246
  Authored by lilong12 on Nov 17, 2020
  
  80d20246
13 Nov, 2020 1 commit
- fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks (#28547) · 849467b5
  Authored by Zhou Wei on Nov 13, 2020
  
  849467b5
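The entry above concerns a `CUDA_VISIBLE_DEVICES` value that starts or ends with quotation marks, for example when a shell or launcher passes `"0,1"` through with the quotes intact. The source does not show the actual fix, so this is only a sketch of the general idea: strip surrounding quotes before splitting the list. The function name `parse_visible_devices` is hypothetical, and the sketch assumes integer device indices, although the variable can also hold other identifiers in practice.

```python
def parse_visible_devices(raw: str):
    """Parse a comma-separated device list, tolerating surrounding quotes."""
    # Drop outer whitespace, then any single or double quotes at either end.
    cleaned = raw.strip().strip("'\"")
    # Skip empty fields so that "" or a trailing comma yields no bogus entry.
    return [int(part) for part in cleaned.split(",") if part.strip()]


devices = parse_visible_devices('"0,1"')
```

In a real program the raw string would typically come from `os.environ.get("CUDA_VISIBLE_DEVICES", "")`.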
04 Nov, 2020 1 commit
- show cpp stack when catch signal (#28415) · 23439b16
  Authored by Chen Weihang on Nov 04, 2020
  
  23439b16
03 Nov, 2020 2 commits

Optimize ERNIE model inference performance in TensorRT, support variable-length input (#28367) · ea851796

Authored by Shang Zhizhou on Nov 03, 2020

* fp16 result ok

* change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS

* auto detect special slice op converter for ernie with trt oss

* ernie oss only support fp16

* fix special_slice_plugin serialize bug

* matmul in tensorrt ok

* ernie unittest ok

* add matmul tensorrt unittest

* remove demo code

ea851796

[oneDNN] sum op refactor (#28318) · 84cc61b2
Authored by Jacek Czaja on Nov 03, 2020

84cc61b2
