Commits · cca57c4ac4856cde401071edd7e6a5219524270d · 机器未来 / Paddle

22 Apr, 2022 · 8 commits

Committed by zhaocaibei123 on Apr 22, 2022

* [cherry-pick2.3]fix compile bug of windows cuda11.5 (#41464)

cherry-pick

fix compile bug of windows cuda11.5 #41433

* fix bug of missing boost when compile cache.cc (#41449)

[cherry-pick #41430] fix bug of random compile failure due to incorrect compile order of dependencies

* Fix eager try catch (#41438) (#41477)

[Cherry-Pick]Fix eager try catch (#41438)

* Cherry-pick-PR41407, fix device_id bug for final_state op in multiprocess testcase (#41407) (#41475)

Cherry-pick PR #41407

* [BugFix] Add error hint for one_hot gpu version (#41335) (#41495)

* add one_hot gpu hint

* move allow_out_of_range judgement

* delete useless unittest

* fix bugs of reshape double grad infermeta (#41459) (#41493)

* [cherrypick-2.3] modify infer gpu memory strategy (#41427), remove cudnn_deterministic=True (#41341) (#41491)
Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>

* [Cherry-pick][ROCm] fix dcu error in device event base, test=develop (#41523)

Cherry-pick of #41521

* [Cherry-Pick]Cherry pick PR41200, PR41474, PR41382 (#41509)

* Use `self` as a parameter of _hash_with_id function to avoid error caused by hash_id reuse (#41200)

* Add fill_constant_batch_size YAML and UT (#41474)

* Switch some dy2st UT to eager mode (#41382)

* Switch some dy2st UT to eager mode

* Fix test_lstm and remove test_transformer

* Run test_resnet_v2 in old dy mode

* Unittest recover (#41431)

* update name

* update name

* fix test

* fix fleet bind

* update name

* update name

* fix test

* fix gpups wrapper

* remove Push/Pull/Load/Save with context in client and wrapper base class

* fix

* fix

* remove some interface

* fix

* remove

* code style

* recover

* fix

* remove code unused

* remove some unused table & accessor & CommonDenseTable => MemoryDenseTable

* fix

* fix

* fix

* recover

* remove unused code

* recover unittest

* fix

* remove

* fix

* remove code unuseful

* remove

* fix

* recover

* remove
Co-authored-by: esythan <esythan@126.com>

* add ssd sparse table

* fix

* add cache shuffle

* fix

* fix

* fix

* fix

* fix

* fix

* add unit test

* fix
Co-authored-by: Zhou Wei <1183042833@qq.com>
Co-authored-by: Sing_chan <51314274+betterpig@users.noreply.github.com>
Co-authored-by: 0x45f <23097963+0x45f@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>
Co-authored-by: Siming Dai <908660116@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: Zhang Jun <ewalker@live.cn>
Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: esythan <esythan@126.com>

cca57c4a

Reduce performance influence by record event in python (#42040) · 4fd190d5
Committed by chenjian on Apr 22, 2022
```
* optimize performance

* fix

* improve coverage

* fix

* fix
```
4fd190d5

[WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72

Committed by Ming-Xu Huang on Apr 22, 2022

* Fix leading dimension setting error in fused_gemm_epilogue_grad_op.

* Add dyload to cuBlasLt functions.

* Added cublasLtMatmulAlgoGetHeuristic to improve performance.

* Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue

* Added UTs to FLAGS_cublaslt_exhaustive_search_times

* Added warmup runs in algo searching of Gemm epilogue.

* Update copyright and documents.

* Fixed error handling.

19650d72

fix kernel name appearance (#42071) · 9e3cfdfa
Committed by chenjian on Apr 22, 2022

9e3cfdfa

Add Sparse BatchNorm and fix two bugs (#42013) · 8a6456db
Committed by zhangkaihuo on Apr 22, 2022

8a6456db

[Eager] Fix CastPyArg2Scalar for max value of int64 (#42098) · 281a5be7
Committed by Weilong Wu on Apr 22, 2022
```
* [Eager] Fix CastPyArg2Scalar in Long case

* Add more test cases for paddle.clip

* Use PyLong_AsLongLong
```
281a5be7
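
The bullets above mention switching to `PyLong_AsLongLong`. The commit log does not state the root cause, so the following is only an illustration of the general issue, under the assumption that the value was previously converted through a narrower C integer type. It uses `ctypes` from the Python standard library to mimic storing the int64 maximum in a 32-bit versus a 64-bit signed integer.

```python
import ctypes

# Largest value representable in a signed 64-bit integer.
INT64_MAX = 2**63 - 1

# ctypes integer types perform no overflow checking, so a type that is
# too narrow silently keeps only the low-order bits of the value.
as_32bit = ctypes.c_int32(INT64_MAX).value
as_64bit = ctypes.c_int64(INT64_MAX).value

print("stored in 32 bits:", as_32bit)
print("stored in 64 bits round-trips:", as_64bit == INT64_MAX)
```

A 64-bit conversion such as `PyLong_AsLongLong` avoids this kind of truncation on platforms where the C `long` type is only 32 bits wide.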

Add AutoTune to reader.py for DataLoader (#41202) · f0ec580e
Committed by niuliling123 on Apr 22, 2022

f0ec580e

Support double grad check of op in Eager mode and Add log double grad yaml (#42090) · 1b8fd85d
Committed by YuanRisheng on Apr 22, 2022
```
* Support double grad check of op in Eager mode

* fix bugs of backward yaml

* adjust code format
```
1b8fd85d

21 Apr, 2022 · 10 commits
- [CustomDevice] fix exit order (#42088) · 79303c2a
  Committed by Aganlengzi on Apr 21, 2022
  
  79303c2a
- [MLU]:add elementwise_div op (#41810) · 5439f07d
  Committed by qipengh on Apr 21, 2022
  
  5439f07d
- Fix nms op docs (#41792) · fb87df66
  Committed by RichardWooSJTU on Apr 21, 2022
```
* fix nms op doc missing default value
```
  fb87df66
- [PaddlePaddle Hackathon 2] No. 23: Add the Softmax2D network-building API to Paddle (#40910) · 920d44df
  Committed by Asthestarsfalll on Apr 21, 2022
```
* Hackathon 23

* fix bug

* fix pylint error

* try

* fix CI-Coverage

* update and add more unittest

* update
```
  920d44df
- oneDNN md-in-tensor 2nd batch of changes (#41997) · db468d7d
  Committed by jakpiase on Apr 21, 2022
  
  db468d7d
- Remove wrong check_variable_and_dtype in matrix_rank (#42062) · 5c738223
  Committed by 0x45f on Apr 21, 2022
  
  5c738223
- Support FP16 argmax/argmin kernel (#42038) · 7003dcaa
  Committed by sneaxiy on Apr 21, 2022
```
* support int16 argmax kernel

* add fp16 test
```
  7003dcaa
- [Eager] Support numpy.ndarray as input for eager expand (#42043) · 3da8066a
  Committed by Weilong Wu on Apr 21, 2022
  
  3da8066a
- add _grad_name and _grad_value for eager tensor (#41990) · 1bf2eeab
  Committed by pangyoki on Apr 21, 2022
```
* add _grad_name and _grad_value for eager tensor

* fix paddle_enforce

* fix paddle_enforce 2

* fix grad_name

* _grad_value return lodtensor rather than tensor

* fix
```
  1bf2eeab
- fix api math equation display issue; test=document_fix (#42058) · f5ac9961
  Committed by David Nicolas on Apr 21, 2022
  
  f5ac9961
20 Apr, 2022 · 6 commits

[Eager] remove useless logic (#42020) · d67abac6
Committed by Weilong Wu on Apr 20, 2022

d67abac6

be compatible with the old version of alltoall (#42007) · c6a084ef
Committed by lilong12 on Apr 20, 2022

c6a084ef

[PaddlePaddle Hackathon 2] No. 9: Add the logspace API to Paddle (#41261) · a3c50c42

Committed by BrilliantYuKaimin on Apr 20, 2022

* Add the operator description for logspace

* Add shape inference for logspace

* Add the logspace kernel implementation

* Add the logspace interface in Python

* Add unit tests for logspace

* Add logspace

* Update logspace_kernel.cu

* Update logspace_op.cc

* Adjust code formatting

* Update doc of logspace

* Update tensor.py

* Update logspace_op.cc

* Update logspace_kernel.cc

* Update logspace_kernel.cu

* Update test_logspace.py

* Adjust the position of logspace

* Adjust code formatting

a3c50c42
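
For context on the operation this commit adds: a logspace function conventionally returns values spaced evenly on a logarithmic scale, that is, a base raised to evenly spaced exponents. The sketch below is a minimal pure-Python illustration of that convention, assuming NumPy-style semantics with inclusive endpoints. The helper name `logspace_sketch` and its signature are hypothetical and are not taken from the Paddle source.

```python
def logspace_sketch(start, stop, num, base=10.0):
    """Return `num` values base**e, with e evenly spaced over [start, stop].

    Hypothetical helper for illustration only; not the Paddle implementation.
    """
    if num < 0:
        raise ValueError("num must be non-negative")
    if num == 0:
        return []
    if num == 1:
        # A single point: only the start exponent is used.
        return [float(base) ** start]
    # Distance between consecutive exponents, endpoints included.
    step = (stop - start) / (num - 1)
    return [float(base) ** (start + i * step) for i in range(num)]


# Exponents 0, 1, 2, 3 with base 10 give four powers of ten.
print(logspace_sketch(0, 3, 4))
```

The exponents are spaced linearly while the returned values grow geometrically, which is the only difference from a linspace-style function.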

Fix paddle.t doc en and the annotation display on 4 en docs (#41699) · 885171e3

Committed by Yilingyelu on Apr 20, 2022

* gradients; test=document_fix

* fix VarType; test=document_fix

* fix vartype; test=document_fix

* cumsum; test=document_fix

* t; test=document_fix

885171e3

[MLU] add gather mlu kernel (#41969) · 23ad2166
Committed by fwenguang on Apr 20, 2022

23ad2166

[CustomOp] Fix custom op pinned input error (#41972) · f1711f24
Committed by Chen Weihang on Apr 20, 2022
```
* fix custom op pinned input error

* fix compile error
```
f1711f24

19 Apr, 2022 · 16 commits
- polish tensor api details (#41971) · e5c61b15
  Committed by Chen Weihang on Apr 19, 2022
  
  e5c61b15
- double accessor and show_scale (#41943) · 8113c913
  Committed by wangguanqun on Apr 19, 2022
```
* double accessor and show_scale

* double accessor and show_scale

* rename

* fix bug in pslib config

* add unittest
```
  8113c913
- reduce performance influence by RecordEvent in Python (#41822) · d3f95e5a
  Committed by chenjian on Apr 19, 2022
```
* reduce performance influence

* add unit test

* fix
```
  d3f95e5a
- OneDNN md-in-tensor refactoring part 1: Added main changes for md-in-tensor (#41303) · c9f4fcf3
  Committed by jakpiase on Apr 19, 2022
```
* changes for md in tensor

* ci fix

* Temporarily limited dims for test

* ci fix

* removed unnecessary includes

* added reviewers suggestions

* checked out two files to avoid changing more than 19 in single PR

* minor fix

* reverted one file to reduce files changed to 19
```
  c9f4fcf3
- Rebase for profiler statistic ratio (#41939) · 9b54bf93
  Committed by chenjian on Apr 19, 2022
```
* fix according to suggestion

* add kernel summary

* improve coverage
```
  9b54bf93
- fix StickBreakingTransform forward error when input rank is over 2 (#41940) · 2b55290e
  Committed by Xiaoxu Chen on Apr 19, 2022
  
  2b55290e
- rm distri env (#41961) · 469e3198
  Committed by kuizhiqing on Apr 19, 2022
  
  469e3198
- [Eager] make fast through to linear (#41945) · 8631d73a
  Committed by Jiabin Yang on Apr 19, 2022
```
* make fast through to linear

* make fast through to linear

* add to do for later upgrades

* support build once for now
```
  8631d73a
- Implement Amp Layout AutoTune (#41884) · c2bcb141
  Committed by Zhang Ting on Apr 19, 2022
  
  c2bcb141
- support bmm&bmm_grad for KL2, *test=kunlun (#41935) · 60bec700
  Committed by z8hanghuan on Apr 19, 2022
  
  60bec700
- fix bug for MultiplicativeDecay (#41850) · 14573629
  Committed by guguguzi on Apr 19, 2022
```
* fix bug for MultiplicativeDecay

* remove changes to test_lr_scheduler.py
```
  14573629
- [Eager]Fix full_like/clip with np.generic type as attribute (#41808) · 9ac6b7ed
  Committed by Aurelius84 on Apr 19, 2022
```
* [Eager]Fix full_like/clip with np.generic type as attribute

* support numpy generic

* remove useless code
```
  9ac6b7ed
- [MLU]add op: cumsum, fill_any_like, unsqueeze (#41791) · 6da637e8
  Committed by qipengh on Apr 19, 2022
  
  6da637e8
- fix pad3d infer shape (#41753) · 8f77f8bc
  Committed by littletomatodonkey on Apr 19, 2022
```
* fix pad3d infer shape
```
  8f77f8bc
- [MLU] support add callback to stream (#41831) · 03533b0c
  Committed by fwenguang on Apr 19, 2022
  
  03533b0c
- [AutoParallel] dist reshape op (#41821) · bb71d834
  Committed by zhaoyingli on Apr 19, 2022
```
* add dist reshape impl_idx=2

* fix cmakelist
```
  bb71d834

机器未来 / Paddle is in sync with the fork's source project
