Commits · fccb08199f4e0399cf5d70ea4e6a2b1d18fc444c · BaiXuePrincess / Paddle

26 Apr, 2022 7 commits

Adapt BKCL comm for XPUPS (#42168) · fccb0819

Committed by Fan Zhang on Apr 26, 2022

* Adapt XPUPS - 1st version - 3.24

* Adapt XPUPS - update XPU PushSparse -  2nd version - 3.24

* Adapt XPUPS - add XPU PullSparseOp - 3nd version - 3.25

* refactor heter comm kernel

* update. test=develop

* Adapt XPUPS - modify by compilation - 4th version - 3.27

* update calc_shard_offset. test=develop

* update xpu kernel. test=develop

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* heter_comm update

* heter_comm update

* update calc_shard_offset. test=develop

* heter_comm update

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* fix. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30

* update. test=develop

* update pslib.cmake

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* Adapt XPUPS - modify by kp compilation  - 6th version - 3.30

* update. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* used by minxu

* update heter_comm_inl

* fix. test=develop

* Adapt XPUPS - modify by kp compilation  - 7th version - 3.30

* fix. test=develop

* add optimizer kernel. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 3.31 update

* Adapt XPUPS - update kp compilation path  - 8th version - 3.31

* add optimizer kernel. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm.h 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update hashtable. test=develop

* update. test=develop

* Adapt XPUPS - update by kp compilation  - 9th version - 4.1

* update hashtable. test=develop

* fix. test=develop

* update hashtable 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 10th version - 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update. test=develop

* modify by compilation 4.1

* update. test=develop

* update. test=develop

* fix. test=develop

* modify by compilation 4.1

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1 19:30

* fix. test=develop

* update ps_gpu_wrapper.kps 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 11th version - 4.1

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 12nd version - 4.2

* fix. test=develop

* fix. test=develop

* modify by compilation 4.2

* 4.2 update

* fix. test=develop

* template init. test=develop

* update 4.6

* fix. test=develop

* template init. test=develop

* 4.6 modify by compilation

* hashtable template init. test=develop

* hashtable template init. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=devlop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=devlop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 13nd version - 4.7

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.11 update

* fix. test=develop

* fix. test=develop

* 4.11 update

* update by pre-commit

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.12 update

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 14th version - 4.13

* 4.13 update

* 4.14 update

* 4.14 update

* 4.14 update

* 4.14 modify by merged latest compilation

* retry CI 4.14

* 4.15 pass static check

* 4.15 modify by gpups CI

* 3.16 update by gpups CI - modify ps_gpu_wrapper.h

* 4.16 update

* 4.16 pass xpu compile

* 4.16 retry CI

* 4.16 update

* Adapt XPUPS - adapt BKCL comm for XPUPS - 4.24

* update by compilation

* Adapt XPUPS - register PSGPUTrainer for XPUPS - 4.25

* update device_worker_factory

Co-authored-by: zmxdream <zhangminxu01@baidu.com>

fccb0819

fix bug: arange can not return shape when enable_static (#42182) · d5b4570d
Committed by ShiningZhang on Apr 26, 2022
```
* fix bug: arange can not return shape when enable_static

* fix bug: test_arange
```
d5b4570d
Optimize the performanece of sum api (#42231) · 2fe4bf2f
Committed by zyfncg on Apr 26, 2022
```
* optimize the performanece of sum api

* optimize IsDenseTensorInput

* remove debug log
```
2fe4bf2f

align the API parameter “name” annotation in math.py; test=document_fix (#42200) · 51ea349c

Committed by David Nicolas on Apr 26, 2022

* align the api name parameter annotation in math.py; test=document_fix

* Update math.py

* Update math.py

* for CI;test=document_fix

Co-authored-by: Chen Long <1300851984@qq.com>

51ea349c

fit for printing cinn_launch op (#42141) · ee56906e
Committed by Leo Chen on Apr 26, 2022
```
* fit for printing cinn_launch op

* update boost::variant caster for bytes
```
ee56906e
Add Sparse MaxPool3D (#42130) · 18e9aafb
Committed by zhangkaihuo on Apr 26, 2022

18e9aafb

Add C++ EinsumOp which support 2 operands einsum. (#42105) · c7302f96

Committed by xiongkun on Apr 26, 2022

* full api fix

* when out is None, go old dygraph mode

* by static check

* first version: support 2-inputs forwards. TODO: 1. backward  2. BroadCast  3. MultiVariable

* time out -> 120

c7302f96
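
For context on what a "2 operands einsum" computes, here is a minimal pure-Python sketch over nested lists. It is not the C++ EinsumOp from this commit; the function name `einsum2` is hypothetical, and the sketch assumes explicit `->` output labels, non-empty dimensions, no ellipsis and no broadcasting.

```python
from itertools import product


def einsum2(spec, a, b):
    """Two-operand einsum over nested lists, e.g. einsum2("ij,jk->ik", a, b)."""
    inputs, out_labels = spec.split("->")
    a_labels, b_labels = inputs.split(",")

    # Record the size of every labelled dimension by walking each operand.
    sizes = {}
    for labels, operand in ((a_labels, a), (b_labels, b)):
        for label in labels:
            sizes[label] = len(operand)
            operand = operand[0]

    # Labels that do not appear in the output are summed over.
    summed = [label for label in sizes if label not in out_labels]

    def pick(operand, labels, index):
        # Index into a nested list using the current value of each label.
        for label in labels:
            operand = operand[index[label]]
        return operand

    def build(labels, index):
        # Recurse over output labels; at the leaf, sum over contracted labels.
        if labels:
            first, rest = labels[0], labels[1:]
            return [build(rest, {**index, first: i}) for i in range(sizes[first])]
        total = 0
        for values in product(*(range(sizes[s]) for s in summed)):
            full = {**index, **dict(zip(summed, values))}
            total += pick(a, a_labels, full) * pick(b, b_labels, full)
        return total

    return build(out_labels, {})
```

With this sketch, `einsum2("ij,jk->ik", x, y)` is a matrix product and `einsum2("i,i->", u, v)` is an inner product.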

25 Apr, 2022 9 commits

reimplement ResNeXt based on ResNet (#40588) · ba4e7c7e
Committed by Nyakku Shigure on Apr 25, 2022
```
* refactor resnext
```
ba4e7c7e
Increase test_export_deploy_model tolerance for broadwell CPU (#42230) · 45572700
Committed by zlsh80826 on Apr 25, 2022

45572700

fix en docs of some Apis (gradients, scope_guard, cuda_places, name_scope,... · 6dd9dd39

Committed by Yilingyelu on Apr 25, 2022

fix en docs of some Apis (gradients, scope_guard, cuda_places, name_scope, device_guard, load_program_state, scale, ParamAttr and WeightNormParamAttr) (#41604)

* Update scope_guard; test=document_fix

* gradients; test=document_fix

* gradients; test=document_fix

* name_scope; test=document_fix

* cpu_places; test=document_fix

* WeightNormParamAttr; test=document_fix

* cuda_places; test=document_fix

* load_program_state; test=document_fix

* device_guard; test=document_fix

* device_guard; test=document_fix

* ParamAttr; test=document_fix

* scale; test=document_fix

* scale; test=document_fix

* update code example; test=document_fix

Co-authored-by: Chen Long <1300851984@qq.com>

6dd9dd39

remove redundant computation in Categorical.probs (#42114) · 9a0bfece
Committed by Feiyu Chan on Apr 25, 2022

9a0bfece

[PaddlePaddle Hackathon 2] No. 24: Add the nn.ChannelShuffle network-building API to Paddle (#40743) · bbaaf217

Committed by BrilliantYuKaimin on Apr 25, 2022

* Add infermeta for ChannelShuffle

* Create channel_shuffle_grad_kernel.h

* Create channel_shuffle_kernel.h

* Create channel_shuffle_sig.cc

* Create channel_shuffle_op.cc

Description of the ChannelShuffle operator

* Create channel_shuffle_kernel_impl.h

Implementation of the ChannelShuffle kernel

* Create channel_shuffle_grad_kernel_impl.h

Implementation of the ChannelShuffle backward kernel

* Add kernel register of channel shuffle and grad

Register the ChannelShuffle kernel and its backward kernel

* add nn.functional.channel_shuffle

* add nn.ChannelShuffle

* Create test_channel_shuffle.py

* Update example of ChannelShuffle in vision.py

* Update test_channel_shuffle.py

* Change where the channel_shuffle kernel is implemented

* Fix code formatting

* Remove redundant spaces

* Improve error checking in channel_shuffle

* Update unary.cc

* Update channel_shuffle_op.cc

* Update test_channel_shuffle.py

* Update unary.cc

* add channel_shuffle

* Update test_channel_shuffle.py

* Update vision.py

* Adjust code formatting

* Update channel_shuffle_sig.cc

* Update the ChannelShuffle documentation

* Update the channel_shuffle documentation

* remove ChannelShuffleOpArgumentMapping

* add ChannelShuffleGradInferMeta

* Update channel_shuffle_op.cc

* Adjust the location of the channel_shuffle kernel and its gradient kernel

bbaaf217
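
As background for the ChannelShuffle commit above, channel shuffle is commonly described as: split the channels into groups, view them as a (groups, channels_per_group) grid, transpose the grid, and flatten. The sketch below shows only that index permutation on a flat list of channels. It is an illustration, not the kernel added in this commit, and the function name `channel_shuffle` here is a stand-alone helper rather than Paddle's API.

```python
def channel_shuffle(channels, groups):
    """Return the channels reordered by the reshape-transpose-flatten rule."""
    count = len(channels)
    if groups <= 0 or count % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = count // groups
    # Reading the (groups, per_group) grid column by column is the transpose.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]
```

For six channels and two groups this interleaves the two halves, so channels from different groups end up adjacent.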

fix recompute (#42128) · f21824d9
Committed by Roc on Apr 25, 2022
```
* fix recompute

* modify return
```
f21824d9

[Eager] Support div(scalar) in eager mode (#42148) · f4ce8a92

Committed by Weilong Wu on Apr 25, 2022

* [Eager] Support div scalar in eager mode

* Updated and remove debug logs

* Remove list, use 'or' directly

* Remove useless statement

f4ce8a92

[Eager] Remove redundancy code, fix fp16 case (#42169) · 3b8f8b6c
Committed by Weilong Wu on Apr 25, 2022

3b8f8b6c
[Eager] Support numpy.ndarry in CastNumpy2Scalar (#42136) · 4a16d5c6
Committed by Weilong Wu on Apr 25, 2022

4a16d5c6

24 Apr, 2022 3 commits
- fix python3.10 compile bug on windows (#42140) · 13190707
  Committed by zhouweiwei2014 on Apr 24, 2022
  
  13190707
- disable unittest failed in eager CI in temporary (#42101) · d6b66924
  Committed by pangyoki on Apr 24, 2022
```
* test=py3-eager

* test=py3-eager

* test=py3-eager
```
  d6b66924
- refine optest logic for bfloat16 (#42151) · 532c3b4c
  Committed by zhangbo9674 on Apr 24, 2022
  
  532c3b4c
23 Apr, 2022 2 commits
- [Performance]Remove CudaStreamSychornize in ClipGradByGlobalNorm (#42132) · 6700294c
  Committed by Aurelius84 on Apr 23, 2022
  
  6700294c
- reuse ConvNormActivation in some vision models (#40431) · f6219dda
  Committed by Nyakku Shigure on Apr 23, 2022
```
* reuse ConvNormActivation in some vision models
```
  f6219dda
22 Apr, 2022 10 commits

Support triple grad check of op in Eager mode (#42131) · 34ac7b74
Committed by YuanRisheng on Apr 22, 2022
```
* support 3-rd order gradient

* change code format
```
34ac7b74

Add gpudnn yaml config for some OPs (#41773) · 4940a525

Committed by Ruibiao Chen on Apr 22, 2022

* Add gpudnn yaml config for some OPs

* Add grad gpudnn config

* Fix CI errors

* Fix CI errors

* Fix CI errors

* Fix conflicts

4940a525

Ssd sparse table (#41812) · cca57c4a

Committed by zhaocaibei123 on Apr 22, 2022

* [cherry-pick2.3]fix compile bug of windows cuda11.5 (#41464)

cherry-pick

fix compile bug of windows cuda11.5 #41433

* fix bug of missing boost when compile cache.cc (#41449)

[cherry-pick #41430] fix bug of random compile failure, due to incorrect compile order of dependencies

* Fix eager try catch (#41438) (#41477)

[Cherry-Pick]Fix eager try catch (#41438)

* Cherry-pick-PR41407, fix device_id bug for final_state op in multiprocess testcase (#41407) (#41475)

Cherry-pick PR #41407

* [BugFix] Add error hint for one_hot gpu version (#41335) (#41495)

* add one_hot gpu hint

* move allow_out_of_range judgement

* delete useless unittest

* fix bugs of reshape double grad infermeta (#41459) (#41493)

* [cherrypick-2.3] modify infer gpu memory strategy (#41427), remove cudnn_deterministic=True (#41341)  (#41491)
Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>

* [Cherry-pick][ROCm] fix dcu error in device event base, test=develop (#41523)

Cherry-pick of #41521

* [Cherry-Pick]Cherry pick PR41200, PR41474, PR41382 (#41509)

* Use `self`as a parameter of _hash_with_id function to avoid error caused by hash_id reuse (#41200)

* Add fill_constant_batch_size YAML and UT (#41474)

* Switch some dy2st UT to eager mode (#41382)

* Sitch some dy2st UT to eager mode

* Fix test_lstm and remove test_transformer

* Run test_resnet_v2 in old dy mode

* Unittest recover (#41431)

* update name

* update name

* fix test

* fix fleet bind

* update name

* update name

* fix test

* fix gpups wrapper

* remove Push/Pull/Load/Save with context in client and wrapper base class

* fix

* fix

* remove some interface

* fix

* remove

* code style

* recover

* fix

* remove code unused

* remove some unused table & accessor & CommonDenseTable => MemoryDenseTable

* fix

* fix

* fix

* recover

* remove unused code

* recover unittest

* fix

* remove

* fix

* remove code unuseful

* remove

* fix

* recover

* remove
Co-authored-by: esythan <esythan@126.com>

* add ssd sparse table

* fix

* add cache shuffle

* fix

* fix

* fix

* fix

* fix

* fix

* add unit test

* fix
Co-authored-by: Zhou Wei <1183042833@qq.com>
Co-authored-by: Sing_chan <51314274+betterpig@users.noreply.github.com>
Co-authored-by: 0x45f <23097963+0x45f@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>
Co-authored-by: Siming Dai <908660116@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: Zhang Jun <ewalker@live.cn>
Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: esythan <esythan@126.com>

cca57c4a

Reduce performance influence by record event in python (#42040) · 4fd190d5
Committed by chenjian on Apr 22, 2022
```
* optimize performance

* fix

* improve coverage

* fix

* fix
```
4fd190d5

[WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72

Committed by Ming-Xu Huang on Apr 22, 2022

* Fix leading dimension setting error in fused_gemm_epilogue_grad_op.

* Add dyload to cuBlasLt functions.

* Added cublasLtMatmulAlgoGetHeuristic to improve performance.

* Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue

* Added UTs to FLAGS_cublaslt_exhaustive_search_times

* Added warmup runs in algo searching of Gemm epilogue.

* Update copyright and documents.

* Fixed error handling.

19650d72

fix kenrel name apperance (#42071) · 9e3cfdfa
Committed by chenjian on Apr 22, 2022

9e3cfdfa
Add Sparse BatchNorm and fix two bugs (#42013) · 8a6456db
Committed by zhangkaihuo on Apr 22, 2022

8a6456db
[Eager] Fix CastPyArg2scalar for max value of int64 (#42098) · 281a5be7
Committed by Weilong Wu on Apr 22, 2022
```
* [Eager] Fix CastPyArg2Scalar in Long case

* Add more test cases for paddle.clip

* Use PyLong_AsLongLong
```
281a5be7
Add AutoTune to reader.py for DataLoader (#41202) · f0ec580e
Committed by niuliling123 on Apr 22, 2022

f0ec580e
Support double grad check of op in Eager mode and Add log double grad yaml (#42090) · 1b8fd85d
Committed by YuanRisheng on Apr 22, 2022
```
* Support double grad check of op in Eager mode

* fix bugs of backward yaml

* adjust code format
```
1b8fd85d

21 Apr, 2022 9 commits
- [CustomDevice] fix exit order (#42088) · 79303c2a
  Committed by Aganlengzi on Apr 21, 2022
  
  79303c2a
- [MLU]:add elementwise_div op (#41810) · 5439f07d
  Committed by qipengh on Apr 21, 2022
  
  5439f07d
- Fix nms op docs (#41792) · fb87df66
  Committed by RichardWooSJTU on Apr 21, 2022
```
* fix nms op doc missing default value
```
  fb87df66
- [PaddlePaddle Hackathon 2] No. 23: Add the Softmax2D network-building API to Paddle (#40910) · 920d44df
  Committed by Asthestarsfalll on Apr 21, 2022
```
* Hackathon 23

* fix bug

* fix pylint error

* try

* fix CI-Coverage

* update and add more unittest

* update
```
  920d44df
- oneDNN md-in-tensor 2nd batch of changes (#41997) · db468d7d
  Committed by jakpiase on Apr 21, 2022
  
  db468d7d
- Remove wrong check_variable_and_dtype in matrix_rank (#42062) · 5c738223
  Committed by 0x45f on Apr 21, 2022
  
  5c738223
- Support FP16 argmax/argmin kernel (#42038) · 7003dcaa
  Committed by sneaxiy on Apr 21, 2022
```
* support int16 argmax kernel

* add fp16 test
```
  7003dcaa
- [Eager] Support numpy.narray as input for eager expand (#42043) · 3da8066a
  Committed by Weilong Wu on Apr 21, 2022
  
  3da8066a
- add _grad_name and _grad_value for eager tensor (#41990) · 1bf2eeab
  Committed by pangyoki on Apr 21, 2022
```
* add _grad_name and _grad_value for eager tensor

* fix paddle_enforce

* fix paddle_enforce 2

* fix grad_name

* _grad_value return lodtensor rather than tensor

* fix
```
  1bf2eeab

BaiXuePrincess / Paddle is in sync with its fork source project