Commits · 2bcec75a10c3e35fb5b4d18f07606184dba28229 · BaiXuePrincess / Paddle

24 Apr, 2022 4 commits

fix FlattenContiguousRangeOpConverter out dim error (#42087) · 2bcec75a
Committed by baoachun on Apr 24, 2022
```
* fix FlattenContiguousRangeOpConverter out dim error

* update code
```
2bcec75a

[CustomDevice] add eager mode support (#42034) · ccafd2e5
Committed by ronnywang on Apr 24, 2022

ccafd2e5

combine graph_table and feature_table in graph_engine (#42134) · 0e0f7da6

Committed by seemingwang on Apr 24, 2022

* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config

* test performance

* test performance

* test performance

* test

* test

* update bfs

* change cmake

* test

* test gpu speed

* gpu_graph_engine optimization

* add dsm sample method

* add graph_neighbor_sample_v2

* Add graph_neighbor_sample_v2

* fix for loop

* add cpu sample interface

* fix kernel judgement

* add ssd layer to graph_engine

* fix allocation

* fix syntax error

* fix syntax error

* fix pscore class

* fix

* change index settings

* recover test

* recover test

* fix spelling

* recover

* fix

* move cudamemcpy after cuda stream sync

* fix linking problem

* remove comment

* add cpu test

* test

* add cpu test

* change comment

* combine feature table and graph table

* test

* test

* pybind

* test

* test

* test

* test

* pybind

* pybind

* fix cmake

* pybind

* fix

* fix

* add pybind

* add pybind
Co-authored-by: DesmonDay <908660116@qq.com>

0e0f7da6

Add paddle::variant and replace paddle::any (#42139) · 79f717d6
Committed by Chen Weihang on Apr 24, 2022
```
* add variant and replace any

* split attribute
```
79f717d6

23 Apr, 2022 3 commits
- optimize performance of dygraph (#42137) · c56fffb4
  Committed by zyfncg on Apr 23, 2022
  
  c56fffb4
- [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT (#42138) · 79ac8870
  Committed by Aurelius84 on Apr 23, 2022
```
* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT

* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT
```
  79ac8870
- update reduce_max for kunlun, *test=kunlun (#42116) · 1587ad07
  Committed by TTerror on Apr 23, 2022
  
  1587ad07
22 Apr, 2022 8 commits

Add gpudnn yaml config for some OPs (#41773) · 4940a525

Committed by Ruibiao Chen on Apr 22, 2022

* Add gpudnn yaml config for some OPs

* Add grad gpudnn config

* Fix CI errors

* Fix CI errors

* Fix CI errors

* Fix conflicts

4940a525

Ssd sparse table (#41812) · cca57c4a

Committed by zhaocaibei123 on Apr 22, 2022

* [cherry-pick2.3]fix compile bug of windows cuda11.5 (#41464)

cherry-pick

fix compile bug of windows cuda11.5 #41433

* fix bug of missing boost when compile cache.cc (#41449)

[cherry-pick #41430] fix bug of random compile failure due to incorrect compile order of dependencies

* Fix eager try catch (#41438) (#41477)

[Cherry-Pick]Fix eager try catch (#41438)

* Cherry-pick-PR41407, fix device_id bug for final_state op in multiprocess testcase (#41407) (#41475)

Cherry-pick PR #41407

* [BugFix] Add error hint for one_hot gpu version (#41335) (#41495)

* add one_hot gpu hint

* move allow_out_of_range judgement

* delete useless unittest

* fix bugs of reshape double grad infermeta (#41459) (#41493)

* [cherrypick-2.3] modify infer gpu memory strategy (#41427), remove cudnn_deterministic=True (#41341)  (#41491)
Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>

* [Cherry-pick][ROCm] fix dcu error in device event base, test=develop (#41523)

Cherry-pick of #41521

* [Cherry-Pick]Cherry pick PR41200, PR41474, PR41382 (#41509)

* Use `self` as a parameter of _hash_with_id function to avoid error caused by hash_id reuse (#41200)

* Add fill_constant_batch_size YAML and UT (#41474)

* Switch some dy2st UT to eager mode (#41382)

* Switch some dy2st UT to eager mode

* Fix test_lstm and remove test_transformer

* Run test_resnet_v2 in old dy mode

* Unittest recover (#41431)

* update name

* update name

* fix test

* fix fleet bind

* update name

* update name

* fix test

* fix gpups wrapper

* remove Push/Pull/Load/Save with context in client and wrapper base class

* fix

* fix

* remove some interface

* fix

* remove

* code style

* recover

* fix

* remove code unused

* remove some unused table & accessor & CommonDenseTable => MemoryDenseTable

* fix

* fix

* fix

* recover

* remove unused code

* recover unittest

* fix

* remove

* fix

* remove code unuseful

* remove

* fix

* recover

* remove
Co-authored-by: esythan <esythan@126.com>

* add ssd sparse table

* fix

* add cache shuffle

* fix

* fix

* fix

* fix

* fix

* fix

* add unit test

* fix
Co-authored-by: Zhou Wei <1183042833@qq.com>
Co-authored-by: Sing_chan <51314274+betterpig@users.noreply.github.com>
Co-authored-by: 0x45f <23097963+0x45f@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>
Co-authored-by: Siming Dai <908660116@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: Zhang Jun <ewalker@live.cn>
Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: esythan <esythan@126.com>

cca57c4a

[WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72

Committed by Ming-Xu Huang on Apr 22, 2022

* Fix leading dimension setting error in fused_gemm_epilogue_grad_op.

* Add dyload to cuBlasLt functions.

* Added cublasLtMatmulAlgoGetHeuristic to improve performance.

* Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue

* Added UTs to FLAGS_cublaslt_exhaustive_search_times

* Added warmup runs in algo searching of Gemm epilogue.

* Update copyright and documents.

* Fixed error handling.

19650d72


Add Sparse BatchNorm and fix two bugs (#42013) · 8a6456db
Committed by zhangkaihuo on Apr 22, 2022

8a6456db
[Eager] Fix CastPyArg2scalar for max value of int64 (#42098) · 281a5be7
Committed by Weilong Wu on Apr 22, 2022
```
* [Eager] Fix CastPyArg2Scalar in Long case

* Add more test cases for paddle.clip

* Use PyLong_AsLongLong
```
281a5be7

add build pylayer depend pybind (#42099) · e49b7b64
Committed by tianshuo78520a on Apr 22, 2022

e49b7b64
Dygraph performance optimization (v2) (#42103) · c79d1186
Committed by zyfncg on Apr 22, 2022
```
* optimize performance of PreparePhiData

* dygraph performance optimization
```
c79d1186
[Eager] fix memory issue for eager (#42086) · 23d1b3e8
Committed by Jiabin Yang on Apr 22, 2022
```
* fix memory issue for eager

* fix bug
```
23d1b3e8

21 Apr, 2022 16 commits
- fix onnxruntime bug (#42095) · c51f55f9
  Committed by heliqi on Apr 21, 2022
  
  c51f55f9
- optimize performance of PreparePhiData (#42093) · f1704b20
  Committed by zyfncg on Apr 21, 2022
  
  f1704b20
- [MLU]:add elementwise_div op (#41810) · 5439f07d
  Committed by qipengh on Apr 21, 2022
  
  5439f07d
- [XPUPS]add hashtable interface (#41987) · 6becabaa
  Committed by zmxdream on Apr 21, 2022
```
* add hashtable interface. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix optimizer config for xpups. test=develop

* fix. test=develop

* fix. test=develop
```
  6becabaa
- [CustomDevice] fix macro (#42073) · ec995c59
  Committed by Aganlengzi on Apr 21, 2022
```
* [CustomDevice] fix macro

* fix
```
  ec995c59
- WorkQueue supports always_spinning option (#42029) · 9db6c762
  Committed by liutiexing on Apr 21, 2022
```
* WorkQueue supports always_spinning option

* update

* update
```
  9db6c762
- oneDNN md-in-tensor 2nd batch of changes (#41997) · db468d7d
  Committed by jakpiase on Apr 21, 2022
  
  db468d7d
- Support FP16 argmax/argmin kernel (#42038) · 7003dcaa
  Committed by sneaxiy on Apr 21, 2022
```
* support int16 argmax kernel

* add fp16 test
```
  7003dcaa
- modify batch_norm and batch_norm_grad. *test=kunlun (#41976) · 9774f965
  Committed by Zhangjingyu06 on Apr 21, 2022
  
  9774f965
- block kernel_signature in windows (#42033) · 2f283997
  Committed by Sing_chan on Apr 21, 2022
  
  2f283997
- Move pass optimizations into CINN. (#42047) · 83d6e315
  Committed by Zhen Wang on Apr 20, 2022
```
* Move pass optimizations into CINN.

* Update the commit id of used cinn codes.
```
  83d6e315
- infer add io stream. (#42031) · 0d28ee29
  Committed by Wilber on Apr 21, 2022
```
* infer add io stream.

* add macro
```
  0d28ee29
- Support cinn_launch op in standalone executor (#42046) · f2f1de7b
  Committed by Ruibiao Chen on Apr 21, 2022
```
* Support cinn_launch OP in standalone executor

* Remove some redundant code
```
  f2f1de7b
- [Eager] Support numpy.ndarray as input for eager expand (#42043) · 3da8066a
  Committed by Weilong Wu on Apr 21, 2022
  
  3da8066a
- add _grad_name and _grad_value for eager tensor (#41990) · 1bf2eeab
  Committed by pangyoki on Apr 21, 2022
```
* add _grad_name and _grad_value for eager tensor

* fix paddle_enforce

* fix paddle_enforce 2

* fix grad_name

* _grad_value return lodtensor rather than tensor

* fix
```
  1bf2eeab
- [Eager]Fix SetDeviceId in eager_final_state_api from python_c_gen.py (#42025) · 94ffda57
  Committed by Aurelius84 on Apr 21, 2022
  
  94ffda57
20 Apr, 2022 8 commits


fix adaptive pool pass (#42019) · 747ba3f8
Committed by JingZhuangzhuang on Apr 20, 2022

747ba3f8

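[PaddlePaddle Hackathon 2] No. 9: Add logspace API to Paddle (#41261) · a3c50c42

Committed by BrilliantYuKaimin on Apr 20, 2022

* Add the operator description for logspace

* Add shape inference for logspace

* Add the kernel implementation for logspace

* Add the logspace interface in Python

* Add unit tests for logspace

* Add logspace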

* Update logspace_kernel.cu

* Update logspace_op.cc

* Adjust code formatting

* Update doc of logspace

* Update tensor.py

* Update logspace_op.cc

* Update logspace_kernel.cc

* Update logspace_kernel.cu

* Update test_logspace.py

* Move logspace to a different position

* Adjust code formatting

a3c50c42


update demo_ci ut threshold (#41981) · 65a5492a
Committed by baoachun on Apr 20, 2022

65a5492a

windows compile add onnxruntime switch (#41988) · 0f72c72c
Committed by heliqi on Apr 20, 2022

0f72c72c

[new-exec] clear the scope listener after run (#41947) · d4cf5666

Committed by Leo Chen on Apr 20, 2022

* clear the listener after run

* only sync variables in program

* refine code

* fit for lod_tensor_blocking_queue

d4cf5666


[MLU] add gather mlu kernel (#41969) · 23ad2166
Committed by fwenguang on Apr 20, 2022

23ad2166
[CustomOp] Fix custom op pinned input error (#41972) · f1711f24
Committed by Chen Weihang on Apr 20, 2022
```
* fix custom op pinned input error

* fix compile error
```
f1711f24

enable auto-tune when using cinn (#41795) · d70104e5

Committed by TeFeng Chen on Apr 20, 2022

* optimize preparation overhead before executing cinn compiled program

* update code notes

* fix flag annotation

* enable auto-tune when using CINN

* update cinn commit tag

* skip test

* fix lacking header file

d70104e5

19 Apr, 2022 1 commit
- polish tensor api details (#41971) · e5c61b15
  Committed by Chen Weihang on Apr 19, 2022
  
  e5c61b15

BaiXuePrincess / Paddle is in sync with its fork source project