Commits · 7e940b844aaf5302a742d7bbf68775c44e952927 · PaddlePaddle / Paddle

15 Jun, 2022 1 commit

[cherry-pick] Fix bug of strided_slice and slice (#43388, #43443) (#43432) · 7e940b84

Committed by zyfncg on Jun 15, 2022

* fix bug of strided_slice (#43388)

* fix strided_slice bug

* fix bug

* fix bug of infer shape for slice (#43443)

7e940b84

14 Jun, 2022 1 commit

[ CherryPick ] Cherry pick for einsum optimization. (#43468) · 22e75d92

Committed by xiongkun on Jun 14, 2022

* [EinsumOp] Polish forward logic and backward logic for optimize (#42603)

* change logic for optimize

* modify

* merge

* change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0 (#43010)

* [EinsumOp] Make EinsumOp support bfloat16. (#43085)

* change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0

* make EinsumOp support bf16

* add unittest for BF16

* add condition for test_BF16

* fix bugs

* fix

* change the backward api to fit einsum op

22e75d92

08 Jun, 2022 1 commit

Replace ReduceAmax/Amax.part.cu with KP (#43202) (#43263) · e161979e

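Committed by niuliling123 on Jun 08, 2022

The Reduce amax/amin and frobenius_norm kernels were originally implemented with Eigen, which made these files slow to compile, so this PR replaces them with KP implementations.
Remove duplicated functionality from DefaultElementwiseOperator to reduce the compile time of the elementwise_double_grad op.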

e161979e

07 Jun, 2022 1 commit

[cherry-pick]Delete ElementwiseKernel in BroadcastKernel (#42779) (#43210) · 52ef8656

Committed by niuliling123 on Jun 07, 2022

Delete ElementwiseKernel in BroadcastKernel
Remove duplicated function calls across all Broadcast code paths, reducing both compile time and file size.

52ef8656

06 Jun, 2022 1 commit

cherry-pick 42645 (#43205) · 835a1888

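Committed by niuliling123 on Jun 06, 2022

Remove the per-rank template instantiation and the Elementwise call from the Broadcast function to reduce compile time.
Adapted from PR #42645 on the develop branch. Because the develop and release branches have diverged considerably, a direct cherry-pick is not possible, so the PR is resubmitted against release/2.3.
Instantiating Broadcast per rank expands a large number of low-level templates and makes reduce_sum_grad_kernel.cu.o excessively large; this change reduces both the .o size and the compile time.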

835a1888

10 May, 2022 1 commit

[cherry-pick][MLU] support add callback to stream and profiler (#42115) · 25124d7f

Committed by fwenguang on May 10, 2022

* [MLU] add mlu new profiler (#41138)

* [MLU] add mlu new profiler

* fix format

* [MLU] support add callback to stream (#41831)

* [MLU] add gather mlu kernel (#41969)

* [MLU] add mlu activation kernels (#41751)

25124d7f

06 May, 2022 1 commit

Fix the race condition in cumsum operator (#42205) (#42500) · 58f40144

Committed by wawltor on May 06, 2022

* Fix the race condition in cumsum operator

* Optimize cumsum operator

Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>

58f40144

05 May, 2022 1 commit

fix bugs (#42495) · 590b4dbc

Committed by xiongkun on May 05, 2022

590b4dbc

04 May, 2022 2 commits

[cherry-pick 2.3] fix bug of batch_norm_grad kernel with fp16 (#42461) · a5745864

Committed by XiaoguangHu on May 04, 2022

* fix bug of batch_norm_grad kernel with fp16

* format code

a5745864

fix bug when compiling with cusparse in CUDA version >=11.4 (#42456) · b57c132a

Committed by XiaoguangHu on May 04, 2022

b57c132a

01 May, 2022 1 commit

remove useless lod copy (#42425) · 778ec77b

Committed by Chen Weihang on May 01, 2022

778ec77b

30 Apr, 2022 2 commits

Make einsum_v2 support multi-operands (#42327) (#42397) · 34352fcd

Committed by xiongkun on Apr 30, 2022

* Extend python einsum interface to make einsum_v2 support multi-operands and switch it to default.

* add opt_einsum dependence

* add yaml and support eager mode

* fix by code review
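
For context, a multi-operand einsum such as "ij,jk,kl->il" can be evaluated either as one direct sum or as a sequence of two-operand contractions; a library like opt_einsum picks the pairwise order. The pure-Python sketch below only illustrates that equivalence on nested lists. It is not Paddle's implementation, and the function names `matmul` and `einsum_ij_jk_kl` are hypothetical.

```python
from itertools import product


def matmul(a, b):
    """Two-operand contraction "ij,jk->ik" on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def einsum_ij_jk_kl(a, b, c):
    """Three-operand contraction "ij,jk,kl->il" as one direct sum over j and k."""
    rows, inner_j, inner_k, cols = len(a), len(b), len(c), len(c[0])
    return [[sum(a[i][j] * b[j][k] * c[k][l]
                 for j, k in product(range(inner_j), range(inner_k)))
             for l in range(cols)]
            for i in range(rows)]


a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
c = [[2, 0], [0, 3]]

direct = einsum_ij_jk_kl(a, b, c)
left_first = matmul(matmul(a, b), c)   # contract a and b first
right_first = matmul(a, matmul(b, c))  # contract b and c first
```

All three results are identical here; the contraction order affects only the amount of work, not the value.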

34352fcd

R2.3/fix pad3d infer shape (#42414) · 2dce1e88

Committed by littletomatodonkey on Apr 30, 2022

* fix pad3d infer shape

* fix pad3d

* fix pad default value

* fix order

* add unit test

* fix unittest for ci coverage

* add ndhwc check

2dce1e88

28 Apr, 2022 5 commits

Optimize attribute selected performance (#42294) (#42368) · e0e534ab

Committed by Chen Weihang on Apr 28, 2022

* opt attr eaque perf

* opt attr select code

* fix one hot infermeta

* polish get attr impl

* fix tests failed

* add testcases

e0e534ab

Add C++ EinsumOp which support 2 operands einsum. (#42105) (#42357) · d04a68d3

Committed by xiongkun on Apr 28, 2022

* full api fix

* when out is None, go old dygraph mode

* by static check

* first version: support 2-inputs forwards. TODO: 1. backward  2. BroadCast  3. MultiVariable

* time out -> 120

d04a68d3

set device id of Place() to get GPUContext needed by LimitGridDim in... · 0fe0aea9

Committed by FlyingQianMM on Apr 28, 2022

set device id of Place() to get GPUContext needed by LimitGridDim in ElemwiseGradBroadcast (PaddlePaddle#42320) (#42332)

0fe0aea9

[cherry-pick] Optimize performance of dygraph (#42196) (#42329) · 2ea56c90

Committed by zyfncg on Apr 28, 2022

* Optimize performance of dygraph (v4)  (#42196)

* optimize performance of dygraph

* optimize performance of dygraph and elementwise_add

* optimize the trace op

* fix bug

* fix bug

* fix unittest bug

* fix code format

* fix cherry-pick problem

2ea56c90

[cherry-pick] Optimize performance of dygraph (#42231, #42253) (#42309) · 69a92b7b

Committed by zyfncg on Apr 28, 2022

* Optimize the performance of sum api (#42231)

* optimize the performance of sum api

* optimize IsDenseTensorInput

* remove debug log

* Add move construct for KernelSignature (#42253)

* add move construct for KernelSignature

* add noexcept

* fix cherry-pick problem

69a92b7b

27 Apr, 2022 3 commits

[Cherry-pick] Optimize dygraph performance part4 (#42306) · 9bc423b1

Committed by Chen Weihang on Apr 27, 2022

* Remove std::type_index in AttributeArgDef (#42122)

* polish some impl

* add lost attr type

* polish details

* fix error type

* polish in name lists

* add double attr

* adapt infrt attr parse

* add attr type test (#42263)

* opt attr eaque perf (#42272)

9bc423b1

[Eager] fix memory issue for eager (#42086) (#42118) · 8964fea9

Committed by Jiabin Yang on Apr 27, 2022

* fix memory issue for eager

* fix bug

8964fea9

[Cherry-pick2.3] Optimize dygraph performance part3 (#42256) · 9495708a

Committed by Chen Weihang on Apr 27, 2022

* Change small vector size (#42202)

* change small vector size

* Update type_defs.h

* Optimize dygraph InferShape perf (#42155)

* init commit

* remove two hash impl

* fix bug

* polish details

* fix compile failed

* fix compile failed

* fix compile failed

* add default kernel sig cache

* fix get kernel arg defs error

* remove kernel arg defs cache

* fix origin op execute

9495708a

26 Apr, 2022 1 commit

[Cherry-pick] Optimize dygraph performance part2 (#42224) · ab24b9c0

Committed by Chen Weihang on Apr 26, 2022

* Add paddle::variant and replace paddle::any (#42139)

* add variant and replace any

* split attribute

* Optimize dygraph GetExpectedKernelType perf (#42154)

* opt dygraph scheduling

* revert part impl

* fix variant compile error (#42203)

* replace any by variant in infermeta (#42181)

ab24b9c0

25 Apr, 2022 2 commits

[cherry-pick] Optimize performance of dygraph (#42093, #42103, #42137) (#42171) · 0d537003

Committed by zyfncg on Apr 25, 2022

* optimize performance of PreparePhiData (#42093)

* Dygraph performance optimization (v2) (#42103)

* optimize performance of PreparePhiData

* dygraph performance optimization

* optimize performance of dygraph (#42137)

0d537003

[Cherry-Pick][Performance]Remove CudaStreamSynchronize in ClipGradByGlobalNorm... · 58d0d15e

Committed by Aurelius84 on Apr 25, 2022

[Cherry-Pick][Performance]Remove CudaStreamSynchronize in ClipGradByGlobalNorm and fix shape op (#42170)

* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT (#42138)

* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT

* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT

* [Performance]Remove CudaStreamSynchronize in ClipGradByGlobalNorm (#42132)

58d0d15e

22 Apr, 2022 1 commit

Add UT (#42055) · 4f6aba87

Committed by Jacek Czaja on Apr 22, 2022

4f6aba87

21 Apr, 2022 4 commits

[cherry-pick] Adjust the Phi C++ API and yaml (#41576, #41778, #41909) (#41928) · d24a402e

Committed by zyfncg on Apr 21, 2022

* [PHI] Support some c++ api in paddle namespace (#41778)

* support some c++ api in paddle namespace

* change c++ api namespace in custom op

* [Phi] Support setting size of vector<Tensor> for out in yaml (#41576)

* support setting vector out size in yaml

* support setting size of vector<tensor> for out in yaml

* add data transform config for shape and size (#41909)

* fix api_gen bug

d24a402e

[Cherry-pick] Optimize dygraph scheduling performance (#42010) · ec1d2a16

Committed by Chen Weihang on Apr 21, 2022

* [Phi] Support setting size of vector<Tensor> for out in yaml (#41576)

* support setting vector out size in yaml

* support setting size of vector<tensor> for out in yaml

* resolve conflict

Co-authored-by: zyfncg <zhangyunfei07@baidu.com>

ec1d2a16

[Cherry-pick] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode (#41994) · af7439ad

Committed by Jiabin Yang on Apr 21, 2022

* cherry-pick python/paddle/utils/code_gen/backward.yaml

* remove unsupported yaml

Co-authored-by: Zhanlue Yang <jim19930609@gmail.com>

af7439ad

[Cherry-pick] Polish custom op details (#42008) · f637e3d2

Committed by Chen Weihang on Apr 21, 2022

* polish tensor api details (#41971)

* [CustomOp] Fix custom op pinned input error (#41972)

* fix custom op pinned input error

* fix compile error

* fix inference custom op (#41999)

* resolve conflict

f637e3d2

20 Apr, 2022 2 commits

[Phi] Support construct Scalar by using Non-CPU Tensor (#41765) (#41963) · 3b25afb2

Committed by YuanRisheng on Apr 20, 2022

* support construct scalar using non-cpu tensor

* fix bugs when run unittest

* fix compile bugs

* fix bugs when run ci

* fix compile bugs

* fix bugs when move copy

* perfect unit test

* perfect unittest

* update according to comment

* add target dependency

* deal with conflict

* fix bugs when run unit test

* fix unit test bugs

3b25afb2

[Cherry-Pick]Fix expand_sig infershape BUG under static graph mode and... · 93f0e594

Committed by Aurelius84 on Apr 20, 2022

[Cherry-Pick]Fix expand_sig infershape BUG under static graph mode and NeedTransformPlace behavior if set skip_transform in yaml (#41973)

* [Phi]Fix expand_sig infershape BUG under static graph mode (#41936)

* [Phi]Fix expand_sig infershape BUG under static graph mode

* [Phi]Fix expand_sig infershape BUG under static graph mode

* [Phi]Fix unittest

* [Phi]Fix unittest

* [Eager]Fix NeedTransformPlace behavior if set skip_transform in yaml (#41920)

* [Eager]Fix NeedTransformPlace behavior if set skip_transform in yaml

* add unittest for full_like

* fix unittest

93f0e594

19 Apr, 2022 6 commits

[cherry-pick] add rsqrt, equal_all, expand yaml and unittest (#41443, #41540) (#41965) · 018245d8

Committed by zyfncg on Apr 19, 2022

* add rsqrt yaml and unittest (#41443)

* Add expand equal all yaml (#41540)

* add expand, poisson

* add poisson grad

* add expand equal_all poisson triangular solve yaml

Co-authored-by: hong <43953930+phlrain@users.noreply.github.com>

018245d8

[Eager] Fix numpy interface for constructing empty tensor (#41904) (#41954) · 551e9140

Committed by Weilong Wu on Apr 19, 2022

* [Eager] Fix numpy interface for constructing empty tensor

* Fix CI, construct empty tensor

* Modify empty tensor's shape from [] to [0]

* Add more test for constructing empty tensor
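
As background for the shape change listed above: under the common NumPy-style convention, a shape of [] describes a 0-D tensor holding exactly one element, while [0] describes a 1-D tensor holding none. The sketch below only illustrates that arithmetic; `element_count` is a hypothetical helper, not a Paddle API, and it needs Python 3.8+ for `math.prod`.

```python
import math


def element_count(shape):
    """Number of elements implied by a shape: the product of its dimensions."""
    return math.prod(shape)


scalar_like = element_count([])   # empty product is 1, so one element
truly_empty = element_count([0])  # a zero-length axis means no elements
```

So only [0] matches the intuition of an "empty" tensor in the element-count sense.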

551e9140

[Cherry-pick 2.3] Autotune the workspace and kernel choosing of conv (#41833) · b4adbe5c

Committed by Yiqun Liu on Apr 19, 2022

Cherry-pick #40338 #41741 #41313

b4adbe5c

[DoubleGrad] Enabled test_autograd_functional_dynamic.py under eager mode (#41668) (#41895) · 68643a9e

Committed by Zhanlue Yang on Apr 19, 2022

* [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad

* Fixed elementwise issue

* Addressed CI failures

* [DoubleGrad] Enabled test_imperative_triple_grad test cases under eager_mode

* [DoubleGrad] Enabled test_autograd_functional_dynamic.py under eager mode

* Enabled more test cases

* Fixed performance issues

* Fixed minor issue

68643a9e

Add kernel sparse_mask_helper; sparse_coo_tensor_grad (#41586) (#41902) · 44d8c6ed

Committed by zhangkaihuo on Apr 19, 2022

cherry-pick the PR #41586 to release/2.3

44d8c6ed

Optimization for graph_sample_neighbors API (#41447) (#41897) · 6115b016

Committed by Siming Dai on Apr 19, 2022

* add eids result for graph_sample_neighbors

* fix bug

* move fisher_yates sample to warp

* add cpu eid output

* delete comment

* delete comment

* change nullptr placeholder

* optimize sample kernel

* fix mutable_data

6115b016

18 Apr, 2022 3 commits

[Phi]Reduce kernels into multiple files (#41747) (#41854) · 688f4ec0

Committed by chentianyu03 on Apr 18, 2022

* split reduce_kernel

* rm reduce_kernel in cmake

* split reduce_grad kernels

* fix cmake build error

* format code

* fix standalone_executor_test error

688f4ec0

[DoubleGrad] Enabled double grad test cases in eager_mode for... · a367fbab

Committed by Zhanlue Yang on Apr 18, 2022

[DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad (#41451) (#41893)

* [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad

* Fixed elementwise issue

* Addressed CI failures

a367fbab

Add eager string tensor (#41039) (#41839) · 623f8308

Committed by Jack Zhou on Apr 18, 2022

* Add core.eager.StringTensor __init__ which pyarray args can be passed

* Add the numpy method of core.eager.StringTensor

* revert tensor.to_string modification

* Add ToPyObject for core.eager.StringTensor

* Add debug string for core.eager.StringTensor

* Remove place args of core.eager.StringTensor temporarily

* Fix check string_tensor error

* remove dtype of core.eager.StringTensor

* add core.eager.StringTensor unittest

* remove pstring from VarDesc

* Add InitStringTensorWithStringTensor

* Remove to_string modification

* Remove zero_copy arg from StringTensor creator

623f8308
