Commits · 9e776f62f90ec525525c9f57072449811bc31b4f · 机器未来 / Paddle

27 Jun, 2022 · 1 commit

[Cherry-pick] Fix incompatible error for place type (#43830) · 9e776f62

Committed by Chen Weihang on Jun 27, 2022

* Create Tensor by paddle::empty  in custom operator (#41840)

* create tensor by empty in custom op

* fix some bug

* update relu custom op demo (#43173)

* Fix incompatible error for custom op Placetype (#43749)

* fix incompatible error

* remove default constructor

* add macro

* fix cpu make error

* add DefaultGPUPlace api

Co-authored-by: zyfncg <zhangyunfei07@baidu.com>

24 Jun, 2022 · 1 commit

[cherry-pick] fix the cumsum big shape and random bug (#43777) · edff59b1

Committed by wawltor on Jun 24, 2022

23 Jun, 2022 · 1 commit

fix set_value (#43694) (#43783) · 9d12e70c

Committed by zyfncg on Jun 23, 2022

22 Jun, 2022 · 4 commits

gpu_context (#43661) · 90ae3533

Committed by xiaoxiaohehe001 on Jun 22, 2022

Optimize linspace to avoid GPU -> CPU copy. (#42750) (#43746) · 4dcfc6df

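Committed by Yiqun Liu on Jun 22, 2022

Cherry-pick of #42750.

QA reported that the optimization in #42750 can improve the performance of the solov2 model by 6%, so it is cherry-picked to 2.3. #41096 moved the Python implementation of linspace from fluid.layers.tensor to paddle.tensor.creation, but that PR is not in the release/2.3 branch, so the Python changes from #42750 are applied to fluid.layers.tensor.linspace instead.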

[cherry pick] Support optional residual add in fused ops and slice large... · 0660d5f2

Committed by Zhang Ting on Jun 22, 2022

[cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax (#43719)

cherry-pick #43635 #43681 #43474

fix tensor copy bug (#43299) (#43728) · 8760817a

Committed by zyfncg on Jun 22, 2022

20 Jun, 2022 · 1 commit

[Cherry pick] Einsum memory optimization PR #43397 (#43554) · 638b69dc

Committed by xiongkun on Jun 20, 2022

* cherry pick from #43397

* fix code

15 Jun, 2022 · 1 commit

[cherry-pick] Fix bug of strided_slice and slice (#43388, #43443) (#43432) · 7e940b84

Committed by zyfncg on Jun 15, 2022

* fix bug of strided_slice (#43388)

* fix stride_slice bug

* fix bug

* fix bug of infer shape for slice (#43443)

14 Jun, 2022 · 1 commit

[ CherryPick ] Cherry pick for einsum optimization. (#43468) · 22e75d92

Committed by xiongkun on Jun 14, 2022

* [EinsumOp] Polish forward logic and backward logic for optimize (#42603)

* change logic for optimize

* modify

* merge

* change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0 (#43010)

* [EinsumOp] Make EinsumOp support bfloat16. (#43085)

* change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0

* make EinsumOp support bf16

* add unittest for BF16

* add condition for test_BF16

* fix bugs

* fix

* change the backward api to fit einsum op

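08 Jun, 2022 · 1 commit

Replace ReduceAmax/Amax.part.cu with KP (#43202) (#43263) · e161979e

Committed by niuliling123 on Jun 08, 2022

The original implementations of Reduce amax/amin and frobenius_norm_kernel were based on Eigen and took a long time to compile, so this PR replaces them with KP implementations.

It also removes the duplicated functionality in DefaultElementwiseOperator to reduce the compile time of the elementwise_double_grad OP.

07 Jun, 2022 · 1 commit

[cherry-pick]Delete ElementwiseKernel in BroadcastKernel (#42779) (#43210) · 52ef8656

Committed by niuliling123 on Jun 07, 2022

Delete ElementwiseKernel in BroadcastKernel.

Reduces duplicated function calls across all Broadcast paths, cutting both compile time and file size.

06 Jun, 2022 · 1 commit

cherry-pick 42645 (#43205) · 835a1888

Committed by niuliling123 on Jun 06, 2022

Remove the rank-based template instantiation and the Elementwise calls in the Broadcast function to reduce compile time.

Adapted from PR #42645 on the develop branch. Because the develop and release branches differ substantially, a direct cherry-pick was not possible, so the PR was resubmitted against release/2.3.

Instantiating Broadcast on rank causes heavy expansion of the underlying templates, which makes reduce_sum_grad_kernel.cu.o excessively large; this change reduces both the .o size and the compile time.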

10 May, 2022 · 1 commit

[cherry-pick][MLU] support add callback to stream and profiler (#42115) · 25124d7f

Committed by fwenguang on May 10, 2022

* [MLU] add mlu new profiler (#41138)

* [MLU] add mlu new profiler

* fix format

* [MLU] support add callback to stream (#41831)

* [MLU] add gather mlu kernel (#41969)

* [MLU] add mlu activation kernels (#41751)

06 May, 2022 · 1 commit

Fix the race condition in cumsum operator (#42205) (#42500) · 58f40144

Committed by wawltor on May 06, 2022

* Fix the race condition in cumsum operator

* Optimize cumsum operator

Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>

05 May, 2022 · 1 commit

fix bugs (#42495) · 590b4dbc

Committed by xiongkun on May 05, 2022

04 May, 2022 · 2 commits

[cherry-pick 2.3] fix bug of batch_norm_grad kernel with fp16 (#42461) · a5745864

Committed by XiaoguangHu on May 04, 2022

* fix bug of batch_norm_grad kernel with fp16

* format code

fix bug when compiling with cusparse in CUDA version >=11.4 (#42456) · b57c132a

Committed by XiaoguangHu on May 04, 2022

01 May, 2022 · 1 commit

remove useless lod copy (#42425) · 778ec77b

Committed by Chen Weihang on May 01, 2022

30 Apr, 2022 · 2 commits

Make einsum_v2 support multi-operands (#42327) (#42397) · 34352fcd

Committed by xiongkun on Apr 30, 2022

* Extend python einsum interface to make einsum_v2 support multi-operands and switch it to default.

* add opt_einsum dependence

* add yaml and support eager model

* fix by code review

R2.3/fix pad3d infer shape (#42414) · 2dce1e88

Committed by littletomatodonkey on Apr 30, 2022

* fix pad3d infer shape

* fix pad3d

* fix pad default value

* fix order

* add unit test

* fix unittest for ci coverage

* add ndhwc check

28 Apr, 2022 · 5 commits

Optimize attribute selected performence (#42294) (#42368) · e0e534ab

Committed by Chen Weihang on Apr 28, 2022

* opt attr eaque perf

* opt attr select code

* fix one hot infermeta

* polish get attr impl

* fix tests failed

* add testcases

Add C++ EinsumOp which support 2 operands einsum. (#42105) (#42357) · d04a68d3

Committed by xiongkun on Apr 28, 2022

* full api fix

* when out is None, go old dygraph mode

* by static check

* first version: support 2-inputs forwards. TODO: 1. backward  2. BroadCast  3. MultiVariable

* time out -> 120

set device id of Place() to get GPUContext needed by LimitGridDim in... · 0fe0aea9

Committed by FlyingQianMM on Apr 28, 2022

set device id of Place() to get GPUContext needed by LimitGridDim in ElemwiseGradBroadcast (PaddlePaddle#42320) (#42332)

[cherry-pick] Optimize performance of dygraph (#42196) (#42329) · 2ea56c90

Committed by zyfncg on Apr 28, 2022

* Optimize performance of dygraph (v4)  (#42196)

* optimize performance of dygraph

* optimize performance of dygraph and elementwise_add

* optimize the trace op

* fix bug

* fix bug

* fix unittest bug

* fix code format

* fix cherry-pick problem

[cherry-pick] Optimize performance of dygraph (#42231, #42253) (#42309) · 69a92b7b

Committed by zyfncg on Apr 28, 2022

* Optimize the performance of sum api (#42231)

* optimize the performance of sum api

* optimize IsDenseTensorInput

* remove debug log

* Add move construct for KernelSignature (#42253)

* add move construct for KernelSignature

* add noexcept

* fix cherry-pick problem

27 Apr, 2022 · 3 commits

[Cherry-pick] Optimize dygraph performance part4 (#42306) · 9bc423b1

Committed by Chen Weihang on Apr 27, 2022

* Remove std::type_index in AttributeArdDef (#42122)

* polish some impl

* add lost attr type

* polish details

* fix error type

* polish in name lists

* add double attr

* adapt infrt attr parse

* add attr type test (#42263)

* opt attr eaque perf (#42272)

[Eager] fix memory issue for eager (#42086) (#42118) · 8964fea9

Committed by Jiabin Yang on Apr 27, 2022

* fix memory issue for eager

* fix bug

[Cherry-pick2.3] Optimize dygraph performance part3 (#42256) · 9495708a

Committed by Chen Weihang on Apr 27, 2022

* Change small vector size (#42202)

* change small vector size

* Update type_defs.h

* Optimize dygraph InferShape perf (#42155)

* init commit

* remove two hash impl

* fix bug

* polish details

* fix compile failed

* fix compile failed

* fix compile failed

* add default kernel sig cache

* fix get kernel arg defs error

* remove kernel arg defs cache

* fix origin op execute

26 Apr, 2022 · 1 commit

[Cherry-pick] Optimize dygraph performance part2 (#42224) · ab24b9c0

Committed by Chen Weihang on Apr 26, 2022

* Add paddle::variant and replace paddle::any (#42139)

* add variant and replace any

* split attribute

* Optimize dygraph GetExpectedKernelType perf (#42154)

* opt dygraph scheduling

* revert part impl

* fix variant compile error (#42203)

* replace any by variant in infermeta (#42181)

25 Apr, 2022 · 2 commits

[cherry-pick] Optimize performance of dygraph (#42093, #42103, #42137) (#42171) · 0d537003

Committed by zyfncg on Apr 25, 2022

* optimize performance of PreparePhiData (#42093)

* Dygraph performance optimization (v2) (#42103)

* optimize performance of PreparePhiData

* dygraph performance optimization

* optimize performance of dygraph (#42137)

[Cherry-Pick][Performance]Remove CudaStreamSychornize in ClipGradByGlobalNorm... · 58d0d15e

Committed by Aurelius84 on Apr 25, 2022

[Cherry-Pick][Performance]Remove CudaStreamSychornize in ClipGradByGlobalNorm and fix shape op (#42170)

* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT (#42138)

* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT

* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT

* [Performance]Remove CudaStreamSychornize in ClipGradByGlobalNorm (#42132)

22 Apr, 2022 · 1 commit

Add UT (#42055) · 4f6aba87

Committed by Jacek Czaja on Apr 22, 2022

21 Apr, 2022 · 4 commits

[cherry-pick] Adjust the Phi C++ API and yaml (#41576, #41778, #41909) (#41928) · d24a402e

Committed by zyfncg on Apr 21, 2022

* [PHI] Support some c++ api in paddle namespace (#41778)

* support some c++ api in paddle namespace

* change c++ api namespace in custom op

* [Phi] Support setting size of vector<Tensor> for out in yaml (#41576)

* support setting vector out size in yaml

* support setting size of vector<tensor> for out in yaml

* add data transform config for shape and size (#41909)

* fix api_gen bug

[Cherry-pick] Optimize dygraph scheduling performance (#42010) · ec1d2a16

Committed by Chen Weihang on Apr 21, 2022

* [Phi] Support setting size of vector<Tensor> for out in yaml (#41576)

* support setting vector out size in yaml

* support setting size of vector<tensor> for out in yaml

* resolve conflict

Co-authored-by: zyfncg <zhangyunfei07@baidu.com>

[Cherry-pick] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode (#41994) · af7439ad

Committed by Jiabin Yang on Apr 21, 2022

* cherry-pick python/paddle/utils/code_gen/backward.yaml

* remove unsupported yaml

Co-authored-by: Zhanlue Yang <jim19930609@gmail.com>

[Cherry-pick] Polish custom op details (#42008) · f637e3d2

Committed by Chen Weihang on Apr 21, 2022

* polish tensor api details (#41971)

* [CustomOp] Fix custom op pinned input error (#41972)

* fix custom op pinned input error

* fix compile error

* fix inference custom op (#41999)

* resolve conflict

20 Apr, 2022 · 2 commits

[Phi] Support construct Scalar by using Non-CPU Tensor (#41765) (#41963) · 3b25afb2

Committed by YuanRisheng on Apr 20, 2022

* support construct scalar using non-cpu tensor

* fix bugs when run unittest

* fix compile bugs

* fix bugs when run ci

* fix compile bugs

* fix bugs when move copy

* perfect unit test

* perfect unittest

* update according to comment

* add target dependency

* deal with conflict

* fix bugs when run unit test

* fix unit test bugs

[Cherry-Pick]Fix expand_sig infershape BUG under static graph mode and... · 93f0e594

Committed by Aurelius84 on Apr 20, 2022

[Cherry-Pick]Fix expand_sig infershape BUG under static graph mode and NeedTransformPlace behavior if set skip_transform in yaml (#41973)

* [Phi]Fix expand_sig infershape BUG under static graph mode (#41936)

* [Phi]Fix expand_sig infershape BUG under static graph mode

* [Phi]Fix expand_sig infershape BUG under static graph mode

* [Phi]Fix unittest

* [Phi]Fix unittest

* [Eager]Fix NeedTransformPlace behavior if set skip_transform in yaml (#41920)

* [Eager]Fix NeedTransformPlace behavior if set skip_transform in yaml

* add unittest for full_like

* fix unittest

19 Apr, 2022 · 1 commit

[cherry-pick] add rsqrt, equal_all, expand yaml and unittest (#41443, #41540) (#41965) · 018245d8

Committed by zyfncg on Apr 19, 2022

* add rsqrt yaml and unittest (#41443)

* Add expand equal all yaml (#41540)

* add expand, poisson

* add poisson grad

* add expand equal_all poisson triangular solve yaml

Co-authored-by: hong <43953930+phlrain@users.noreply.github.com>

机器未来 / Paddle · in sync with the fork's upstream project