Commits · 20fbafe624cf93a3cc7768f6e191d3e82fb6bd95 · PaddlePaddle / Paddle

13 May, 2023 1 commit

[cherrypick][inference Zero-Dim] Support 0-Dim Tensor in Paddle-TensorRT (#53752) · 20fbafe6

Committed by Zhang Jun on May 13, 2023

* scale, square, sum, swish trt op converter support zero dim (#53660)

* [Paddle-Inference] Support trt 0dims of expand_as_v2 and mish. (#53627)

* support_expand_mish

* add unittest for reshape 0 dims (#53685)

* Add trt pow converter. (#53462)

* Add trt pow converter.

* update to use AddConstantLayer

* add dims=0 ut

* [inference Zero-Dim]add equal, elementwise_op trt 0d (#53704)

* [inference Zero-Dim]prelu trt converter support zero dim tensor (#53634)

* prelu op trt converter support zero dim

* [Inference Zero-Dim] Support trt 0dim of gelu, hard_swish, hard_sigmoid and leaky_relu (#53714)

* support_act
* delete_silu

* [inference zero dim] softmax, stack op trt converter support zero dim (#53729)

* softmax support

* support stack

* remove unused code

* update

---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: xiaoxiaohehe001 <49090790+xiaoxiaohehe001@users.noreply.github.com>
Co-authored-by: zhoutianzi666 <39978853+zhoutianzi666@users.noreply.github.com>
Co-authored-by: Wilber <jiweibo@baidu.com>

20fbafe6

11 May, 2023 1 commit
- [Cherrypick] up index warning level (#53692) · 9fbae766
  Committed by JYChen on May 11, 2023
```
* up warning level

* numpy still vlog-0
```
  9fbae766
10 May, 2023 1 commit

[cherry-pick 2.5] Broadcast && Dropout_nd Performance Optimization into Release/2.5 (#53623) · f9ea2301

Committed by Bo Zhang on May 10, 2023

* Support different dtypes of inputs for broadcast for dropout optimization  (#52093)

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* PR comment

* dropout_nd_optimization (#51479)

* with printf

* add DropOutNdForwardKernel

* PR comment

* Dropout optimize & clean broadcast inT and ElementwiseType (#52969)

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* clean ElementwiseT and InT for BroadcastKernel

* default axis and clean inT

* remove redundant fast divmod computation

* optimize drop_nd & drop_nd_grad

* optimize BroadcastDataLoader bf16 fp16

* rm InT etc. after merge develop

* delete constexpr for windows ci

* fix conflict

* fix conflict with develop

* fix conflict

* new clean

* clean

* Fix xpu2 kp compile error (#53548)

* fix conflict

* conflict

f9ea2301

09 May, 2023 8 commits

[Cherry-pick 2.5][Zero-Dim] paddle.to_tensor support 0D (#53599) · 2aefc45b

Committed by zqw_1997 on May 09, 2023

* fix doc errors, test=allcase

* conflict

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* fix doc errors, test=allcase

* fix the to_tensor error

2aefc45b

[Zero-Dim] Support p_norm/reduce_sum_p output 0D (#53421) (#53618) · 3ffe8f36
Committed by zhouweiwei2014 on May 09, 2023

3ffe8f36
Cherry pick fused linear (#53621) · f21b6f08
Committed by limingshu on May 09, 2023
```
Cherry pick fused linear
```
f21b6f08

[cherry-pick 2.5][inference Zero-Dim] trt support 0 dims (#53497) · 77eeb226

Committed by Zhang Jun on May 09, 2023

* [inference][trt]trt support 0 dims (#53383)

* trt support 0 dim
* update activation ut
* fix trt Unary operation do not support 0d when TRT < 8.6
* Update op_teller.cc
* update unary ut
* add rsqrt to unary_list
* move rsqrt to act_list

77eeb226

【cherry-pick】Op test add complex support (#53604) · c8504d86

Committed by GGBond8488 on May 09, 2023

* add complex support for  optest

* add complex grad test

* append one

* move some debug info

* move some debug info

* move some debug info

* move some debug info

* add more complex test

* Fix naming ambiguity

* Revert "add more complex test"

This reverts commit dbcb0516b8e53ba42e2d6089878a39b395345969.

* change backward gradient, add TODO

c8504d86

[cherry-pick 2.5][Zero-Dim] support paddle.sum/mean/loss api output 0D (#53601) · b6e23774

Committed by zhouweiwei2014 on May 09, 2023

* [Zero-Dim] make functools.reduce safer with an initial value, to support empty list (#53182)

* [Zero-Dim] support 0d tensor for shape and squeeze onednn kernel (#52832)

* support 0d tensor for shape and squeeze onednn kernel

* set python api for shape op ut

* [Zero-Dim] distributed scatter/all_to_all support input 0D tensor (#53186)

* [Zero-Dim] Support paddle.sum/mean/loss api output 0D,test=allcase (#52739)

* [CINN Support 0D-Tensor] CINN supports 0D-Tensor with trick temporarily (#53382)

* [CINN Support 0D-Tensor] CINN supports 0D-Tensor with trick temporarily

* Add unittest

* [CINN Support 0D-Tensor] CINN hack squeeze2 with trick temporarily (#53454)

* fix test_autograd_dynamic (#53473)
Co-authored-by: zhwesky2010 <zhouwei25@baidu.com>

---------
Co-authored-by: YangQun <qun.yang@intel.com>
Co-authored-by: HongyuJia <jiahongyu@baidu.com>
Co-authored-by: HydrogenSulfate <490868991@qq.com>

b6e23774
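
The first bullet of the commit above mentions giving `functools.reduce` an initial value so that it copes with an empty list. A minimal sketch of why that matters for 0-D tensors, assuming the reduction in question is a product over a shape list (the helper name `numel` is hypothetical and not taken from the Paddle source):

```python
from functools import reduce
import operator


def numel(shape):
    """Number of elements for a shape given as a list of ints.

    The initial value 1 makes the empty shape [] (a 0-D tensor) yield 1
    instead of raising TypeError.
    """
    return reduce(operator.mul, shape, 1)


print(numel([2, 3]))  # 6
print(numel([]))      # 1

# Without the initial value, an empty shape list raises TypeError.
try:
    reduce(operator.mul, [])
except TypeError as exc:
    print("without an initial value:", exc)
```

The initial value is the identity of the operation (1 for a product), so results for non-empty shapes are unchanged.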

[Cherry-pick] zero-dim: support 0-D for getitem/setitem (#53441) · 767e7b3f

Committed by JYChen on May 09, 2023

* support 0-D output and 0-D as index in __getitem__

* fix tests

* fix inference and UT

* add unittest for setitem

* fix xpu test

* fix xpu 0-d

* fix right value is 0d and index is List/Tensor

* Hack__getitem__ from 0-d to 1-d with FLAGS_set_to_1d

* change PHI_DECLARE_xxx to DECLARE_xxx since the change not merged to 2.5

* hack 1-D tensor to Scalar

* throw warning at __getitem__, not slice_utils

767e7b3f

fix eval branch of prim vjp of batch_norm in amp mode (#53594) · 95a7bcf9
Committed by cyber-pioneer on May 09, 2023

95a7bcf9

08 May, 2023 3 commits

[Paddle-TRT] The Graph uses OpConverterType for op converter (#53214) (#53585) · 2cf4a04a
Committed by zhoutianzi666 on May 08, 2023
```
* add `converter_type` for op converter
```
2cf4a04a
[cherry-pick] Fix core dumped in training when check_nan_inf=1 (#53423) · d5c3f032
Committed by niuliling123 on May 08, 2023
```
Fix a bug in the optimizer precision check
```
d5c3f032

[Cherry-pick]Cherry pick 0d output (#53538) · 2d02b0c1

Committed by GGBond8488 on May 08, 2023

* add 0D output support for linalg.slogdet,test=allcase

* fix zero dim test error test=allcase

* fix test error test=allcase

* add static backward test, test=allcase

* support_0D_output_for_matrix_rank_multi_dot, test=allcase

* add 0D output test for matrix_rank and multi_dot test=allcase

* fix assert error ,test=allcase

* fix test error, test=allcase

* fix other test error, test=allcase

* fix other test error, test=allcase

* fix test error, test=allcase

* fix matrix_rank and multi dot test err test=allcase

* fix test error test=allcase

* fix test zero dim test, test=allcase

* add static backward test for multi_dot, test=allcase

* add tol 2d broadcast test case, test=allcase

* fix test error test=allcase

* fix test error test=allcase

* test=allcase

* support_0d_output_for_linalg.norm

* fix test error test=allcase

* fix 0D test

* fix test error test=allcase

* fix test error test=allcase

* fix tests,test=allcase

* fix error,test=allcase

* fix errors ,test=allcase

* add static backward , test=allcase

* add static backward test, test=allcase

* slogdet_support_0D_output

* add new case

* fix tests, test=allcase

* cherry-pick

* cherry-pick

* fix trace gpu kernel 0d error, test=allcase

* fix windows error, test=allcase

* add matrixrank cherry-pick

2d02b0c1

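06 May, 2023 1 commit

[Cherry-Pick] AMP OP&Test support from Hackathon (#53522) · 39b704c1

Committed by Zhang Zheng on May 06, 2023

Low-precision operator support and additional unit tests: merges the cherry-picks of 17 Hackathon PRs, covering low-precision support and improvements for 25 ops in total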

39b704c1

05 May, 2023 2 commits
- [Dy2St]Get grad names when call append backward to fix high order gradient (#53250) (#53493) · 584d6105
  Committed by Aurelius84 on May 05, 2023
```
[Dy2St]Get grad names when call append backward to fix high order gradient (#53250)
Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com>
```
  584d6105
- [AMP] Cherry-pick AMP (#53442) · 4d7e9b55
  Committed by Zhang Ting on May 05, 2023
```
 Cherry-pick AMP 
```
  4d7e9b55
04 May, 2023 1 commit
- tensor should be defined (#53455) · ec849efd
  Committed by Yuanle Liu on May 04, 2023
  
  ec849efd
28 Apr, 2023 1 commit
- fix custom_device CopyToCpu to avoid crash (#53400) · 8413e4c3
  Committed by duanyanhui on Apr 28, 2023
  
  8413e4c3
27 Apr, 2023 1 commit

[cherry-pick2.5] [Zero-Dim] Support... · b6996598

Committed by zhouweiwei2014 on Apr 27, 2023

[cherry-pick2.5] [Zero-Dim] Support all/any/min/max/prod/logsumexp/amax/amin/some loss output 0D (#53192)

b6996598

25 Apr, 2023 2 commits
- Remove a LOG(INFO). test=document_fix (#53056) (#53324) · f55b387d
  Committed by niuliling123 on Apr 25, 2023
```
Remove excessive log printing
```
  f55b387d
- [Cherry-pick] Add enable_tensor_checker and disable_tensor_checker to api list (#52936) (#53287) · ec77defc
  Committed by niuliling123 on Apr 25, 2023
```
Add the enable_tensor_checker and disable_tensor_checker APIs (#52936)
```
  ec77defc
24 Apr, 2023 2 commits
- Revert "Cherry pick getitem/setitem 0d (#53125)" (#53265) · 50f61213
  Committed by JYChen on Apr 24, 2023
```
This reverts commit a79c04f3.
```
  50f61213
- [cherry-pick] Add debugging api and python stack (#53217) · 1e7efd81
  Committed by niuliling123 on Apr 24, 2023
```
Print the forward's stack when backward op has nan/inf and FLAGS_check_nan_inf_level = 0
Delete temp param in eager_gen
```
  1e7efd81
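
The commit message above describes printing the forward call stack when a backward op produces nan/inf. A minimal sketch of that idea in plain Python, not Paddle's implementation: the `Node` class and `build_log_at_zero` are hypothetical names used only for illustration, and the sketch does not model `FLAGS_check_nan_inf_level`.

```python
import math
import traceback


class Node:
    """Hypothetical autograd node that remembers where its forward op was called."""

    def __init__(self, name, grad_fn):
        self.name = name
        self.grad_fn = grad_fn
        # Capture the Python call stack at forward time.
        self.forward_stack = traceback.format_stack()

    def backward(self, grad_out):
        grad_in = self.grad_fn(grad_out)
        if math.isnan(grad_in) or math.isinf(grad_in):
            # Report the forward call site recorded earlier.
            raise FloatingPointError(
                f"nan/inf in backward of {self.name}; forward was called at:\n"
                + "".join(self.forward_stack)
            )
        return grad_in


def build_log_at_zero():
    # The gradient of log(x) at x = 0 is infinite; modelled directly as inf here.
    return Node("log", lambda grad_out: grad_out * math.inf)


node = build_log_at_zero()
try:
    node.backward(1.0)
except FloatingPointError as exc:
    print(exc)
```

Recording the stack when the forward op runs is what makes the report useful: by the time the backward pass detects the bad value, the user code that created the op is no longer on the call stack.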
23 Apr, 2023 2 commits

Cherry pick getitem/setitem 0d (#53125) · a79c04f3

Committed by JYChen on Apr 23, 2023

* support 0-D output and 0-D as index in __getitem__

* fix tests

* fix inference and UT

* add unittest for setitem

* fix xpu test

* fix xpu 0-d

a79c04f3

Fix bug of block desc. (#53163) (#53176) · 7adecf40

Committed by Ghost Screaming on Apr 23, 2023

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong.

* Remove climits.

* Fix bug of BlockDesc::MoveFrom(). It is used to rebuild main_program_desc from a ProgramDesc modified by a Fusion Pass. Because some fused operators need to create new Variables in the modified ProgramDesc, MoveFrom uses std::move() to move these VarDesc objects into main_program_desc. As a result, the pointers held by the modified ProgramDesc become nullptr. Calling block()->Program()->proto() first calls ProgramDesc::Flush(), which may cause a segmentation fault.
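
The MoveFrom failure described in the last bullet can be sketched as a Python analogy. This is not the C++ code: `None` stands in for `nullptr`, an `AttributeError` stands in for the segmentation fault, and the `VarDesc` and `ProgramDesc` classes here are simplified, hypothetical stand-ins.

```python
class VarDesc:
    """Hypothetical stand-in for a variable description."""

    def __init__(self, name):
        self.name = name


class ProgramDesc:
    """Hypothetical stand-in that owns a list of VarDesc slots."""

    def __init__(self, names=()):
        self.vars = [VarDesc(n) for n in names]

    def move_from(self, other):
        # Emulate std::move on owning pointers: the destination takes each
        # object and the source slot is left empty (None plays nullptr).
        for i, var in enumerate(other.vars):
            self.vars.append(var)
            other.vars[i] = None

    def flush(self):
        # Touches every slot, so an emptied slot fails here.
        return [var.name for var in self.vars]


fused = ProgramDesc(["fused_out"])
main = ProgramDesc(["x"])
main.move_from(fused)

print(main.flush())  # ['x', 'fused_out']
try:
    fused.flush()
except AttributeError as exc:
    print("flush on the moved-from program failed:", exc)
```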

7adecf40
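
The reduce_sum bullet in the commit above concerns inputs with more than INT32_MAX elements. The commit message does not state the root cause, so the following only illustrates one common way such bugs arise: an element count or index stored in a 32-bit signed integer wraps around. `as_int32` is a hypothetical helper.

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_int32(value):
    """Reinterpret an integer as a 32-bit signed value (wraps like a C int cast)."""
    return ctypes.c_int32(value).value


numel = INT32_MAX + 1       # one element more than a 32-bit signed int can hold
print(as_int32(INT32_MAX))  # 2147483647
print(as_int32(numel))      # -2147483648
```

Any loop bound or offset computed from the wrapped value would be wrong, which is why large tensors usually need 64-bit indexing.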

20 Apr, 2023 3 commits
- Fix open missing mode on jetson (#53069) · 02f44fcc
  Committed by chalsliu on Apr 20, 2023
  
  02f44fcc
- [cherry-pick] remove c++14 assert and remove include tensor.h in phi (#53071) · 356ba7e3
  Committed by Yuanle Liu on Apr 20, 2023
```
* remove c++14 assert and remove include tensor.h in phi

* update

* remove delete_cast_op_pass
```
  356ba7e3
- [CustomDevice] add c_identity op (#52982) (#53013) · d131e679
  Committed by ronnywang on Apr 20, 2023
```
* [CustomDevice] add c_identity op

* fix use calc stream
```
  d131e679
17 Apr, 2023 10 commits

[PHI]Unify fluid kernel (Part4) (#52626) · 1b5eba8a
Committed by YuanRisheng on Apr 17, 2023
```
* unify kernel

* fix ci bugs

* fix py3 bugs

* fix py3 bugs

* perfect code
```
1b5eba8a
【fix bug】Fix bug in parse args with '{,}' (#52968) · be04f258
Committed by lzydev on Apr 17, 2023
```
* fix bug in parse args

* fix bug

* recover legacy_*.yaml

* change 'Out' to Output
```
be04f258
add autogen code support for uniform_inplace (#52955) · b9830634
Committed by LoneRanger on Apr 17, 2023

b9830634
remove some [-Wunused-paramter] warning (#52924) · 337cc2ca
Committed by Galaxy1458 on Apr 17, 2023

337cc2ca

[CINN] fix concat (#52341) · 31fc763a

Committed by wangzhen38 on Apr 17, 2023

* [CINN] fix concat&pow

* update concat

* composite_backward_api

* for ci

* for ci

* update test & fix opmaker

31fc763a

Support trt engine auto build in runtime for dynamic shape (#52162) · ebc58548
Committed by JingZhuangzhuang on Apr 17, 2023

ebc58548
remove hccl in some .cc files (#52942) · 514d83de
Committed by 张春乔 on Apr 17, 2023

514d83de

Add output defs for some kernelsPhi register (#52941) · 23f87442

Committed by Sonder on Apr 17, 2023

* add register info for eigh and eig_grad

* add sync_batch_norm_op.cu register info

* add lamb output register info

* add unique register info

* change type name

* change type name

* add output register info for check_finite_and_unscale

* update cmake and config file

* add register info for adagrad

* fix build error

* add sync to run_unittests.sh

* add register info for unique_consecutive

* fix build error

* add eigh to STATIC_BUILD_TESTS

* update eig_kernel.cc

* update eig_kernel.cc

* fix infer meta error

* fix unique register error

* fix lamb register info error

* fix lamb register info

* update lamb register info

* fix lamb

* remove one Output Register

* update static build file

* add eigh op to disable_wingpu_test

* update run_unittests

23f87442

[AMP OP&Test] Sync_batch_norm support bfloat16 (#52921) · 1080d4fc
Committed by Zhang Zheng on Apr 17, 2023
```
* [AMP OP&Test] Sync_batch_norm support bfloat16

* fix

* fix
```
1080d4fc
[Dygraph] Support delaying div loss by accumulate_steps in PipelineLayer (#52848) · 0abdcff6
Committed by Haohongxiang on Apr 17, 2023

0abdcff6

15 Apr, 2023 1 commit
- [Opt CustomOP] Optimize the perf and impl of custom grad operator (#52915) · 0afef498
  Committed by HongyuJia on Apr 15, 2023
  
  0afef498

PaddlePaddle / Paddle synced successfully about 2 years ago