Commits · 08c90086c2691a7bfd787aa57ba2c7c6b37624d1 · PaddlePaddle / Paddle

16 Jun, 2023 3 commits

lite xpu api & clone (#54670) · 08c90086
Authored by zhupengyang on Jun 16, 2023

08c90086

[inference][trt]Layer norm rollback 2 plugin when trt<8.6 (#54679) · 1a941b71

Authored by bukejiyu on Jun 16, 2023

* layer_norm op with dynamic shape support INormalizationLayer in TRT8.6

* Using trt layer to make layers_norm op in lower than trt8.6
layer_norm op with dynamic shape support INormalizationLayer in TRT8.6

* ROLLBACK to layer_norm plugin when trt<8.6
* Update layer_norm_op.cc delete log

* Update layer_norm_op.cc code style

1a941b71

[inference][trt] zero-dim support for cumsum and bitwise_not op (#54097) · 73fa98ed

Authored by bukejiyu on Jun 16, 2023

* 0-dims support cumsum and bitwise_not
* Update cumsum_op.cc
* Update test_trt_convert_bitwise_not.py
---------
Co-authored-by: Zhang Jun <ewalker@live.cn>

73fa98ed

15 Jun, 2023 17 commits
- add uint8 custom ccltype (#54671) · 6fc0378a
  Authored by duanyanhui on Jun 15, 2023
  
  6fc0378a
- fix mac unittest bugs when use static phi (#54656) · b7a6e981
  Authored by YuanRisheng on Jun 15, 2023
  
  b7a6e981
- exp/expm1 support int32/int64/float16 forward (#54556) · 58ae8c7c
  Authored by Hui Zhang on Jun 15, 2023
```
* fix for log xxx

* add int32/int64 for cpu/gpu; add float16/bfloat16 for cpu forward

* fix docstring

* fix bug

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bug

* using cast

* fix test

* fix api

* fix other bugs

* fix ci bug for not using dygraph guard

* add bfloat16 test

* fix ut

* bf16

* exp/expm1 support int32/int64

* fix ut

* fix ut

* fix ut
```
  58ae8c7c
- support composite higher grad maker auto gen (#54666) · 9a36fd4b
  Authored by Charles-hit on Jun 15, 2023
```
* support composite higher grad maker

* fix composite maker higher order gen
```
  9a36fd4b
- [cmake] rm leveldb submodule (#54436) · 3a275162
  Authored by gouzil on Jun 15, 2023
```
* [cmake] rm leveldb sub

* [cmake] add tag check

* [cmake] fix install dir

* [cmake] fix install error

* [cmake] fix update error

* fix
```
  3a275162
- [CustomDevice] add MOE support, PART2 (#54573) · 8c214b6a
  Authored by ronnywang on Jun 15, 2023
  
  8c214b6a
- static graph autogen code for prior_box (#54508) · 8771fff3
  Authored by Zhenghai Zhang on Jun 15, 2023
  
  8771fff3
- [IR] [Baby step] New interprector support new ir (#54570) · ce0c5c27
  Authored by hong on Jun 15, 2023
```
* add kernel dialect

* change DenseTensorTypeStorage to DenseTensorType

* add test case

* add first pd_op to kernel dialect

* lower pd op to kernel dialect

* update

* update

* remove useless code

* add attribute print test

* fix bug

* update

* update

* update

* update

* polish code

* fix bug

* polish code and add python test

* add test

* fix test error

* add env flag

* fix bug

* revert test env

* change cc_test_old to cc_test

* fix build_static bug

* fix type test error

* update cmake

* disable test in windows

* fix inference compile
```
  ce0c5c27
- refresh (#50791) · c7ba811d
  Authored by zqw_1997 on Jun 15, 2023
  
  c7ba811d
- fix dygraph to dynamic (#54633) · 38ff4fee
  Authored by LiYuRio on Jun 15, 2023
  
  38ff4fee
- [inference][trt]modify test timeout and test_trt_convert_activation bug fix (#54491) · 1f3dd978
  Authored by bukejiyu on Jun 15, 2023
```
* modify tensorrt ci timeout

* activation ci bug fix

* comment out int8 mode test_trt_dynamic_shape_groupnorm
```
  1f3dd978
- fix pp release_output (#54673) · fcec31ab
  Authored by Haohongxiang on Jun 15, 2023
  
  fcec31ab
- fix batch_norm optest code (#54661) · 3a8484c4
  Authored by cyber-pioneer on Jun 15, 2023
  
  3a8484c4
- Fix sync batch norm op under cuda 12 (#54640) · 7fef4ee9
  Authored by Ghost Screaming on Jun 15, 2023
```
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* Remove climits.

* Fix problem of pickle and NCCL_P2P_DISABLE in distributed testcases in
cuda12.

* Fix problem of TimeOut of distributed testcases under cuda12.

* Fix bug of test_sync_batch_norm_op_static_build accuracy problem under
cuda12.

* Remove useless code modification.
```
  7fef4ee9
- remove useless code (#54657) · 490e2f3d
  Authored by LiYuRio on Jun 15, 2023
  
  490e2f3d
- [CustomDevice] add MOE support, PART1 (#54572) · 20db8602
  Authored by ronnywang on Jun 15, 2023
  
  20db8602
- remove vlog and modified error (#54648) · 3261b106
  Authored by LiYuRio on Jun 15, 2023
  
  3261b106

14 Jun, 2023 20 commits
- fix mea get pad no default return bug (#54644) · c037453d
  Authored by Chitsing KUI on Jun 14, 2023
  
  c037453d
- [AMP Prim OP]support amp logic for some prim ops (#54608) · 182e0904
  Authored by Charles-hit on Jun 14, 2023
```
* fix api rename

* support amp logic for some prim ops

---------
Co-authored-by: kangguangli <kangguangli@hotmail.com>
```
  182e0904
- [prim] Add committer for reviewing prim backward file (#54632) · cc91fa66
  Authored by cyber-pioneer on Jun 14, 2023
```
* add user to check composite_rule.py

* add committer for reviewing prim backward file
```
  cc91fa66
- [AutoTuner] Add auto tuner to obtain optima configuration (#54460) · e12d2867
  Authored by caozhou on Jun 14, 2023
```
* add auto tuner

* fix prune

* fix sharding prune and mbs candidates

* fix cfg

* fix launch

* fix launch

* add unittest

* fix code style
```
  e12d2867
- Fix cuda12 timeout problems. (#54615) · a90d9088
  Authored by Ghost Screaming on Jun 14, 2023
```
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* Remove climits.

* Fix problem of pickle and NCCL_P2P_DISABLE in distributed testcases in
cuda12.

* Fix problem of TimeOut of distributed testcases under cuda12.

* Remove useless modification.

* Remove useless modification.
```
  a90d9088
- [prim] move batch_norm prim test to op_test (#54458) · 58b4c60f
  Authored by cyber-pioneer on Jun 14, 2023
```
* move batch_norm prim test to op_test

* fix optest bug

* add test to cmake

* add cinn test case

* fix batch_norm prim grad bf16

* fix code

* add cuda check

* fix batch_norm bfloat16

* fix cpu bfloat16 bug

* skip non-bfloat16-supported platform

* fix code

* fix cinn rtol and atol in bfloat16

* fix name

* fix config
```
  58b4c60f
- support group_norm and cumsum prim ops bf16 dtype (#54580) · f7eb03c6
  Authored by Charles-hit on Jun 14, 2023
  
  f7eb03c6
- [Zero-Dim] paddle.nanmedian/nanquantile support 0D Tensor (#54500) · 3d4d995f
  Authored by zhouweiwei2014 on Jun 14, 2023
```
* [Zero-Dim] paddle.nanmedian support 0D Tensor

* fix CI
```
  3d4d995f
- [Zero-Dim] add 0D test case (#54581) · ca59c72b
  Authored by zhouweiwei2014 on Jun 14, 2023
  
  ca59c72b
- update flash attn select (#54630) · 49a45f71
  Authored by Yuang Liu on Jun 14, 2023
  
  49a45f71
- Fix lite-subgraph xpu inference error (#54607) · d3259900
  Authored by chalsliu on Jun 14, 2023
  
  d3259900
- set xpu context at runtime (#54587) · d0d7d01f
  Authored by zhupengyang on Jun 14, 2023
  
  d0d7d01f
- [AMP] fix bf16 amp training error (#54571) · 4ee3815e
  Authored by Zhang Ting on Jun 14, 2023
  
  4ee3815e
- Support code generation for op fill_any (#54378) · 4277f61f
  Authored by huangjiyi on Jun 14, 2023
```
* update

* update
```
  4277f61f
- [BugFix]: Fix ci test bugs in test_fuse_gemm_epilogue_pass.py and... · ded7d190
  Authored by yuehuayingxueluo on Jun 14, 2023
```
[BugFix]: Fix ci test bugs in test_fuse_gemm_epilogue_pass.py and test_fused_gemm_epilogue_op.py (#54519)

* fix ci bugs in fused_linear

* fix code style
```
  ded7d190
- fix bug of release output in pp (#54624) · 40bfe0eb
  Authored by ShenLiang on Jun 14, 2023
  
  40bfe0eb
- [IR] Support mutable attribute as input for paddle dialect OP build method (#54563) · d658940a
  Authored by zhangbo9674 on Jun 14, 2023
```
* support mutable attr is input for build

* add ut

* solve conflict
```
  d658940a
- Fix A100 CUDA12 ut (#54487) · a96c6dc7
  Authored by sneaxiy on Jun 14, 2023
```
* fix A100 CUDA12 ut

* fix ci uts

* fix test_sync_batch_norm_op

* fix sync bn op ut again by separating 2 files

* fix codestyle ci

* combine other PRs

* fix codestyle

* fix codestyle ci
```
  a96c6dc7
- [IR]Polish ProgramTranslator private member code style (#54470) · 45ba9cf0
  Authored by Aurelius84 on Jun 14, 2023
```
* [IR]Polish ProgramTranslator private member code style

* update blog
```
  45ba9cf0
- [IR&PASS] part 3-2: add PatternApplicator and FrozenRewritePatternSet, refine... · 548fb821
  Authored by Yuanle Liu on Jun 14, 2023
```
[IR&PASS] part 3-2: add PatternApplicator and FrozenRewritePatternSet, refine PatternMatch code, add some api for Builder (#54492)

* [IR&PASS] add PatternApplicator and FrozenRewritePatternSet, refine PatternMatch code, add some api for Builder and TypeId

* fix comment
```
  548fb821

PaddlePaddle / Paddle synced successfully about 1 year ago
