Commits · 1375b3f763cca2d50f4e85136b81c684a7dd31a6 · PaddlePaddle / Paddle

19 Jun, 2023 · 3 commits

fix python (#54724) · 1375b3f7
Committed by tianshuo78520a on Jun 19, 2023

1375b3f7
static graph autogen code support for full_like op (#54698) · 8947488c
Committed by Wang Xin on Jun 19, 2023
```
* static graph autogen code support for full_like op

* fix

* fix bug
```
8947488c

Support tensor attribute runtime (#54692) · 93f7a02a

Committed by hong on Jun 19, 2023

* add kernel dialect

* change DenseTensorTypeStorage to DenseTensorType

* add test case

* add first pd_op to kernel dialect

* lower pd op to kernel dialect

* update

* update

* remove useless code

* add attribute print test

* fix bug

* update

* update

* update

* update

* polish code

* fix bug

* polish code and add python test

* add test

* fix test error

* add env flag

* fix bug

* revert test env

* change cc_test_old to cc_test

* fix build_static bug

* fix type test error

* update cmake

* disable test in windows

* fix inference compile

* update

* support tensor attribute runtime

* add result check

* polish test code

* fix test error

* add scalar test & polish code

* re-open test case

93f7a02a

16 Jun, 2023 · 15 commits
- Run plan in standalone executor (#54394) · 752670e2
  Committed by Ruibiao Chen on Jun 16, 2023
```
* Run plan in standalone executor

* Update codes

* Update atol and rtol for py3-CI

* Add scope to cache key

* Fix CI errors

* Fix code style

* Update codes

* Remove fetch_name in standalone executor

* Fix UT

* Update codes

* Fix new IR bug
```
  752670e2
- fix batch_norm grad kernel nhwc error (#54703) · 4c6f77d8
  Committed by cyber-pioneer on Jun 16, 2023
  
  4c6f77d8
- int32/int64 forward (#54687) · 1df2ee6c
  Committed by Hui Zhang on Jun 16, 2023
  
  1df2ee6c
- fix lamb optimizer always_adapt (#54654) · 2a56f4b3
  Committed by zhiboniu on Jun 16, 2023
```
* fix lamb always_adapt

* fix optest

* fix all optests
```
  2a56f4b3
- [kunlun] support xpu runtime profiler (#54685) · 82eeda69
  Committed by jameszhang on Jun 16, 2023
```
* [kunlun] support xpu runtime profiler

* fix cmake error

* add libxpti.so to paddle package

* fix for style check

* sync change in setup.py and python/setup.py.in

* remove libxpti.so from paddle output dir in this PR
```
  82eeda69
- fix batch_norm cuda grad kernel test mode bug (#54681) · eb9d07e5
  Committed by cyber-pioneer on Jun 16, 2023
  
  eb9d07e5
- fix RecordStreamForGC nullptr exception (#54606) · 5f92cc54
  Committed by engineer1109 on Jun 16, 2023
```
changed
```
  5f92cc54
- [CustomDevice] add MOE support, PART3 (#54676) · 584ae4d7
  Committed by ronnywang on Jun 16, 2023
  
  584ae4d7
- separate four directions p2p communication to a new file (#54664) · ff806111
  Committed by LiYuRio on Jun 16, 2023
  
  ff806111
- Two kinds of profiler to pp/vp (#54586) · aac91e82
  Committed by Yuang Liu on Jun 16, 2023
  
  aac91e82
- [XPU] fc fusion supports sigmoid, swish and relu6 (#54486) · 9b2bcfd6
  Committed by wz1qqx on Jun 16, 2023
  
  9b2bcfd6
- Update CUDA12 Dockerfile (#54547) · 924ca81d
  Committed by tianshuo78520a on Jun 16, 2023
```
* Ampere-ci-test

* Ampere-ci-test

* fix build error,Ampere-ci-test

* fix glide
```
  924ca81d
- lite xpu api & clone (#54670) · 08c90086
  Committed by zhupengyang on Jun 16, 2023
  
  08c90086
- [inference][trt]Layer norm rollback 2 plugin when trt<8.6 (#54679) · 1a941b71
  Committed by bukejiyu on Jun 16, 2023
```
* layer_norm op with dynamic shape support INormalizationLayer in TRT8.6

* Using trt layer to make layers_norm op in lower than trt8.6
layer_norm op with dynamic shape support INormalizationLayer in TRT8.6

* ROLLBACK to layer_norm plugin when trt<8.6
* Update layer_norm_op.cc delete log

* Update layer_norm_op.cc code style
```
  1a941b71
- [inference][trt] zero-dim support for cumsum and bitwise_not op (#54097) · 73fa98ed
  Committed by bukejiyu on Jun 16, 2023
```
* 0-dims support cumsum and bitwise_not
* Update cumsum_op.cc
* Update test_trt_convert_bitwise_not.py
---------
Co-authored-by: Zhang Jun <ewalker@live.cn>
```
  73fa98ed
15 Jun, 2023 · 17 commits
- add uint8 custom ccltype (#54671) · 6fc0378a
  Committed by duanyanhui on Jun 15, 2023
  
  6fc0378a
- fix mac unittest bugs when use static phi (#54656) · b7a6e981
  Committed by YuanRisheng on Jun 15, 2023
  
  b7a6e981
- exp/expm1 support int32/int64/float16 forward (#54556) · 58ae8c7c
  Committed by Hui Zhang on Jun 15, 2023
```
* fix for log xxx

* add int32/int64 for cpu/gpu; add float16/bfloat16 for cpu forward

* fix docstring

* fix bug

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bug

* using cast

* fix test

* fix api

* fix other bugs

* fix ci bug for not using dygraph guard

* add bfloat16 test

* fix ut

* bf16

* exp/expm1 support int32/int64

* fix ut

* fix ut

* fix ut
```
  58ae8c7c
- support composite higher grad maker auto gen (#54666) · 9a36fd4b
  Committed by Charles-hit on Jun 15, 2023
```
* support composite higher grad maker

* fix composite maker higher order gen
```
  9a36fd4b
- [cmake] rm leveldb submodule (#54436) · 3a275162
  Committed by gouzil on Jun 15, 2023
```
* [cmake] rm leveldb sub

* [cmake] add tag check

* [cmake] fix install dir

* [cmake] fix install error

* [cmake] fix update error

* fix
```
  3a275162
- [CustomDevice] add MOE support, PART2 (#54573) · 8c214b6a
  Committed by ronnywang on Jun 15, 2023
  
  8c214b6a
- static graph autogen code for prior_box (#54508) · 8771fff3
  Committed by Zhenghai Zhang on Jun 15, 2023
  
  8771fff3
- [IR] [Baby step] New interprector support new ir (#54570) · ce0c5c27
  Committed by hong on Jun 15, 2023
```
* add kernel dialect

* change DenseTensorTypeStorage to DenseTensorType

* add test case

* add first pd_op to kernel dialect

* lower pd op to kernel dialect

* update

* update

* remove useless code

* add attribute print test

* fix bug

* update

* update

* update

* update

* polish code

* fix bug

* polish code and add python test

* add test

* fix test error

* add env flag

* fix bug

* revert test env

* change cc_test_old to cc_test

* fix build_static bug

* fix type test error

* update cmake

* disable test in windows

* fix inference compile
```
  ce0c5c27
- refresh (#50791) · c7ba811d
  Committed by zqw_1997 on Jun 15, 2023
  
  c7ba811d
- fix dygraph to dynamic (#54633) · 38ff4fee
  Committed by LiYuRio on Jun 15, 2023
  
  38ff4fee
- [inference][trt]modify test timeout and test_trt_convert_activation bug fix (#54491) · 1f3dd978
  Committed by bukejiyu on Jun 15, 2023
```
* modify tensorrt ci timeout

* activation ci bug fix

* comment out  int8 mode test_trt_dynamic_shape_groupnorm
```
  1f3dd978
- fix pp release_output (#54673) · fcec31ab
  Committed by Haohongxiang on Jun 15, 2023
  
  fcec31ab
- fix batch_norm optest code (#54661) · 3a8484c4
  Committed by cyber-pioneer on Jun 15, 2023
  
  3a8484c4
- Fix sync batch norm op under cuda 12 (#54640) · 7fef4ee9
  Committed by Ghost Screaming on Jun 15, 2023
```
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* Remove climits.

* Fix problem of pickle and NCCL_P2P_DISABLE in distributed testcases in
cuda12.

* Fix problem of TimeOut of distributed testcases under cuda12.

* Fix bug of test_sync_batch_norm_op_static_build accuracy problem under
cuda12.

* Remove useless code modification.
```
  7fef4ee9
- remove useless code (#54657) · 490e2f3d
  Committed by LiYuRio on Jun 15, 2023
  
  490e2f3d
- [CustomDevice] add MOE support, PART1 (#54572) · 20db8602
  Committed by ronnywang on Jun 15, 2023
  
  20db8602
- remove vlog and modified error (#54648) · 3261b106
  Committed by LiYuRio on Jun 15, 2023
  
  3261b106
14 Jun, 2023 · 5 commits

fix mea get pad no default return bug (#54644) · c037453d
Committed by Chitsing KUI on Jun 14, 2023

c037453d

[AMP Prim OP]support amp logic for some prim ops (#54608) · 182e0904

Committed by Charles-hit on Jun 14, 2023

* fix api rename

* support amp logic for some prim ops

---------
Co-authored-by: kangguangli <kangguangli@hotmail.com>

182e0904

[prim] Add committer for reviewing prim backward file (#54632) · cc91fa66
Committed by cyber-pioneer on Jun 14, 2023
```
* add user to check composite_rule.py

* add committer for reviewing prim backward file
```
cc91fa66

[AutoTuner] Add auto tuner to obtain optima configuration (#54460) · e12d2867

Committed by caozhou on Jun 14, 2023

* add auto tuner

* fix prune

* fix sharding prune and mbs candidates

* fix cfg

* fix launch

* fix launch

* add unittest

* fix code style

e12d2867

Fix cuda12 timeout problems. (#54615) · a90d9088

Committed by Ghost Screaming on Jun 14, 2023

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* Remove climits.

* Fix problem of pickle and NCCL_P2P_DISABLE in distributed testcases in
cuda12.

* Fix problem of TimeOut of distributed testcases under cuda12.

* Remove useless modification.

* Remove useless modification.

a90d9088

PaddlePaddle / Paddle · synced successfully about 1 year ago