Commits · 38ec37cd705d6121749d5bdd01537ec04b38a968 · PaddlePaddle / Paddle

19 Apr, 2023 9 commits
- K
  [Perf] fix static graph performance issue in amp mode with multicard (#52724) · 38ec37cd
  Committed by kangguangli on Apr 19, 2023
```
* fix

* fix

* fix

* fix

* fix

* fix fuse group order
```
  38ec37cd
- L
  Support Linear operation in cuBlaslt and plug into attn_gemm and fusedLinear backward op (#52028) · f6f18835
  Committed by limingshu on Apr 19, 2023
```
* first commit

* restruct c++ interface to divide linear from matmulwithcublaslt

* finish building in cublaslt impl

* fix code bugs

* fix host cost

* add some changes
```
  f6f18835
- H
  
  update (#53036) · 2944d3c0
  Committed by huangjiyi on Apr 19, 2023
  
  2944d3c0
- W
  add autogen code support for mean_all op (#52855) · 93ff8e4c
  Committed by Wang Xin on Apr 19, 2023
```
* add autogen code support for mean_all op

* bug fixed

* bug fixed

* bug fixed
```
  93ff8e4c
- Z
  fix graph_reindex (#52930) · e5506be6
  Committed by zhangyuqin1998 on Apr 19, 2023
```
* fix graph_reindex

* fix

* Update op_compat.yaml
```
  e5506be6
- H
  Register fluid kerenls to phi [part 13] (#53037) · d9edb233
  Committed by huangjiyi on Apr 19, 2023
```
* update

* fix bug

* update

* fix bug
```
  d9edb233
- H
  
  update (#53033) · 7a323f78
  Committed by huangjiyi on Apr 19, 2023
  
  7a323f78
- H
  Register fluid kerenls to phi [part 8] (#53032) · a176a07e
  Committed by huangjiyi on Apr 19, 2023
```
* update

* fix bug
```
  a176a07e
- Y
  
  Remove a LOG(INFO). test=document_fix (#53056) · e0c14fdf
  Committed by Yiqun Liu on Apr 19, 2023
  
  e0c14fdf
18 Apr, 2023 28 commits
- N
  
  Print the forward's stack when backward op has nan/inf and FLAGS_check_nan_inf_level = 0 (#52639) · 660f781b
  Committed by niuliling123 on Apr 18, 2023
  
  660f781b
- Z
  
  fix the bug of quanting matmul (#52833) · 7b5065ab
  Committed by zhouzj on Apr 18, 2023
  
  7b5065ab
- C
  【Hackathon No.60】Improve FP16/BF16 unit tests for the prelu, clip_by_norm, multi_dot operators (#52666) · c3055d23
  Committed by chenxujun on Apr 18, 2023
```
* Add prelu, clip_by_norm, multi_dot tests

* Fix code

* Fix code
```
  c3055d23
- Z
  
  support excluded_layers for amp.decorate (#52871) · 534efcb6
  Committed by Zhang Ting on Apr 18, 2023
  
  534efcb6
- T
  
  fix build error (#53018) · 864aa75d
  Committed by tianshuo78520a on Apr 18, 2023
  
  864aa75d
- Q
  
  Update default custom device dir for version check, test=develop (#53020) · fdd2d916
  Committed by Qi Li on Apr 18, 2023
  
  fdd2d916
- Z
  [AMP OP&Test] Unique support float16&bfloat16 (#52995) · 1d37868f
  Committed by Zhang Zheng on Apr 18, 2023
```
* [AMP OP&Test] Unique support float16&bfloat16

* add test
```
  1d37868f
- Z
  reorder MatrixRank (#52925) · 00efdf84
  Committed by zhangyuqin1998 on Apr 18, 2023
```
* reorder MatrixRank

* fix

* fix

* fix

* fix

* fix
```
  00efdf84
- C
  [Prim] Support prim vjp of operator group_norm (#52663) · 069bb2d9
  Committed by cyber-pioneer on Apr 18, 2023
```
* add gn vjp

* fix 0

* fix args num

* fix type

* debug2

* remove unused expand

* support fp16

* fix typo

* fix reshape bug

* test3

* test4

* fix bug3

* add comment
```
  069bb2d9
- C
  
  Add logspace tests (#52956) · 417e5baf
  Committed by chenxujun on Apr 18, 2023
  
  417e5baf
- H
  register fluid kerenls to phi [part 6.5] (#52882) · cb81befa
  Committed by huangjiyi on Apr 18, 2023
```
* update

* fix bug

* update

* fix bug
```
  cb81befa
- C
  【Hackathon No.60】Improve FP16/BF16 unit tests for the randperm, split, split_with_num operators (#52683) · bc91012f
  Committed by chenxujun on Apr 18, 2023
```
* Add split, split_with_num tests

* Add randperm tests

* Fix code
```
  bc91012f
- G
  
  test,test=develop (#52993) · 8b82f77e
  Committed by Galaxy1458 on Apr 18, 2023
  
  8b82f77e
- C
  
  Add index_add, index_sample, put_along_axis, take_along_axis tests (#52572) · 1eb30775
  Committed by chenxujun on Apr 18, 2023
  
  1eb30775
- T
  
  fix xpu test;test=document_fix (#53016) · afc2c598
  Committed by tianshuo78520a on Apr 18, 2023
  
  afc2c598
- H
  register fluid kerenls to phi [part 6.4] (#52881) · 37ca3b4c
  Committed by huangjiyi on Apr 18, 2023
```
* update

* revert lookup_table_op
```
  37ca3b4c
- 张
  
  remove mlu(#53007) · 4d5a3ad6
  Committed by 张春乔 on Apr 18, 2023
  
  4d5a3ad6
- M
  rename _varbase_creator as _create_tensor (#52938) · 240e13a2
  Committed by Meteor Liu on Apr 18, 2023
```
* rename _varbase_creator as create_tensor

* rename _varbase_creator as create_tensor
```
  240e13a2
- G
  【0D output】add 0D output support for linalg.slogdet (#52891) · a7155c5c
  Committed by GGBond8488 on Apr 18, 2023
```
* add 0D output support for inalg.slogdet,test=allcase

* fix zerom dime test error test=allcase

* fix test error test=allcase

* add static backward test, test=allcase
```
  a7155c5c
- R
  
  Set random seed for test_tensordot (#53004) · f1b6a76b
  Committed by Ruibiao Chen on Apr 18, 2023
  
  f1b6a76b
- T
  del read (#52943) · 188efd11
  Committed by tianshuo78520a on Apr 18, 2023
```
* del read

* fix

* test log

* fix

* fix bug
```
  188efd11
- J
  fix the set_value error in cpu (#49804) · 239dbc4e
  Committed by JYChen on Apr 18, 2023
```
* fix the set_value error in cpu

* add a unitest for set_value OP

* fix platform::is_gpu_place

* add todo note for set_value
```
  239dbc4e
- Z
  add autogen code support for rnn op (#52799) · aba6af4f
  Committed by Zhenghai Zhang on Apr 18, 2023
```
* add autogen code support for rnn op

* fix bug

* fix bug
```
  aba6af4f
- L
  add autogen code support for lu (#52802) · f9fadfc4
  Committed by LoneRanger on Apr 18, 2023
```
* add autogen code support for lu

* fix bug

* fix bug

* fix bug

* fix bug
```
  f9fadfc4
- R
  [CustomDevice] add c_identity op (#52982) · 77b4d0f1
  Committed by ronnywang on Apr 18, 2023
```
* [CustomDevice] add c_identity op

* fix use calc stream
```
  77b4d0f1
- X
  
  [prim add instance_norm custom vjp] (#52935) · f7b80ada
  Committed by Xiaoxu Chen on Apr 18, 2023
  
  f7b80ada
- Y
  [AMP] Support overload of paddle.static.amp.decorate function. (#52918) · 79a01d6c
  Committed by Yiqun Liu on Apr 18, 2023
```
* Implement a common AmpTestBase.

* Support overload of decorate.

* Change the ignore list of flake and fix an error.
```
  79a01d6c
- Z
  reorder_prior_box (#52749) · a70d9db9
  Committed by zhangyuqin1998 on Apr 18, 2023
```
* reorder_prior_box

* fix
```
  a70d9db9
17 Apr, 2023 3 commits

- Y
  
  [Auto Parallel] Add the micro-bathsize config (#52912) · 94afa5ab
  Committed by Yulong Ao on Apr 17, 2023
  
  94afa5ab

- T
  mv ps distributed dir (#52885) · 1765d5d1
  Committed by tianshuo78520a on Apr 17, 2023

* mv ps distributed dir

* fix

* add del auto_parallel

* add auto_parallel

* fix ps

* fix bug

* fix test bug

* fix test bug

* merge develop fix error

* merge develop fix error

* merge develop fix error

1765d5d1

- Z
  [Paddle-Inference] Add cutlass conv2d_depthwise (#51792) · bd3b096a
  Committed by zhoutianzi666 on Apr 17, 2023

* initial commit for cutlass_teller

* second commit for cutlass_teller

* add conv2d_depthwise python template

* add conv2d_depthwise cutlass template

* /zhoukangkang/paddle_cutlass/Paddle/paddle/fluid/framework/ir/cutlass_teller.h

* refine code in Conv2dFusionCanSupport

* add macro in cutlass_teller.h

* add 3x3 5x5 teller

* add groups not 1 or conv2d_depthwise teller

* only generate conv2d_depthwise kernels whose ic is a multiple of 8

* add EXPLICIT in cutlass_teller.h

* final commit

* add split_k_slices in conv2d_depthwise

* make stages == 2

* refactor part of the code

* add CutlassFusionType

* solve illegal memory

* make stride_h=stride_w && make dilation==1

* must check HasAttr(use_cutlass) before GetAttrIfExists

* add CONV2D_DEPTHWISE_BIAS_SILU to OpType2String

* modify decl.h and util.cu

bd3b096a
