Commits · dff4ff42af8a1ee544cef5dac837ca8adb8e0131 · PaddlePaddle / Paddle

03 Mar, 2023 · 7 commits
- [PHI Decoupling]Remove memory header (Part2) (#50870) · 558068cc
  Committed by YuanRisheng on Mar 03, 2023
```
* decouple memory copy

* fix ci bugs

* fix ci compile bugs

* fix rocm compile

* fix ci bugs
```
- Fix batch_norm momentum (#51120) · d9fb639c
  Committed by zhangkaihuo on Mar 03, 2023
- add gather_nd_comp_grad composite rule (#50966) · 625e30b7
  Committed by wangxiaoning on Mar 03, 2023
```
* comp gather_nd_grad

* fix

* test no cinn

* fix

* fix cinn
```
- [Zero-Dim] fix create_scalar to create 0D (#51024) · 792531b6
  Committed by zhouweiwei2014 on Mar 03, 2023
- add function to disable trt op by output name (#49497) · db47dec5
  Committed by JingZhuangzhuang on Mar 03, 2023
- Add multi_precision for adagrad op (#50078) · 4779c2c1
  Committed by niuliling123 on Mar 03, 2023
- [CustomDevice] fix process_group_custom api (#50718) · 12e9aaa5
  Committed by ronnywang on Mar 03, 2023
```
* [CustomDevice] fix process_group_custom api

* update

* update

* update

* update
```
02 Mar, 2023 · 20 commits
- New executor static build for fluid kernel (#50670) · bf50784c
  Committed by Ruibiao Chen on Mar 02, 2023
```
* Check structed kernel for new executor static build

* Update code

* Ready for resnet50

* Move transfer_dtype to phi

* Ready for transformer

* Fix CI errors

* Fix layer_norm InferMeta

* Remove layer_norm infermeta fix
```
- Cache for cublaslt descriptor (#50931) · 819f8939
  Committed by limingshu on Mar 02, 2023
```
* first commit

* finish base work

* modification for good

* fix for cache setting and gather the algo and desc as one data for cache storage

* fix for cache setting and gather the algo and desc as one data for cache storage

* install pre-commit check
```
- fix zero bug of case21: paddle.mode (#51091) · 25d3ed65
  Committed by chenxiao120660 on Mar 02, 2023
- fix divide zero bug for paddle.all (#51088) · 2bcd3935
  Committed by ahahahahahaha on Mar 02, 2023
- [inference][trt]reduce support int64 (#50897) · 77c9c90a
  Committed by Zhang Jun on Mar 02, 2023
- [dy2static] fix memory leakage problem in T5 model (#51100) · 8a6de610
  Committed by xiongkun on Mar 02, 2023
- [Paddle Inference] Add trt tile converter for dynamic shape. (#50841) · 5fdf7130
  Committed by xiaoxiaohehe001 on Mar 02, 2023
```
* add_trt_tile

* tile_trt
```
- Fix performance problem in BF16 models (#50283) · e421c6a6
  Committed by zyfncg on Mar 02, 2023
```
* fix performance drop in BF16 models

* fix test_cpu_quantize_squash_pass
```
- Add prim test for elementwise ops (#50807) · b8713309
  Committed by Charles-hit on Mar 02, 2023
```
* fix prim_op_test when python api outs is different with kernel sig

* add elementwise op prim test

* fix unit test

* add bfloat16 for full in static  prim api

* empty-commit

* close bf16 test

* polish elementwise tests
```
- 【Prim】Fix slice error and eager comp (#51086) · bbca66f2
  Committed by Jiabin Yang on Mar 02, 2023
```
* fix attrs copy error

* fix bert by fix slice error

* fix op test
```
- fix:set output name (#51004) · 3bba4af7
  Committed by feng_shuai on Mar 02, 2023
- [XPU] add smallest mode for top_k (#51053) · 0fd6e2a1
  Committed by wangshengxiang on Mar 02, 2023
- [GetCurrentCUDAStream] Add C++ API GetCurrentCUDAStream (#51027) · cce2b94d
  Committed by HongyuJia on Mar 02, 2023
```
* polish codes according #50813

* [getCurrentCUDAStream] Add C++ API getCurrentCUDAStream

* change get->Get

* wrap with macro

* use Get instead of get
```
- [AMP OP&Test] register fp16 and bf16 kernel for uniform_random (#50993) · 72f34450
  Committed by Leo Chen on Mar 02, 2023
```
* register fp16 and bf16 kernel for uniform_random

* fix compile

* support selected_rows

* add ut

* revert cpu

* fp16 test skip cpu
```
- Add concat grad cinn (#50972) · a4689c90
  Committed by wangzhen38 on Mar 02, 2023
```
* [cinn] concat_grad

* [cinn] concat_grad

* [cinn] concat_grad build success

* [Add PGLBOX] fix unnitest

* [Add PGLBOX] fix unnitest

* [Add PGLBOX] fix codestyle

* [cinn] update by comments

* [cinn] update by comment

* [cinn] add axis check
```
- [IR] Type system stage3: add class Dialect (#50959) · b0a604cb
  Committed by zhangbo9674 on Mar 02, 2023
```
* add dialect

* add some interface for dialect

* add some dialect interfaces for class Type

* set WITH_NEWIR=OFF

* refine code by comment

* polish code

* refine include style

* refine log for debug
```
- [Hackathon NO.74] Add grid_sampler operator to Paddle-TRT (#50934) · 8f156fd7
  Committed by gaoziyuan on Mar 02, 2023
- Change xpu_context.h to cut off unrelated dependency (#51079) · b535d6ce
  Committed by haosicheng on Mar 02, 2023
- process multiple conv2d_fusion shares weight (#51068) · ae60105d
  Committed by Yuanle Liu on Mar 02, 2023
- Split generated_op.cc into 4 src files [generated_op(1-4).cc] (#50985) · 4652bee4
  Committed by zyfncg on Mar 02, 2023
```
* split generated_op.cc into 4 src files

* fix bug

* fix compile on windows
```
01 Mar, 2023 · 13 commits

- Integration flash attention (#49869) · 61611786
  Committed by Chitsing KUI on Mar 01, 2023

* flash attn

* seed

* almost

* softmax

* fix workspace

* add unitest; linux only

* fix setup

* fix datatype include

* fix setup typo

* fix def scope

* new error api

* use paddle fork

* fix attr bug; complete ut

* update flash hash

* fix rng reset

* fix offset

* fix comments

- 【Prim】Fix sqrt grad (#51045) · 5751b7f4
  Committed by Jiabin Yang on Mar 01, 2023
```
* fix sqrt grad

* fix sqrt grad
```

- [Tensor Operants & Prim-Relevant] Tensor supports logical operants (#50983) · 1794927b
  Committed by HongyuJia on Mar 01, 2023

* Add comments for #50886

* [Tensor Operants & Prim-Relevant] Tensor supports logical operants

* add prim dynamic unit test

* add prim static unit test

- add topk prim backward (#50679) · 296b3ff0
  Committed by zqw_1997 on Mar 01, 2023

* tmp gather vjp

* support gather

* remove useless code

* fix compiling error

* fix ut

* add eager test

* add eager test

* add seed

* small change

* fix cpu error

* fix transpose op compat

* remove tensor index case

* fix prim_cinn

* small commit

* add cumsum prim backward

* small commit

* skip aixs=None test case

* fix op generante eror

* fix static test error

* remove unused code

* fix static test error

* small commit

* skip cpu float16 test case

* skip eager cpu cumsum float16 test case

* add eager and static UT

* fix ut

* add composite backward rule

* fix error

* fix type error and format error

* add try cpu+float16 test

* fix test bugs

* remove test for cpu+float16 and make y[0] be the grad arg

* add cinn test

* fix UT

* fix the wrong dim of v in test cases

* change y[0] to y[1] for grad in UT

* reshape flatten out

* Disable cinn single test

* use scatter_nd_add

* modify the reshape part of topk_grad

* delete useless build file

* to make the syntax right

* modify bug

* try use of put_along_axis

* remove cinn test

* reformat todo

* add silu composite rule

* fix code style.

* add cinn test

* fix composite grad maker code gen

* add prim in cumsum op test

* remove old test

* fix typro

* pass the static test

* fix typro

* modify optest and delete old test files

* remove normal test_top_k_op test

* fix typro

* pass axis=None test case

* buffer comment

* for debug

* add silu fp16 unit test.

* add static guard

* remove forward prim test

* remove same name axis

* modify the test_top_v2_op.py to pass all local tests

* delete the useless testcase

* fix mistake

* add more testcases to test dtype16 and dtype32

---------
Co-authored-by: JiabinYang <360788950@qq.com>
Co-authored-by: GGBond8488 <857631483@qq.com>
Co-authored-by: zxcd <228587199@qq.com>
Co-authored-by: Charles-hit <wanghao107@baidu.com>

- [Zero-Dim] Add Expand/Expand_as/Top_k for XPU to support Zero Dim Input. (#50947) · 226b4a95
  Committed by yunyaoXYY on Mar 01, 2023

* Add unitest from shilong

* Add kernel code from shilong

* fix codestyle

* add broadcast_shape test

* fix unitest

* fix unitests

* fix unitest

* add 0D grad support

* add 0D grad support

* add 0D grad support

* fix 0D tensor

* fix 0D

* fix xpu 0D

* fix expand kernel

* fix xpu expand

* Fix 0D kernel

* fix 0D

* fix 0D

* fix 0D

* fix 0D

* fix XPU top_k

* cancel the modify of xpu

* add XPU 0D tensor

* fix 0D

- fix the backward bug of cumsum (#50997) · 934934d8
  Committed by wawltor on Mar 01, 2023
- [xpu] fix bugs of split/embedding_with_wltwise_add/beam_search_decode kernel (#51052) · 753fa844
  Committed by mayang002 on Mar 01, 2023
- rename distributed_fused_lamb attr ring_id->ring_ids (#51000) · a348a423
  Committed by TaoTao Li on Mar 01, 2023
- fix zero bug of case18: paddle.logsumexp (#51034) · 2f900965
  Committed by chenxiao120660 on Mar 01, 2023
```
* fix bug of logsumexp

* fix bug for logsumexp

* fix bug for logsumexp
```
- add op map (#51026) · 83f61bd5
  Committed by cyber-pioneer on Mar 01, 2023
- [XPU] Fix xpu_fuse_pass error caused by weight sharing by other operators. (#51039) · 1054b23e
  Committed by csy0225 on Mar 01, 2023
- fix cumsum prim op maker type error (#51014) · add510b9
  Committed by GGBond8488 on Mar 01, 2023
- [XPU] delete op device (#51029) · c9309942
  Committed by zhupengyang on Mar 01, 2023

PaddlePaddle / Paddle · synced successfully about 1 year ago