Commits · e4e94a889a7e172ca92b9d0c4aca8c3c08a39fea · BaiXuePrincess / Paddle

01 Feb, 2023 8 commits

[Zero-Dim] Fix 0-dim tensor for arg_min_max op. (#49570) · e4e94a88

Committed by Zhong Hui on Feb 01, 2023

* fix 0-d tensor for arg_min_max op.

* fix xpu.

* fix zero dims

* fix

* Update arg_min_max_kernel.cc

* Update arg_min_max_kernel.cc

* Update arg_min_max_kernel.cc

* Update test_zero_dim_tensor.py

* Update test_zero_dim_tensor_xpu.py

* Update test_zero_dim_tensor.py

* Update arg_min_max_kernel.cc

* Update arg_min_max_kernel.cc

* Update arg_min_max_kernel.cc

e4e94a88

support grid_sampler_grad op for XPU (#49857) · 520f48d6

Committed by zhangyikun02 on Feb 01, 2023

520f48d6
[Divide by 0 Error] add lu check (#49974) · f71796b6

Committed by gouzil on Feb 01, 2023
```
* [Divide by 0 Error] add lu check

* [Divide by 0 Error] lu check migrate to c++
```
f71796b6

[Divide by 0 Error] add eig check (#49971) · 226a6567

Committed by gouzil on Feb 01, 2023

* [Divide by 0 Error] add eig check

* [Divide by 0 Error] eig check migrate to c++

* [Divide by 0 Error] Fix class name error

226a6567

[Divide by 0 Error] add norm check (#49966) · 5dfddaea

Committed by gouzil on Feb 01, 2023

* [Divide by 0 Error] add norm check

* [Divide by 0 Error] fix x AttributeError

* [Divide by 0 Error] norm check migrate to c++

5dfddaea

Combination of multiple paddle::memory::allocate operation into one for ops (#49126) · bdae5481

Committed by limingshu on Feb 01, 2023

* A leap of try for cudaLaunchCooperativeKernel

* fix bugs

* Totally replace the lar cuda kernel

* Fix bugs

* fix code according to comments

* fix codes according to  review comments

* adding some function overload

* relocate the power operation.

* add bf16 support for index select relevant ops

* revert bf16 type change.

* add changes for more op

* fix code writing bugs

bdae5481

Fix UFA illegal address access of case4: paddle.unbind (#49995) · 9ce8cfcf

Committed by RedContritio on Feb 01, 2023

* add axis check for unbind

* add axis range check for unbind

* update unittest and axis validation for unbind

* add unittest invalid axis for unbind

* restore axis extract for unbind

9ce8cfcf
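
The unbind commit above adds a range check on `axis`. As an illustration only, the usual shape of such a check can be sketched in plain Python. This is not the Paddle C++ change itself; the helper name `normalize_axis` and the error message are assumptions made for this sketch.

```python
def normalize_axis(axis: int, rank: int) -> int:
    """Map a possibly negative axis into [0, rank), or raise ValueError.

    Hypothetical helper for illustration; not part of the Paddle API.
    """
    # Valid axes are -rank, ..., -1, 0, ..., rank - 1.
    # For rank == 0 this range is empty, so every axis is rejected.
    if not -rank <= axis < rank:
        raise ValueError(
            f"axis must be in [{-rank}, {rank}), but got {axis}"
        )
    # Negative axes count from the end, as in NumPy-style indexing.
    return axis + rank if axis < 0 else axis
```

Rejecting an out-of-range axis before it is used to index a shape array is what prevents the invalid memory access; whether the check lives in shape inference or in the kernel is a design choice this sketch does not address.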

H2D data transfer optimization for split kernel (#49086) · 057ba778

Committed by limingshu on Feb 01, 2023

* profile reduce kernel for fp16 and reduceHigherdim

* use reinterpret_cast

* fix for CI on ROCm

* add Macro for ROCm

* ROCm CI config

* ROCm CI config

* unit test repair

* pull

* add common_funcs.h

* reduceType

* Update reduce_function.h

* not higher

* rename

* implement of matmul using cublasLt instead of cublas

* cublasLt bugfix

* Update matmul_kernel_impl.h

* Update matmul_kernel_impl_via_blasLt.h

* for-loop-algo

* PR comments changes

* add macro

* ci unused variable isCublasLt

* ci unused variable isCublasLt macro

* split matmul to autotune

* rewrite the split kernel with segmented_array

* rewrite the split kernel with segmented_array

* rewrite the split kernel with segmented_array

* add some method for cuda_graph

* fix bugs for rocm

* change for ci-error

* i dont know why ci-model-benchmark gives a shit error, so i recover codes with original one to see if original codes work.

* add some changes for passing model_benchmark and coverage ci

* fix ci error

* fix ci-rocm error

* add some changes for header

---------
Co-authored-by: zhangbopd <1299246947@qq.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>

057ba778

31 Jan, 2023 19 commits
- support empty input for unique_consecutive (#49978) · dc1b6511
  Committed by RedContritio on Jan 31, 2023
  
  dc1b6511
- bind pixel_shuffle & pixel_shuffle_grad op for xpu (#50090) · a5f2e1f7
  Committed by wangshengxiang on Jan 31, 2023
  
  a5f2e1f7
- Add unified device management api (#48651) · 7aaaa1c6
  Committed by ronnywang on Jan 31, 2023
```
* [CustomDevice] add custom device api

* update

* update

* test=document_fix

* update

* update

* add  examples
```
  7aaaa1c6
- Fix null pointer of case15: paddle.broadcast_tensors (#49980) · 78ec942b
  Committed by RedContritio on Jan 31, 2023
```
* fix incorrect output shape of broadcast

* add unittest
```
  78ec942b
- optimize 2D sync_batch_norm (#49663) · 9a4acfee
  Committed by zhangkaihuo on Jan 31, 2023
  
  9a4acfee
- not use shm cache default (#50089) · 118aee6f
  Committed by zhangbo9674 on Jan 31, 2023
  
  118aee6f
- Bump Cutlass version to 2.11.0 (#50073) · c64296bf
  Committed by MarDino on Jan 31, 2023
  
  c64296bf
- fix div 0 error in floormod (#49997) · 26bdea0f
  Committed by 张春乔 on Jan 31, 2023
```
* fix mod 0 error

* fix div 0 error in floormod
```
  26bdea0f
- support fp16 squaredl2norm (#48315) · ce4637c1
  Committed by 201716010711 on Jan 30, 2023
  
  ce4637c1
- support 0d tensor for interpolate (#49929) · 2e156ac8
  Committed by xiaoting on Jan 31, 2023
```
* support 0d tensor for interpolate

* support 0d tensor for interpolate

* add xpu unittest for interp

* update unittest for interpolate

* fix coverage

* fix code style

* fix for coverage

* fix coverage
```
  2e156ac8
- [Decouple phi] Decouple custom_op in fluid and phi (#49866) · 48b3e869
  Committed by HongyuJia on Jan 31, 2023
```
* decouple phi custom_op

* decouple phi custom_op, remove codes

* delete custom symbol of inference
```
  48b3e869
- fix div 0 error in conv1_transpose (#50000) · 1755a154
  Committed by 张春乔 on Jan 31, 2023
  
  1755a154
- Fix stack overflow of case10: paddle.unique (#49981) · dbfdefa7
  Committed by RedContritio on Jan 31, 2023
```
* add axis check in UniqueRawInferMeta

* add unittest for negative axis

* simplify check for unique
```
  dbfdefa7
- Fix null pointer of case 14 paddle.atan2 (#49973) · 82edc65b
  Committed by RedContritio on Jan 31, 2023
```
* add elements count check in atan2

* add unittest and pre-check in inferMeta

* add dimension check
```
  82edc65b
- Fix the div 0 error of matrix_power (#49942) · fb74147c
  Committed by 张春乔 on Jan 31, 2023
```
* add zero size check in matrix_power_kernel_impl.h

* add zero size check in matrix_power_kernel_impl.h

* add zero size check in unittest

* bug_fix

* bug_fix

* bug_fix

* bug_fix

* bug_fix

* bug fix

* bug_fix

* bug_fix

* add static check

* delete the dy codes
```
  fb74147c
- Fix stack overflow of case9: paddle.repeat_interleave (#49982) · 66682be0
  Committed by RedContritio on Jan 31, 2023
```
* support negative index in repeat_interleave

* add unittest
```
  66682be0
- fix the div 0 error of pixel_shuffle (#49996) · baf96a12
  Committed by 张春乔 on Jan 31, 2023
  
  baf96a12
- add dims check for nms_kernel (#49993) · 4976153d
  Committed by RedContritio on Jan 31, 2023
  
  4976153d
- Unify the gpu implementation of stack and unstack to reuse the optimization. (#49748) · 3586e856
  Committed by Yiqun Liu on Jan 31, 2023
```
* Unify the gpu implementation of stack and unstack to reuse the optimization.

* Optimize the cuda implementation of unstack.

* Use GpuMemcpyAsync instead of memory::Copy.

* Fix error of calculating the index.

* Use FastDivMod to further improve the performance of unstack.
```
  3586e856
30 Jan, 2023 5 commits

Fix null pointer of case 2 paddle.linalg.lu_unpack (#49976) · 6f8ec229

Committed by RedContritio on Jan 30, 2023

* add pivots type check and fix batchsize error

* add unittest for batchsize = 0

* fix nullptr in lu_unpack

fix batchsize error in LU_Unpack
add nullptr check in OneFunctor

* remove exception in device code

6f8ec229

[Divide by 0 Error] add pinv check (#49951) · f6e874bc

Committed by Ryan on Jan 30, 2023

* add pinv check

* add unittest

* update unittest

* roll back

* fix not call stupid bug

* use context

f6e874bc

add phi tensor vector array api from fluid (#49885) · 094e3b8c

Committed by engineer1109 on Jan 30, 2023
```
replace all TensorFromVector & TensorToVector

AssignKernel async copy
```
094e3b8c

Support stream priority for standalone executor (#49939) · 172d1de6

Committed by Ruibiao Chen on Jan 30, 2023

* Support stream priority for standalone executor

* Fix compile error

* Fix compile error

* Fix compile error

* Fix compile error

* Fix compile error

172d1de6

make FLAGS_gemm_use_half_precision_compute_type=false by default (#50050) · 964cd660

Committed by sneaxiy on Jan 30, 2023
```
* make FLAGS_gemm_use_half_precision_compute_type=false by default

* fix comments
```
964cd660

25 Jan, 2023 1 commit
- remove useless kTranspose enum element (#38660) · f43cb3b7
  Committed by limingshu on Jan 25, 2023
```
Co-authored-by: zhangbopd <1299246947@qq.com>
```
  f43cb3b7
20 Jan, 2023 3 commits
- 【Prim】Refactor prim flags system (#49930) · 23d20e30
  Committed by Jiabin Yang on Jan 20, 2023
  
  23d20e30
- Fix for bad_alloc in oneDNN matmul_grad kernel (#48593) · 44855da3
  Committed by jakpiase on Jan 20, 2023
```
* fix for matmul_grad

* another fix for matmul_grad

* fix
```
  44855da3
- add unique support zero dim (#49260) · ee4e5323
  Committed by sprouteer on Jan 20, 2023
  
  ee4e5323
19 Jan, 2023 2 commits

Fix paddle.squeeze_ bug (#49903) · 11e34ae0

Committed by heliqi on Jan 19, 2023

* fix squeeze_ bug

* fix solve use squeeze_kernel

* fix solve use squeeze_kernel

* fix solve use squeeze_kernel

* add test case

11e34ae0

[KUNLUN] add op: maxpool_with_index (#49505) · f71f77e9

Committed by jameszhang on Jan 19, 2023

* [KUNLUN] add op: maxpool_with_index

* use DeviceContext::Alloc() instead of DenseTensor::mutable_data()

* fix file format

* solve clip unittest failure

* minor fix

* Revert "solve clip unittest failure" since the issue is fixed
in #49535

This reverts commit 1127adc66e79afe35ac3c00bb34e6aaa7cd7d78b.

* align with xdnn on the definition of mask in max_pool_with_index

* minor

f71f77e9

18 Jan, 2023 2 commits

Handle repetitive code in oneDNN activation fuse passes (#49824) · a1b2e1e2

Committed by Sławomir Siwek on Jan 18, 2023

* extract fuse pass logic to header file

* adjust namespaces

* Update paddle/fluid/framework/ir/mkldnn/activation_onednn_fuse_pass.h

update date
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

* add inline remove static
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

a1b2e1e2

Add align check for Concat Kernel (#49761) · 24379442
Committed by MarDino on Jan 18, 2023
```
* add align check

* refine
```
24379442

BaiXuePrincess / Paddle is in sync with the fork's source project