Commits · 057ba778fa19c1b9670150d5ea5e83d6c8d64d04 · BaiXuePrincess / Paddle

01 Feb, 2023 · 1 commit

H2D data transfer optimization for split kernel (#49086) · 057ba778

Committed by limingshu on Feb 01, 2023

* profile reduce kernel for fp16 and reduceHigherdim

* use reinterpret_cast

* fix for CI on ROCm

* add Macro for ROCm

* ROCm CI config

* ROCm CI config

* unit test repair

* pull

* add common_funcs.h

* reduceType

* Update reduce_function.h

* not higher

* rename

* implementation of matmul using cublasLt instead of cublas

* cublasLt bugfix

* Update matmul_kernel_impl.h

* Update matmul_kernel_impl_via_blasLt.h

* for-loop-algo

* PR comments changes

* add macro

* ci unused variable isCublasLt

* ci unused variable isCublasLt macro

* split matmul to autotune

* rewrite the split kernel with segmented_array

* rewrite the split kernel with segmented_array

* rewrite the split kernel with segmented_array

* add some method for cuda_graph

* fix bugs for rocm

* change for ci-error

* I don't know why ci-model-benchmark reports an error, so I restored the original code to see whether the original code works.

* add some changes for passing model_benchmark and coverage ci

* fix ci error

* fix ci-rocm error

* add some changes for header

---------
Co-authored-by: zhangbopd <1299246947@qq.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>


31 Jan, 2023 · 11 commits
- support empty input for unique_consecutive (#49978) · dc1b6511
  Committed by RedContritio on Jan 31, 2023
- bind pixel_shuffle & pixel_shuffle_grad op for xpu (#50090) · a5f2e1f7
  Committed by wangshengxiang on Jan 31, 2023
- optimize 2D sync_batch_norm (#49663) · 9a4acfee
  Committed by zhangkaihuo on Jan 31, 2023
- Bump Cutlass version to 2.11.0 (#50073) · c64296bf
  Committed by MarDino on Jan 31, 2023
- fix div 0 error in floormod (#49997) · 26bdea0f
  Committed by 张春乔 on Jan 31, 2023
```
* fix mod 0 error

* fix div 0 error in floormod
```
- support fp16 squaredl2norm (#48315) · ce4637c1
  Committed by 201716010711 on Jan 30, 2023
- support 0d tensor for interpolate (#49929) · 2e156ac8
  Committed by xiaoting on Jan 31, 2023
```
* support 0d tensor for interpolate

* support 0d tensor for interpolate

* add xpu unittest for interp

* update unittest for interpolate

* fix coverage

* fix code style

* fix for coverage

* fix coverage
```
- fix div 0 error in conv1_transpose (#50000) · 1755a154
  Committed by 张春乔 on Jan 31, 2023
- Fix null pointer in case 14 of paddle.atan2 (#49973) · 82edc65b
  Committed by RedContritio on Jan 31, 2023
```
* add elements count check in atan2

* add unittest and pre-check in inferMeta

* add dimension check
```
- add dims check for nms_kernel (#49993) · 4976153d
  Committed by RedContritio on Jan 31, 2023
- Unify the gpu implementation of stack and unstack to reuse the optimization. (#49748) · 3586e856
  Committed by Yiqun Liu on Jan 31, 2023
```
* Unify the gpu implementation of stack and unstack to reuse the optimization.

* Optimize the cuda implementation of unstack.

* Use GpuMemcpyAsync instead of memory::Copy.

* Fix error of calculating the index.

* Use FastDivMod to further improve the performance of unstack.
```
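The floormod fix (#49997) in this list concerns the zero-divisor case. The C++ sketch below shows one way an integer floor modulo can guard against that case. It is a hypothetical helper, not the code from that commit. It assumes Python-style semantics, where a non-zero result takes the sign of the divisor. The exception stands in for whatever error check the framework really uses. Device code cannot throw, so a GPU kernel would rely on a host-side check or an error flag instead.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper (not Paddle's kernel code): integer floor modulo, assuming
// Python-style semantics where a non-zero result takes the sign of the divisor.
template <typename T>
T FloorMod(T x, T y) {
  static_assert(std::is_integral<T>::value, "this sketch handles integers only");
  // Guard the zero divisor up front; `x % 0` is undefined behaviour in C++.
  if (y == 0) {
    throw std::invalid_argument("FloorMod: divisor must be non-zero");
  }
  // x % -1 is always 0, and INT_MIN % -1 would overflow, so short-circuit it.
  if (std::is_signed<T>::value && y == static_cast<T>(-1)) {
    return 0;
  }
  T r = x % y;  // C++ truncates toward zero, so r takes the sign of x
  if (r != 0 && ((r < 0) != (y < 0))) {
    r += y;  // move the remainder into the divisor's sign range
  }
  assert(r == 0 || ((r < 0) == (y < 0)));  // result sign matches the divisor
  return r;
}
```

For example, `FloorMod(-7, 3)` yields 2 and `FloorMod(7, -3)` yields -2, whereas plain `%` would give -1 and 1.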
30 Jan, 2023 · 3 commits

Fix null pointer in case 2 of paddle.linalg.lu_unpack (#49976) · 6f8ec229

Committed by RedContritio on Jan 30, 2023

* add pivots type check and fix batchsize error

* add unittest for batchsize = 0

* fix nullptr in lu_unpack

fix batchsize error in LU_Unpack
add nullptr check in OneFunctor

* remove exception in device code


[Divide by 0 Error] add pinv check (#49951) · f6e874bc

Committed by Ryan on Jan 30, 2023

* add pinv check

* add unittest

* update unittest

* roll back

* fix a silly bug where the check was not called

* use context


add phi tensor vector array api from fluid (#49885) · 094e3b8c
Committed by engineer1109 on Jan 30, 2023
```
replace all TensorFromVector & TensorToVector

AssignKernel async copy
```

25 Jan, 2023 · 1 commit
- remove useless kTranspose enum element (#38660) · f43cb3b7
  Committed by limingshu on Jan 25, 2023
```
Co-authored-by: zhangbopd <1299246947@qq.com>
```
20 Jan, 2023 · 1 commit
- Fix for bad_alloc in oneDNN matmul_grad kernel (#48593) · 44855da3
  Committed by jakpiase on Jan 20, 2023
```
* fix for matmul_grad

* another fix for matmul_grad

* fix
```
19 Jan, 2023 · 2 commits

Fix paddle.squeeze_ bug (#49903) · 11e34ae0

Committed by heliqi on Jan 19, 2023

* fix squeeze_ bug

* fix: solve by using squeeze_kernel

* fix: solve by using squeeze_kernel

* fix: solve by using squeeze_kernel

* add test case


[KUNLUN] add op: maxpool_with_index (#49505) · f71f77e9

Committed by jameszhang on Jan 19, 2023

* [KUNLUN] add op: maxpool_with_index

* use DeviceContext::Alloc() instead of DenseTensor::mutable_data()

* fix file format

* solve clip unittest failure

* minor fix

* Revert "solve clip unittest failure" since the issue is fixed
in #49535

This reverts commit 1127adc66e79afe35ac3c00bb34e6aaa7cd7d78b.

* align with xdnn on the definition of mask in max_pool_with_index

* minor


18 Jan, 2023 · 6 commits

Add align check for Concat Kernel (#49761) · 24379442
Committed by MarDino on Jan 18, 2023
```
* add align check

* refine
```
[Zero-Dim] support input 0D for paddle.moveaxis / quantile (#49813) · 26140ec8
Committed by zhouweiwei2014 on Jan 18, 2023
```
* [Zero-Dim] support input 0D for paddle.moveaxis/quantile

* fix CI
```

[PHI] remove bitwise and, or, xor (#49916) · 9056cc8b

Committed by RuohengMa on Jan 18, 2023

* add reduce_sum_int64 and reduce_sum_int8 xpu kernels

* [PHI] add clip grad kernel with support type float32 and int32

* [PHI unittest] add clip_grad unit test

* adapt code to clang-format

* update xpu api output with clip_grad api

* remove int8 support of reduce_sum xpu kernel since it can not pass unit tests

* adapt license date, add code for XPUDataType conversion

* add int8 support of reduce_sum

* add reduce_sum unit tests for dtype int64, int8, and add more test cases

* update license date

* remove buggy bitwise and, or and xor xpu kernels, refine bitwise not xpu kernel

* change license date


[XPU] add logical_not op. (#49911) · 60d1199a
Committed by houj04 on Jan 18, 2023


[0 Tensor support] support the 0d tensor for the cumsum (#49518) · 5fca45ea

Committed by wawltor on Jan 18, 2023

* Add the cumsum 0d tensor

* xpu and cpu check for the 0d tensor

* change 2022 to 2023 in new commit

* fix the reverse logic


[Zero-Dim] Fix bug in masked_select for XPU (#49904) · 1a8be158
Committed by Zhang Zheng on Jan 18, 2023


16 Jan, 2023 · 1 commit

CUDA12.0 integration (#49539) · 1885d55a

Committed by zlsh80826 on Jan 16, 2023

* Update warpctc for cuda-12

* Deprecate cudaProfilerInitialize for CUDA > 11

* Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040

* Add the missing thrust header


13 Jan, 2023 · 8 commits
- Move GeneralDivMod from segmented_array.h to fast_divmod.h (#48934) · ad4824e5
  Committed by limingshu on Jan 13, 2023
```
* first commit

* add some changes in stack kernel.

* move the location of GeneralDivMod

* fix code format error according to ci
```
- [Zero-Dim] add where, atan2, median 0-Dim ut (#49692) · 1508cae7
  Committed by ronnywang on Jan 13, 2023
```
* add where, atan2, median 0d ut

* add where, atan2, median 0d ut

* update

* update

* update
```
- add xpu adagrad and where_grad kernels (#49701) · a99c3cd4
  Committed by ykkk2333 on Jan 13, 2023
- Add unittest for set_value, set_value_grad. test=kunlun (#49773) · 5e722245
  Committed by Leo Guo on Jan 13, 2023
- add prelu & prelu_grad op for xpu (#49672) · 8d512b8f
  Committed by wangshengxiang on Jan 13, 2023
- [cpplint fix] under ps (#49759) · d5c5bbc3
  Committed by wangzhen38 on Jan 13, 2023
```
* [cpplint fix] under ps
```
- Update threshold of bn1d (#49734) · 0294ab41
  Committed by zhangkaihuo on Jan 13, 2023
- fix fc and fused_fc_elementwise_layernorm kernel diff (#49778) · 0b24d167
  Committed by Yuanle Liu on Jan 13, 2023
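Two commits above deal with fast integer division. Unstack switches to FastDivMod (#49748), and GeneralDivMod moves into fast_divmod.h (#48934). Helpers like these usually replace a division by a divisor that stays fixed for a whole kernel launch with a multiply-high plus a shift, using a precomputed magic number (the Granlund-Montgomery method). The C++ sketch below applies that technique to 32-bit unsigned values. `FastDivModSketch` is a hypothetical name, and this is not Paddle's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of division by a run-time-invariant divisor using a
// precomputed multiplier and shift (Granlund-Montgomery), not Paddle's code.
struct FastDivModSketch {
  uint32_t divisor;
  uint32_t shift;       // ceil(log2(divisor))
  uint64_t multiplier;  // floor(2^32 * (2^shift - divisor) / divisor) + 1

  explicit FastDivModSketch(uint32_t d) : divisor(d), shift(0), multiplier(0) {
    assert(d >= 1 && d <= (1u << 31));  // range this sketch is written for
    while ((uint64_t(1) << shift) < d) ++shift;
    multiplier = ((uint64_t(1) << 32) * ((uint64_t(1) << shift) - d)) / d + 1;
  }

  // Quotient: take the high 32 bits of n * multiplier, add n, shift right.
  // 64-bit arithmetic keeps (hi + n) from overflowing.
  uint32_t Div(uint32_t n) const {
    uint64_t hi = (uint64_t(n) * multiplier) >> 32;
    return static_cast<uint32_t>((hi + n) >> shift);
  }

  // Remainder derived from the quotient; Div(n) * divisor <= n, so no overflow.
  uint32_t Mod(uint32_t n) const { return n - Div(n) * divisor; }
};
```

On a GPU, the multiply-high step is usually done with CUDA's `__umulhi` intrinsic. A purely 32-bit version also has to keep `hi + n` from overflowing, for example by limiting `n` to values below 2^31. The payoff is that each division then costs one multiply-high, one add and one shift.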
12 Jan, 2023 · 4 commits
- lerp support 0 Tensor (#49667) · 8cd0d5b3
  Committed by sunli on Jan 12, 2023
```
* lerp support 0 Tensor

* fix lerp grad

* fix lerp zero test

* fix 0D + ND/ND + 0D

* fix check

* update code

* fix lerp infer shape

* static backward test

* update static graph test
```
- deal with conflict (#49766) · 27aec62b
  Committed by YuanRisheng on Jan 12, 2023
- Fix the bugs of set_value and set_value_grad ops and add register in (#49750) · 438975fd
  Committed by Leo Guo on Jan 12, 2023
```
xpu2_op_list.cc. test=kunlun
```
- [PHI]Rename some PHI Kernel (#49470) · 30f5e39b
  Committed by YuanRisheng on Jan 12, 2023
```
* rename kernel

* delete sig

* modify code according to comment

* fix ci bugs
```
11 Jan, 2023 · 1 commit

Implement a common segmented array. (#49450) · b1faa562

Committed by Yiqun Liu on Jan 11, 2023

* Implement a common PointerArray.

* Polish codes.

* Add including of header file.

* Add the branch of kFix8.

* Fix compiling error.

* Add alignas hint to fix the performance drop.

* Optimize the H2D copy in stack_grad.

* Rename the macro.

* Fix align hint for different compilers.

* Polish the define of PADDLE_ALIGN.

* Fix compiling error.

* Remove the align hint on windows.

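The segmented-array work (#49450, and the split-kernel rewrite in #49086) aims to avoid a separate host-to-device copy of pointer arrays. One common way to do this is to pack the pointers into a fixed-capacity struct and pass that struct to the kernel by value, so the pointers travel in kernel-parameter space. CUDA limits the total size of kernel parameters (traditionally 4 KB), so very long lists still need a device buffer. The host-side C++ sketch below shows this technique under stated assumptions. `PointerArraySketch`, `ChooseCapacity`, `PackPointers` and the capacity buckets are all hypothetical and are not the API of Paddle's `segmented_array.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity pointer array. Passed BY VALUE to a kernel, its
// contents travel in kernel-parameter space instead of needing an H2D memcpy.
template <typename T, int Capacity>
struct PointerArraySketch {
  const T* data[Capacity];
};

// Pick the smallest assumed capacity bucket that holds `num` pointers.
// Returns 0 for "too many": fall back to a device buffer filled by an H2D copy.
inline int ChooseCapacity(std::size_t num) {
  const int kBuckets[] = {4, 8, 16, 32, 64};
  for (int cap : kBuckets) {
    if (num <= static_cast<std::size_t>(cap)) return cap;
  }
  return 0;
}

// Copy host-side pointers into the fixed-size struct; unused slots stay null.
template <typename T, int Capacity>
PointerArraySketch<T, Capacity> PackPointers(const std::vector<const T*>& ptrs) {
  assert(ptrs.size() <= static_cast<std::size_t>(Capacity));
  PointerArraySketch<T, Capacity> arr{};  // value-initialization nulls every slot
  for (std::size_t i = 0; i < ptrs.size(); ++i) arr.data[i] = ptrs[i];
  return arr;
}
```

A launcher built on this idea could switch on `ChooseCapacity(n)` and instantiate a kernel template for each capacity. Rounding up to a bucket wastes only a few null slots of parameter space, and it avoids compiling one kernel per exact pointer count.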

10 Jan, 2023 · 1 commit

Optimization for StackGradCUDAKernel for last dimension stack case. (#48992) · 0cae5c7f

Committed by limingshu on Jan 10, 2023

* add stack grad kernel optimization

* add basic optimization kernel for stack_grad_kernel

* optimization of stack_grad_kernel for last dim stack and change code format with pre-commit


BaiXuePrincess / Paddle is in sync with its upstream (forked-from) project
