Commits · 44855da3e30f1bad5c32d084fa7dd05c9cf76a7c · BaiXuePrincess / Paddle

20 Jan, 2023 · 1 commit
- Fix for bad_alloc in oneDNN matmul_grad kernel (#48593) · 44855da3
  Committed by jakpiase on Jan 20, 2023
```
* fix for matmul_grad

* another fix for matmul_grad

* fix
```
19 Jan, 2023 · 2 commits

Fix paddle.queeze_ bug (#49903) · 11e34ae0

Committed by heliqi on Jan 19, 2023

* fix queeze_ bug

* fix slove use squeeze_kernel

* fix slove use squeeze_kernel

* fix slove use squeeze_kernel

* add test case


[KUNLUN] add op: maxpool_with_index (#49505) · f71f77e9

Committed by jameszhang on Jan 19, 2023

* [KUNLUN] add op: maxpool_with_index

* use DeviceContext::Alloc() instead of DenseTensor::mutable_data()

* fix file format

* solve clip unittest failure

* minor fix

* Revert "solve clip unittest failure" since the issue is fixed in #49535

This reverts commit 1127adc66e79afe35ac3c00bb34e6aaa7cd7d78b.

* align with xdnn on the definition of mask in max_pool_with_index

* minor


18 Jan, 2023 · 6 commits

Add align check for Concat Kernel (#49761) · 24379442
Committed by MarDino on Jan 18, 2023
```
* add align check

* refine
```
[Zero-Dim] support input 0D for paddle.moveaxis / quantile (#49813) · 26140ec8
Committed by zhouweiwei2014 on Jan 18, 2023
```
* [Zero-Dim] support input 0D for paddle.moveaxis/quantile

* fix CI
```

[PHI] remove bitwise and, or, xor (#49916) · 9056cc8b

Committed by RuohengMa on Jan 18, 2023

* add reduce_sum_int64 and reduce_sum_int8 xpu kernels

* [PHI] add clip grad kernel with support type float32 and int32

* [PHI unittest] add clip_grad unit test

* adapt code to clang-format

* update xpu api output with clip_grad api

* remove int8 support of reduce_sum xpu kernel since it can not pass unit tests

* adapt license date, add code for XPUDataType convertion

* add int8 support of reduce_sum

* add reduce_sum unit tests for dtype int64, int8, and add more test cases

* update license date

* remove buggy bitwise and, or and xor xpu kernels, refine bitwise not xpu kernel

* change license date


[XPU] add logical_not op. (#49911) · 60d1199a

Committed by houj04 on Jan 18, 2023

[0 Tensor support] support the 0d tensor for the cumsum (#49518) · 5fca45ea

Committed by wawltor on Jan 18, 2023

* Add the cumsum 0d tensor

* xpu and cpu judge the 0d  tensor

* change to 2022 to 2023 in new commit

* fix the reverse logic


[Zero-Dim] Fix bug in masked_select for XPU (#49904) · 1a8be158

Committed by Zhang Zheng on Jan 18, 2023

16 Jan, 2023 · 1 commit

CUDA12.0 integration (#49539) · 1885d55a

Committed by zlsh80826 on Jan 16, 2023

* Update warpctc for cuda-12

* Deprecate cudaProfilerInitialize for CUDA > 11

* Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040

* Add the missing thrust header


13 Jan, 2023 · 8 commits
- Move GeneralDivMod from segmented_array.h to fast_divmod.h (#48934) · ad4824e5
  Committed by limingshu on Jan 13, 2023
```
* first commit

* add some changes in stack kernel.

* move the location of GeneralDivMod

* fix code format error according to ci
```
- [Zero-Dim] add where, atan2, median 0-Dim ut (#49692) · 1508cae7
  Committed by ronnywang on Jan 13, 2023
```
* add where, atan2, median 0d ut

* add where, atan2, median 0d ut

* update

* update

* update
```
- add xpu adagrad and where_grad kernels (#49701) · a99c3cd4
  Committed by ykkk2333 on Jan 13, 2023
- Add unitest for set_value, set_value_grad. test=kunlun (#49773) · 5e722245
  Committed by Leo Guo on Jan 13, 2023
- add prelu & prelu_grad op for xpu (#49672) · 8d512b8f
  Committed by wangshengxiang on Jan 13, 2023
- [cpplint fix] under ps (#49759) · d5c5bbc3
  Committed by wangzhen38 on Jan 13, 2023
```
* [cpplint fix] under ps
```
- Update threshold of bn1d (#49734) · 0294ab41
  Committed by zhangkaihuo on Jan 13, 2023
- fix fc and fused_fc_elementwise_layernorm kernel diff (#49778) · 0b24d167
  Committed by Yuanle Liu on Jan 13, 2023
12 Jan, 2023 · 4 commits
- lerp support 0 Tensor (#49667) · 8cd0d5b3
  Committed by sunli on Jan 12, 2023
```
* lerp support 0 Tensor

* fix lerp grad

* fix lerp zero test

* fix 0D + ND/ND + 0D

* fix check

* update code

* fix lerp infer shape

* static backward test

* updata static graph test
```
- deal with conflict (#49766) · 27aec62b
  Committed by YuanRisheng on Jan 12, 2023
- Fix the bugs of set_value and set_value_grad ops and add register in xpu2_op_list.cc. test=kunlun (#49750) · 438975fd
  Committed by Leo Guo on Jan 12, 2023
- [PHI]Rename some PHI Kernel (#49470) · 30f5e39b
  Committed by YuanRisheng on Jan 12, 2023
```
* rename kernel

* delete sig

* modify code according comment

* fix ci bugs
```
11 Jan, 2023 · 1 commit

Implement a common segmented array. (#49450) · b1faa562

Committed by Yiqun Liu on Jan 11, 2023

* Implement a common PointerArray.

* Polish codes.

* Add including of header file.

* Add the branch of kFix8.

* Fix compiling error.

* Add alignas hint to fix the performance drop.

* Optimize the H2D copy in stack_grad.

* Rename the macro.

* Fix align hint for different compilers.

* Polish the define of PADDLE_ALIGN.

* Fix compiling error.

* Remove the align hint on windows.


10 Jan, 2023 · 3 commits

Optimization for StackGradCUDAKernel for last dimension stack case. (#48992) · 0cae5c7f

Committed by limingshu on Jan 10, 2023

* add stack grad kernel optimization

* add basic optimization kernel for stack_grad_kernel

* optimization of stack_grad_kernel for last dim stack and change code format with pre-commit


[PHI Decoupling] move sequence_scale from fluid to phi (#49668) · a36c5490

Committed by Ryan on Jan 10, 2023

* try sequence_padding

* fix cant use mutable_data

* fix mistake fluid_sequence_scale.hh/CMakeLists.t include

* fix namespace bug

* fix framework::ToAbsOffset not found

* fix codestyle


Refine name style and MoeKernel (#49432) · 39210ed0
Committed by MarDino on Jan 10, 2023


09 Jan, 2023 · 4 commits
- Add concat optimization (#49540) · 1a0b3661
  Committed by MarDino on Jan 09, 2023
```
* add concat optimization

* refine

* remove annotation

* use alignas instead of aligned_storage
```
- add fill/fill_any for kunlun (#49645) · 31ea3231
  Committed by QingshuChen on Jan 09, 2023
- [XPU] add einsum fill diagonal and diagonal kernels (#49465) · a5bf156b
  Committed by ykkk2333 on Jan 09, 2023
```
* migrate shaple sgd, split,sign xpu kernels to phi, test=kunlun

* fix dlrm throughput problem, test=kunlun

* add xpu einsum, fill_diagonal, and diagonal kernels, test=kunlun
```
- [0 Tensor support] cumprod (#49550) · 50a8b655
  Committed by wangzhen38 on Jan 09, 2023
06 Jan, 2023 · 3 commits

Dev (#49591) · 07db4a9f

Committed by RuohengMa on Jan 06, 2023

* add bitwise and, bitwise not, bitwise or and bitwise xor

* correct typo


[zero-dim] Support 0-d for kthvalue and mode (#49340) · 292738f3

Committed by JYChen on Jan 06, 2023

* add 0-d support for paddle.kthvalue

* add 0-d support for paddle.mode

* fix coverage test for device

* fix check-bug in windows

* change axis check from LT to LE

* add shape & value check for grad when input is 0d tensor


fix bug (#49546) · e0ee7403

Committed by Thomas Young on Jan 06, 2023

05 Jan, 2023 · 2 commits
- Support 0D for paddle.sort/argsort (#49501) · 032da731
  Committed by Siming Dai on Jan 05, 2023
```
* support 0D for paddle.sort/argsort

* support 0D tensor for paddle.sort/argsort in xpu

* fix bug

* fix grad and add value assertion
```
- support generate static graph code for imag and real op (#49523) · 192eb4d5
  Committed by zyfncg on Jan 05, 2023
04 Jan, 2023 · 3 commits

[Inference] Add conv_fusion nhwc impl. (#49047) · 4a8708bb

Committed by Wilber on Jan 04, 2023

[Paddle Inference] fix mixed precision diff (#49475) · ac75a9a6

Committed by Yuanle Liu on Jan 04, 2023

[Unify KernelKey] change OpKernelType->KernelKey (#49138) · 4383494f

Committed by HongyuJia on Jan 04, 2023

* execute use kernel_key first

* change OpKernelType->KernelKey

* fix py3 compile error, remove redundant header files

* fix build_strategy_test

* fix DataType::RAW

* fix custom_type test: operator_test.cc

* fix transform place

* fix backends_are_same_class

* try fix place TransDataDevice

* support all KernelKey

* fix TransformData

* fix place_are_same_class

* fix merge

* fix test_params_no_grad

* fix specific place of GetExpectedKernelType

* fix specific place of GetExpectedKernelType

* fix GetKernelTypeForVar

* fix dtype error

* fix fetch_v2

* change GetKernelTypeForVar

* fix interpreter

* fix typo error

* polish codes

* polish codes

* polish codes

* fix conflict


03 Jan, 2023 · 2 commits
- H2D data transfer optimization for concat kernel (#49040) · 0de94cd9
  Committed by limingshu on Jan 03, 2023
- [Paddle Inference] Implement conv2d_fusion NHWC format using cutlass (#47989) · c123dd1e
  Committed by zhoutianzi666 on Jan 03, 2023
```
* Implement conv2d_fusion NHWC format using CUTLASS
* Add unit testing for CUTLASS Conv in inference
* Add experimental API for CUTLASS.
```

BaiXuePrincess / Paddle is in sync with the source project it was forked from
