Commits · 30f5e39b6c6031da2116488b03d7a5e23f04a4f7 · PaddlePaddle / Paddle

12 Jan, 2023 · 1 commit
- [PHI]Rename some PHI Kernel (#49470) · 30f5e39b
  Committed by YuanRisheng on Jan 12, 2023
```
* rename kernel

* delete sig

* modify code according comment

* fix ci bugs
```
  30f5e39b
11 Jan, 2023 · 1 commit

Implement a common segmented array. (#49450) · b1faa562

Committed by Yiqun Liu on Jan 11, 2023

* Implement a common PointerArray.

* Polish codes.

* Add including of header file.

* Add the branch of kFix8.

* Fix compiling error.

* Add alignas hint to fix the performance drop.

* Optimize the H2D copy in stack_grad.

* Rename the macro.

* Fix align hint for different compilers.

* Polish the define of PADDLE_ALIGN.

* Fix compiling error.

* Remove the align hint on windows.

b1faa562

10 Jan, 2023 · 3 commits

Optimization for StackGradCUDAKernel for last dimension stack case. (#48992) · 0cae5c7f

Committed by limingshu on Jan 10, 2023

* add stack grad kernel optimization

* add basic optimization kernel for stack_grad_kernel

* optimization of stack_grad_kernel for last dim stack and change code format with pre-commit

0cae5c7f

[PHI Decoupling] move sequence_scale from fluid to phi (#49668) · a36c5490

Committed by Ryan on Jan 10, 2023

* try sequence_padding

* fix cant use mutable_data

* fix mistake fluid_sequence_scale.hh/CMakeLists.t include

* fix namespace bug

* fix framework::ToAbsOffset not found

* fix codestyle

a36c5490

Refine name style and MoeKernel (#49432) · 39210ed0
Committed by MarDino on Jan 10, 2023

39210ed0

09 Jan, 2023 · 4 commits
- Add concat optimization (#49540) · 1a0b3661
  Committed by MarDino on Jan 09, 2023
```
* add concat optimization

* refine

* remove annotation

* use alignas instead of aligned_storage
```
  1a0b3661
- add fill/fill_any for kunlun (#49645) · 31ea3231
  Committed by QingshuChen on Jan 09, 2023
  
  31ea3231
- [XPU] add einsum fill diagonal and diagonal kernels (#49465) · a5bf156b
  Committed by ykkk2333 on Jan 09, 2023
```
* migrate shaple sgd, split,sign xpu kernels to phi, test=kunlun

* fix dlrm throughput problem, test=kunlun

* add xpu einsum, fill_diagonal, and diagonal kernels, test=kunlun
```
  a5bf156b
- [0 Tensor support] cumprod (#49550) · 50a8b655
  Committed by wangzhen38 on Jan 09, 2023
  
  50a8b655
06 Jan, 2023 · 3 commits

Dev (#49591) · 07db4a9f

Committed by RuohengMa on Jan 06, 2023

* add bitwise and, bitwise not, bitwise or and bitwise xor

* correct typo

07db4a9f

[zero-dim] Support 0-d for kthvalue and mode (#49340) · 292738f3

Committed by JYChen on Jan 06, 2023

* add 0-d support for paddle.kthvalue

* add 0-d support for paddle.mode

* fix coverage test for device

* fix check-bug in windows

* change axis check from LT to LE

* add shape & value check for grad when input is 0d tensor

292738f3

fix bug (#49546) · e0ee7403
Committed by Thomas Young on Jan 06, 2023

e0ee7403

05 Jan, 2023 · 2 commits
- Support 0D for paddle.sort/argsort (#49501) · 032da731
  Committed by Siming Dai on Jan 05, 2023
```
* support 0D for paddle.sort/argsort

* support 0D tensor for paddle.sort/argsort in xpu

* fix bug

* fix grad and add value assertion
```
  032da731
- support generate static graph code for imag and real op (#49523) · 192eb4d5
  Committed by zyfncg on Jan 05, 2023
  
  192eb4d5
04 Jan, 2023 · 3 commits

[Inference] Add conv_fusion nhwc impl. (#49047) · 4a8708bb
Committed by Wilber on Jan 04, 2023

4a8708bb

[Paddle Inference] fix mixed precision diff (#49475) · ac75a9a6
Committed by Yuanle Liu on Jan 04, 2023

ac75a9a6

[Unify KernelKey] change OpKernelType->KernelKey (#49138) · 4383494f

Committed by HongyuJia on Jan 04, 2023

* execute use kernel_key first

* change OpKernelType->KernelKey

* fix py3 compile error, remove redundant header files

* fix build_strategy_test

* fix DataType::RAW

* fix custom_type test: operator_test.cc

* fix transform place

* fix backends_are_same_class

* try fix place TransDataDevice

* support all KernelKey

* fix TransformData

* fix place_are_same_class

* fix merge

* fix test_params_no_grad

* fix specific place of GetExpectedKernelType

* fix specific place of GetExpectedKernelType

* fix GetKernelTypeForVar

* fix dtype error

* fix fetch_v2

* change GetKernelTypeForVar

* fix interpreter

* fix typo error

* polish codes

* polish codes

* polish codes

* fix conflict

4383494f

03 Jan, 2023 · 3 commits
- H2D data transfer optimization for concat kernel (#49040) · 0de94cd9
  Committed by limingshu on Jan 03, 2023
  
  0de94cd9
- [Paddle Inference] Implement conv2d_fusion NHWC format using cutlass (#47989) · c123dd1e
  Committed by zhoutianzi666 on Jan 03, 2023
```
* Implement conv2d_fusion NHWC format using CUTLASS
* Add unit testing for CUTLASS Conv in inference
* Add experimental API for CUTLASS.
```
  c123dd1e
- Use BroadcastKernel and ReduceKernel to optimize expand and expand_grad. (#49419) · c4604025
  Committed by Yiqun Liu on Jan 03, 2023
```
* Use BroadcastKernel and ReduceKernel to optimize expand and expand_grad.

* Correct the axis when there is only 1 input in BroadcastKernel.

* Add the calculate of output's shape.
```
  c4604025
31 Dec, 2022 · 1 commit
- support flip 0D (#49460) · cb22a5c7
  Committed by caozhou on Dec 31, 2022
  
  cb22a5c7
30 Dec, 2022 · 1 commit

Unify the English translations of static graph mode and dynamic graph mode in the docs (#49170) · a186e60d

Committed by Sanbu on Dec 30, 2022

* 1219

* temporarily change the num_diff_files limit, test=document_fix

* Revert "temporarily change the num_diff_files limit, test=document_fix"

This reverts commit 8e70f00ef468d2dad0e38b3da06295ed62990d20.

* for codestyle

* remove duplicate license

* `static mode` -> `static graph mode`

* Update hybrid_parallel_inference.py

* Update layer_function_generator.py

* Update manipulation.py

* reset

Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: SigureMo <sigure.qaq@gmail.com>

a186e60d

29 Dec, 2022 · 1 commit
- xpu kernels support api int64 vector inputs, test=kunlun (#49336) · 3c2420a3
  Committed by ykkk2333 on Dec 29, 2022
  
  3c2420a3
28 Dec, 2022 · 3 commits
- fix unique_kernel support axis=-1 (#49385) · ab786715
  Committed by sprouteer on Dec 28, 2022
  
  ab786715
- fix_moe (#49353) · 04511cf9
  Committed by xiaoxiaohehe001 on Dec 28, 2022
  
  04511cf9
- fix bugs of paddle.multiplex API (#49368) · f6f0c562
  Committed by Haohongxiang on Dec 28, 2022
  
  f6f0c562
27 Dec, 2022 · 3 commits
- add unbind op for xpu (#49356) · 16931039
  Committed by zhangyikun02 on Dec 27, 2022
  
  16931039
- fix fold for large bs (#49337) · 9dde26f6
  Committed by xiaoting on Dec 27, 2022
```
* fix fold for large bs

* fix fold for large bs
```
  9dde26f6
- Revert "make bilinear interpolate stable. (#48644)" (#49307) · 17ec1620
  Committed by xiongkun on Dec 27, 2022
```
This reverts commit e1e8bf72.
```
  17ec1620
26 Dec, 2022 · 2 commits

fix dlrm qpsproblem (#49171) · c8f76337

Committed by ykkk2333 on Dec 26, 2022

* migrate shaple sgd, split,sign xpu kernels to phi, test=kunlun

* fix dlrm throughput problem, test=kunlun

c8f76337

[0d Tensor] update scatter for zero-dimension tensor (#49279) · 73aa98cf
Committed by Roc on Dec 26, 2022
```
* revert concat and change concat to stack

* let stack kernel support int8, uint8 and bool type
```
73aa98cf

23 Dec, 2022 · 6 commits
- suport recompute for kunlun (#49069) · 98c17a68
  Committed by QingshuChen on Dec 23, 2022
  
  98c17a68
- Fix arange gpu kernel (#49273) · e073313d
  Committed by Yuanle Liu on Dec 23, 2022
  
  e073313d
- fix matmul double and triple grad (#48779) · 13c4fd59
  Committed by Charles-hit on Dec 23, 2022
```
* fix matmul double and triple grad

* remove some comment

* add matmul_double_grad unit test

* fix matmul triple grad

* fix dot triple grad and add unit test

* modify codestyle

* fix dot_grad

* refactor dot triple grad

* disable some unit test

* fix unit test

* fix unit test in double grad
```
  13c4fd59
- square_grad support fp16 *test=kunlun (#48847) · ae544586
  Committed by haosicheng on Dec 23, 2022
  
  ae544586
- add rnn-t loss and api (#49199) · c088f9ec
  Committed by Hui Zhang on Dec 23, 2022
```
* add warp transducer code
```
  c088f9ec
- Register half datatype for Roll Kernel (#49192) · 3b90a7f3
  Committed by MarDino on Dec 23, 2022
```
* register half datatype

* register roll grad fp16 kernel
```
  3b90a7f3
22 Dec, 2022 · 3 commits
- [Paddle Inference] Add moe phi kernel (#48703) · def2a87f
  Committed by xiaoxiaohehe001 on Dec 22, 2022
  
  def2a87f
- Optimize performance of batch_norm_bwd with NHWC layout and infer mode (#49209) · a9fd0807
  Committed by Zhang Zheng on Dec 22, 2022
```
* Optimize performance of batch_norm_bwd with NHWC layout and infer mode

* fix
```
  a9fd0807
- Optimize performance of avgpool2d with NHWC layout (#49231) · aa0098f6
  Committed by Zhang Zheng on Dec 22, 2022
  
  aa0098f6

PaddlePaddle / Paddle synced successfully almost 2 years ago