Commits · b7b231a668ac51365cdce11dfafe6f7da04b2350 · PaddlePaddle / Paddle

30 Sep, 2022 · 1 commit

support pure bfloat16 for more ops (#46364) · b7b231a6

Committed by sneaxiy on Sep 30, 2022

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* add bfloat16 to selu_grad to pass CI

* fix selu grad compilation error

29 Sep, 2022 · 7 commits

Optimize softmax's performance when dim_size >= 100000. (#46535) · 9012787f

Committed by carryyu on Sep 29, 2022

Move valid check from python to kernel (#46412) · 37bc2d7b

Committed by Zhang Zheng on Sep 29, 2022

* Move valid check from python to kernel

* fix error throw

* fix

* invalid label check

* fix

* Revert "fix"

This reverts commit 79fad6799cfa4b30423dbc84e67d7d843d22b84a.

* Revert "invalid label check"

This reverts commit 402a9707390ad5386b3222e85844b92d2e9b9fa4.

* Revert "fix"

This reverts commit 09ba3080ee0587447f875c19cdf060485f15ae3b.

* Revert "fix error throw"

This reverts commit a901bfcc2179d5c120ec29af766f392b122dab52.

* Revert "Move valid check from python to kernel"

This reverts commit baa03cc4ef82d8d45516c30dfb52bf5aead30748.

* final fix

* fix

* fix

Add index_select, index_select_grad, reduce_min kernel and their unittests for... · 9a1855ff

Committed by Leo Guo on Sep 29, 2022

Add index_select, index_select_grad, reduce_min kernel and their unittests for kunlun. Add registers of index_select, index_select_grad, reduce_min, sqrt, sqrt_grad to xpu2_op_list.test=kunlun. (#46557)

fix P40 topk: Make the optimized topk compatible with P40. (#46547) · 667082c0

Committed by carryyu on Sep 29, 2022

* fix P40 topk: Make the optimized topk compatible with P40.

* fix P40 topk: Make the optimized topk compatible with P40.

* fix P40 topk: Make the optimized topk compatible with P40.

add register for strided_slice_grad (#46549) · 40ab6faf

Committed by ming1753 on Sep 29, 2022

fix uniform_rand_kernel FP16 support in dygraph mode (#46212) · ccab0e2a

Committed by 傅剑寒 on Sep 29, 2022

[XPU] update xpu cmake to 0928. (#46437) · 58a478f8

Committed by houj04 on Sep 29, 2022

* [XPU] update xpu cmake to 0923. test=kunlun

* [XPU] update xpu cmake to 0928. test=kunlun

28 Sep, 2022 · 5 commits

Remove the declaration of using Tensor in framework/tensor.h (#46432) · e12a905e

Committed by Chen Weihang on Sep 28, 2022

* remove needless using tensor

* remove needless using tensor

* resolve conflict

* replace tensor using

* fix format error

* revert needless changing

* fix rocm and npu compile error

* fix cinn compile error

* fix format error

* fix mkldnn format error

* fix mkldnn format error

* fix cinn compile error

* fix cinn compile error

* fix cinn compile error

* resolve conflict

first commit (#46525) · 806b252c

Committed by limingshu on Sep 28, 2022

[PHI] relu6_grad kernel (#46501) · cee2b12d

Committed by Sławomir Siwek on Sep 28, 2022

* Relu6

* remove fluid handler

* add individual kernel signature

* coding style

* replace bounded_relu with clip

* whitespace

* code style

[BugFix]Fix concat bugs when call onednn kernel (#46518) · 0ee6dfbe

Committed by YuanRisheng on Sep 28, 2022

* fix concat bug

* fix ci bugs

* fix ci bugs

[NPU] add gpu kernel for transfer layout (#46307) · 526d963e

Committed by kangguangli on Sep 28, 2022

* add gpu kernel for transfer layout

* comment error throw

* fix: flag setting in testcase; add condition check for raising error

* fix typo

* fix: add error type for PADDLE_THROW

* remove kernel fallback in data_transfer.cc

* remove useless variable definition

27 Sep, 2022 · 2 commits
- Delete int kernel type in Scatter Kernel.test=kunlun (#46030) · 403cd2b5
  Committed by Leo Guo on Sep 27, 2022
- [Sparse] Support static graph (#46245) · a02eb143
  Committed by zhangkaihuo on Sep 27, 2022
26 Sep, 2022 · 2 commits
- fix shard_index kernel (#46491) · 808bf2b4
  Committed by zhaoyingli on Sep 26, 2022
- [Fix] Remove std::trunc() in FloorDivideFunctor and InverseFloorDivideFunctor (#45051) · 091ae705
  Committed by Lin Manhui on Sep 26, 2022
23 Sep, 2022 · 4 commits
- Optimize performance of depthwise_conv_fwd (#46287) · 330b1a0a
  Committed by Zhang Zheng on Sep 23, 2022
  ```
  * Optimize performance of depthwise_conv_fwd

  * fix
  ```
- add phi reduce_sum test=kunlun (#46241) · 22fe4f03
  Committed by dongfangshenzhu on Sep 23, 2022
  ```
  * add phi reduce_sum test=kunlun

  * add fhi reduce_sum test=kunlun

  * add fhi reduce_sum test=kunlun
  ```
- move selected_rows_functor (#46373) · b6c6f4f9
  Committed by YuanRisheng on Sep 23, 2022
- Addition of bf16 type support for Compare OP (#46413) · 1a7d907d
  Committed by limingshu on Sep 23, 2022
  ```
  * first commit

  * clarify the quotes

  * change code style format

  * support bfloat16
  ```
22 Sep, 2022 · 5 commits

[PHI] Sum op migration (#46239) · 3448afc1

Committed by Paulina Gacek on Sep 22, 2022

* Sum kernel migrated to phi

* Static cast added, file name changed

* OneDNNGetDataType to uppercase

* refactoring

* AddOneDNNHandler changed to SumOneDNNHandler

[PHI] Migrate sgd and stack oneDNN kernels (#46374) · 4ae37aee

Committed by Piotr Paturej on Sep 22, 2022

* Convert slice+grad oneDNN fluid kernels to PHI

* Change mutable_data to Alloc

* Refactor licences

[PHI] Migrate gelu kernels (#45596) · 567e2fc8

Committed by Sławomir Siwek on Sep 22, 2022

* gaussian random

* mkldnn to onednn renaming

* fix merge conflicts

* remove fluid code

* onednn renaming

* gelu fwd

* sort activations

* gelu gradient

* remove unused macros

* merge conflicts

* fix merge conflicts

* remove extra contraint from gelu op

Optimize topk's performance when k is small and input_width is large (#45312) · 2c687df0

Committed by carryyu on Sep 22, 2022

* Optimize topk's performance when k is small and input_width is large

* Modify the blockdim setting logic

* Update top_k_function_cuda.h

[Code Clean] Clarify once_flag setting for kernel autotune module (#44141) · 66a4b2e8

Committed by limingshu on Sep 22, 2022

* first commit

* clarify the quotes

* change code style format

* rerun for ci

21 Sep, 2022 · 9 commits

add layer_norm trt fp16 support (#45043) · b7a1ae22

Committed by ccrrong on Sep 21, 2022

* add fp16 support

* update

* update half

* code format

* fix unittest

* fix rocm compile error

* code format

* code format

* fix rocm compile error

* fix rocm compile error

Revert pool+grad oneDNN kernel conversion (#45989) · dc31d2aa

Committed by Piotr Paturej on Sep 21, 2022

Revert "SparseConv support duplicate coordinates (#44976)" (#45202) · 8fbe97e4

Committed by zhangkaihuo on Sep 21, 2022

This reverts commit e8de9dfd.

Enable PaddleInference to use CINN. (#45009) · 3aa6bd57

Committed by Zhen Wang on Sep 21, 2022

* use cinn in the paddle inference

* fix some cmake errors

* Avoid division by zero in the arange_kernel.

* Avoid dynamic ops.

* Remove some useless codes.

* Use OpTransInfo to encapsulate some codes used in the build_cinn_pass.

[Sparse]Conv sort out (#46216) · 23e06680

Committed by zhangkaihuo on Sep 21, 2022

* sort out index

[Sparse] add_coo_dense (#46322) · 55d31980

Committed by zhangkaihuo on Sep 21, 2022

* for add_bias

migrate add_n kernel to phi (#46318) · 0f9dde43

Committed by ykkk2333 on Sep 21, 2022

* migrate sigmoid with cross entropy, and tile xpu kernels to phi, test=kunlun

* migrate add_n kernep to phi, test=kunlun

optimization of depthwise_conv2d grad (#46332) · 18650db3

Committed by 5u13 on Sep 21, 2022

[PHI] Migrate concat+grad, expand+grad, fill_constant, nearest_interp and... · 3d59fee5

Committed by Piotr Paturej on Sep 21, 2022

[PHI] Migrate concat+grad, expand+grad, fill_constant, nearest_interp and bilinear_interp oneDNN kernels (#45863)

* Migrate concat+grad, expand+grad, fill_constant, nearest_interp_v2 and bilinear_interp_v2 oneDNN kernels to PHI

* Remove old namespace variable

* Fix invalid out dims error

* Add mutable_data method to concat output

* Add check for -1 dim before computing out_dims

* Capitalize oneDNNGetDataType function name

* Change fill_constant kernel to correct PHI kernel

* Attempt to fix dims error

* Fix fill_constant (full) kernel

20 Sep, 2022 · 5 commits
- optimization of max_pool3d grad (#45934) · 0e563da6
  Committed by 5u13 on Sep 20, 2022
- [PFCC Operator Performance Optimization] Optimize adaptive_pooling_op performance for Paddle (#45959) · 6d067860
  Committed by Ouyang Chao on Sep 20, 2022
  ```
  * optimize adaptive_pooling_op (forward)

  * fix bug of AdaptiveKernelMaxPool2dWithIdx

  * fix bug of AdaptiveKernelPool2D
  ```
- [PHI] migrate softmax_grad kernel (#46257) · 4dad95cc
  Committed by Sławomir Siwek on Sep 20, 2022
  ```
  * init

  * remove softmaxop

  * merge dev

  * correct dir

  * style
  ```
- [PHI] Migrate slice, slice_grad, split, pad and pad3d oneDNN kernels (#46101) · b232b5e9
  Committed by Piotr Paturej on Sep 20, 2022
  ```
  * Convert split, pad and pad3d kernels

  * Convert slice+grad oneDNN fluid kernels to PHI

  * change out->mutable_data to dev_ctx.Alloc
  ```
- [PHI] Shape op migration (#46051) · 27fe77bc
  Committed by Paulina Gacek on Sep 20, 2022
  ```
  * First approach

  * Shape kernel corrected

  * Compilation error fixed

  * Resize corrected

  * Registered types added

  * Mistake corrected & types added

  * sum kernel deleted
  ```

PaddlePaddle / Paddle · synced successfully about 1 year ago