Commits · 3da3462f3499cb4233cc779118a9ad9670c0ebf0 · Crayon鑫 / Paddle

11 Oct, 2022 1 commit
- Update layout autotune for module with no modified (#46541) · 3da3462f
  Committed by niuliling123 on Oct 11, 2022
  
  3da3462f
10 Oct, 2022 5 commits

Committed by YuanRisheng on Oct 10, 2022

* add yaml entry for rnn and rnn_grad, move infershape function for rnn_grad to phi infer meta

* WIP: move rnn kernel to phi

* Change the code generation to avoid converting from initializer list to tuple of heterogeneous types.
This is only triggered when an api has intermediate outputs, and the outputs are of heterogeneous types.

* fix the bug where, when none of the tensors in a vector requires gradient, the conversion from InferShapeContext to InferMetaContext (a.k.a. BuildInferMetaContext) produces erroneous results.

* fix ci bugs

* fix ci bugs

* fix ci bugs

* modify code according to comments

Co-authored-by: Nchenfeiyu <chenfeiyu@baidu.com>

ab60fd8b

remove comment (#46827) · 8a5f17e8
Committed by Rayman on Oct 10, 2022

8a5f17e8

[PHI] transpose2_grad op migration (#46139) · e3407a80

Committed by Paulina Gacek on Oct 10, 2022

* op migrated, Copy(OneDNNContext, ...) added

* mutable_data & op registration in fluid removed

* refactoring

* OneDNNGetDataType to uppercase

* missing cpu check added, handler moved to .h file

* name changed to transpose_grad

* Copy changed back to TensorCopy

* Resizing corrected, Copy(OneDNNContext) removed

e3407a80

[Hackathon No.36] Optimize the computational performance of the lerp_grad op on GPU (#45946) · ef61df30
Committed by Rayman on Oct 10, 2022

ef61df30
[Hackathon No.56&38] Add float16 data type support and forward-pass speedup for the deformable_conv_v1 operator (#46111) · 5e0614a1
Committed by Rayman on Oct 10, 2022
```
support fp16 for deformable conv
```
5e0614a1

09 Oct, 2022 4 commits
- add sync_batch_norm_kernel (#46430) · 5cd6a707
  Committed by zhangkaihuo on Oct 09, 2022
  
  5cd6a707
- [Sparse] Add a batch_norm kernel (#46359) · 888223b7
  Committed by zhangkaihuo on Oct 09, 2022
  
  888223b7
- add seed check (#46747) · 97ec57fe
  Committed by Sławomir Siwek on Oct 09, 2022
  
  97ec57fe
- Enable hard_swish_grad unit test (#46621) · ff0171e4
  Committed by Sławomir Siwek on Oct 09, 2022
```
* enable hard_swish_grad unit test

* remove unused argument
```
  ff0171e4
08 Oct, 2022 1 commit
- fix typo (#46680) · 6e9bb9f9
  Committed by HongyuJia on Oct 08, 2022
  
  6e9bb9f9
03 Oct, 2022 1 commit
- Requantize to use Memory Desc in Tensors (#46608) · a579e523
  Committed by Jacek Czaja on Oct 03, 2022
```
* - some more MD changes

* - lint

* - compilation fixes

* - compilation fixes

* - lint

* - fix
```
  a579e523
30 Sep, 2022 10 commits
- Fix undefined reference PD_IntArrayGetElementCount (#46662) · 2055a1d2
  Committed by engineer1109 on Sep 30, 2022
```
* Fix undefined reference PD_IntArrayGetElementCount

* Delete PD_IntArrayGetSize Unused
```
  2055a1d2
- Optimize performance of depthwise_conv_bwd of filter (#46490) · 04eb211a
  Committed by Zhang Zheng on Sep 30, 2022
```
* Optimize performance of depthwise_conv_bwd of filter

* op-benchmark

* fix

* op benchmark

* merge bwd
```
  04eb211a
- Optimize performance of depthwise_conv_bwd (#46362) · f17a73e9
  Committed by Zhang Zheng on Sep 30, 2022
```
* Optimize performance of depthwise_conv_bwd

* fix
```
  f17a73e9
- fix bugs of tipc, test=kunlun (#46540) · d16360c8
  Committed by ykkk2333 on Sep 30, 2022
```
* migrate sigmoid with cross entropy, and tile xpu kernels to phi, test=kunlun

* migrate add_n kernel to phi, test=kunlun

* fix bugs of tipc, test=kunlun
```
  d16360c8
- change mkldnn kernel layout, ALL_LAYOUT->ONEDNN (#46626) · 22e81907
  Committed by HongyuJia on Sep 30, 2022
  
  22e81907
- change mkldnn kernel layout, ALL_LAYOUT->ONEDNN (#46628) · 4744cbc7
  Committed by HongyuJia on Sep 30, 2022
  
  4744cbc7
- [Hackathon No.21] Add the paddle.incubate.sparse.transpose sparse API to Paddle (#45849) · 2b879a69
  Committed by 六个骨头 on Sep 30, 2022
  
  2b879a69
- change mkldnn kernel layout, ALL_LAYOUT->ONEDNN (#46627) · 4b9dae01
  Committed by HongyuJia on Sep 30, 2022
  
  4b9dae01
- change mkldnn kernel layout, ALL_LAYOUT->ONEDNN (#46629) · abee2210
  Committed by HongyuJia on Sep 30, 2022
  
  abee2210
- support pure bfloat16 for more ops (#46364) · b7b231a6
  Committed by sneaxiy on Sep 30, 2022
```
* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* add bfloat16 to selu_grad to pass CI

* fix selu grad compilation error
```
  b7b231a6
29 Sep, 2022 9 commits

Optimize softmax's performance when dim_size >= 100000. (#46535) · 9012787f
Committed by carryyu on Sep 29, 2022

9012787f

Move valid check from python to kernel (#46412) · 37bc2d7b

Committed by Zhang Zheng on Sep 29, 2022

* Move valid check from python to kernel

* fix error throw

* fix

* invalid label check

* fix

* Revert "fix"

This reverts commit 79fad6799cfa4b30423dbc84e67d7d843d22b84a.

* Revert "invalid label check"

This reverts commit 402a9707390ad5386b3222e85844b92d2e9b9fa4.

* Revert "fix"

This reverts commit 09ba3080ee0587447f875c19cdf060485f15ae3b.

* Revert "fix error throw"

This reverts commit a901bfcc2179d5c120ec29af766f392b122dab52.

* Revert "Move valid check from python to kernel"

This reverts commit baa03cc4ef82d8d45516c30dfb52bf5aead30748.

* final fix

* fix

* fix

37bc2d7b

Add index_select, index_select_grad, reduce_min kernel and their unittests for... · 9a1855ff

Committed by Leo Guo on Sep 29, 2022

Add index_select, index_select_grad, reduce_min kernel and their unittests for kunlun. Add registers of index_select, index_select_grad, reduce_min, sqrt, sqrt_grad to xpu2_op_list.test=kunlun. (#46557)

9a1855ff

fix P40 topk: Make the optimized topk compatible with P40. (#46547) · 667082c0

Committed by carryyu on Sep 29, 2022

* fix P40 topk: Make the optimized topk compatible with P40.

* fix P40 topk: Make the optimized topk compatible with P40.

* fix P40 topk: Make the optimized topk compatible with P40.

667082c0

add register for strided_slice_grad (#46549) · 40ab6faf
Committed by ming1753 on Sep 29, 2022

40ab6faf
fix uniform_rand_kernel FP16 support in dygraph mode (#46212) · ccab0e2a
Committed by 傅剑寒 on Sep 29, 2022

ccab0e2a
[OptLayoutSelect] Select the highest priority layout (#46598) · 596d8209
Committed by HongyuJia on Sep 29, 2022
```
* select highest priority layout

* opt performance, save virtual table find
```
596d8209

[Fix KernelKeyParser] Unify the logic of `operator()` in `KernelKeyParser` (#46560) · 4140d7ec

Committed by HongyuJia on Sep 29, 2022

* add datatype check for ParseKernelKeyByInputArgs

* polish error message

* Actually, einsum has vector<Tensor> input with DataType::COMPLEX64, see test_einsum_v2.py

* remove enforce.h from header file

4140d7ec

[XPU] update xpu cmake to 0928. (#46437) · 58a478f8

Committed by houj04 on Sep 29, 2022

* [XPU] update xpu cmake to 0923. test=kunlun

* [XPU] update xpu cmake to 0928. test=kunlun

58a478f8

28 Sep, 2022 9 commits

Remove the declaration of using Tensor in framework/tensor.h (#46432) · e12a905e

Committed by Chen Weihang on Sep 28, 2022

* remove needless using tensor

* remove needless using tensor

* resolve conflict

* replace tensor using

* fix format error

* revert needless changing

* fix rocm and npu compile error

* fix cinn compile error

* fix format error

* fix mkldnn format error

* fix mkldnn format error

* fix cinn compile error

* fix cinn compile error

* fix cinn compile error

* resolve conflict

e12a905e

rename filenames from pten to phi (#46579) · 3f8585a9
Committed by HongyuJia on Sep 28, 2022

3f8585a9

[phi Backend] Change BackendSet from uint64_t to uint32_t (#46532) · f6f8c935

Committed by HongyuJia on Sep 28, 2022

* change BackendSet from 64bits to 32bits

* fix _MSC_VER error, __lzcnt32->__lzcnt

* fix __GNUC__ error, __builtin_clzl->__builtin_clz

f6f8c935
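
The builtin swaps listed in this commit follow from the width change: a count-leading-zeros result depends on the width of the integer it is computed over. The sketch below is not Paddle code. It models the two widths in plain Python, and the helper name `clz` and the example mask are hypothetical.

```python
def clz(value: int, width: int) -> int:
    """Count leading zero bits of `value` viewed as an unsigned `width`-bit integer."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    # bit_length() is the 1-based position of the highest set bit (0 for value == 0),
    # so every bit above that position is a leading zero.
    return width - value.bit_length()


# Hypothetical backend mask with bits 0 and 3 set (an assumption for illustration).
mask = 0b1001

# Index of the highest set bit, computed consistently for each width.
highest_32 = 31 - clz(mask, 32)
highest_64 = 63 - clz(mask, 64)
```

Both widths agree on the highest set bit only when the offset (31 or 63) matches the width used for the count, which is why a narrower mask type needs the narrower builtin. In GCC, `__builtin_clz` takes an `unsigned int` and `__builtin_clzl` takes an `unsigned long`, and both are undefined for an input of zero, a case this Python helper handles by returning the full width.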

first commit (#46525) · 806b252c
Committed by limingshu on Sep 28, 2022

806b252c

[PHI] relu6_grad kernel (#46501) · cee2b12d

Committed by Sławomir Siwek on Sep 28, 2022

* Relu6

* remove fluid handler

* add individual kernel signature

* coding style

* replace bounded_relu with clip

* whitespace

* code style

cee2b12d
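
The "replace bounded_relu with clip" step relies on ReLU6 being a clip of the input to the range [0, 6]. A minimal sketch of that identity and the matching gradient in plain Python follows. The function names are hypothetical, the upper bound of 6 is assumed from the op name, and the code does not reproduce the oneDNN kernel.

```python
def relu6(x: float) -> float:
    """ReLU6 written as a clip of x to the closed interval [0, 6]."""
    return min(max(x, 0.0), 6.0)


def relu6_grad(x: float, dout: float) -> float:
    """Pass the upstream gradient through only where the clip is not saturated.

    Treating the boundary points 0 and 6 as saturated is an assumption of this sketch.
    """
    return dout if 0.0 < x < 6.0 else 0.0
```

For example, `relu6(10.0)` saturates at the upper bound, and `relu6_grad(10.0, 2.0)` returns zero because the output no longer depends on the input there.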

add decode_jpeg yaml (#46562) · c7da8602
Committed by YuanRisheng on Sep 28, 2022

c7da8602
[BugFix]Fix concat bugs when call onednn kernel (#46518) · 0ee6dfbe
Committed by YuanRisheng on Sep 28, 2022
```
* fix concat bug

* fix ci bugs

* fix ci bugs
```
0ee6dfbe

[NPU] add gpu kernel for transfer layout (#46307) · 526d963e

Committed by kangguangli on Sep 28, 2022

* add gpu kernel for transfer layout

* comment error throw

* fix: flag setting in testcase; add condition check for raising error

* fix typo

* fix: add error type for PADDLE_THROW

* remove kernel fallback in data_transfer.cc

* remove useless variable definition

526d963e

[PHI] phi support xpu black list (#46527) · 84f7835d
Committed by wanghuancoder on Sep 28, 2022
```
* phi support xpu black list
```
84f7835d
