Commits · 43d6bdca6851c258ce1e44e44da1aca455a39580 · PaddlePaddle / Paddle

May 25, 2023 · 1 commit

Fix the custom pass with empty type (#54065) · 43d6bdca

Committed by ronnywang on May 25, 2023

43d6bdca

May 24, 2023 · 1 commit

Try to increase the repeat of autotune and fix the setting of allow_tf32_cublas. (#53622) · f4abe34b

Committed by Yiqun Liu on May 24, 2023

* Try to increase the repeat of autotune and fix the setting of allow_tf32_cublas.

* Change the repeat of cublaslt to 10.

* Use FLAGS_cublaslt_exhaustive_search_times as repeats.

* Fix compiling error on CI.

* Polish the key and simplify codes.

f4abe34b

May 23, 2023 · 11 commits

[CINN] Enable check_cinn on some tests (#53710) · 97fe79a9

Committed by Fisher on May 23, 2023

* Enable check_cinn on some tests

Tests: bitwise, compare, shape, assign_value, sum, expand_v2,
lookup_table, lookup_table_v2

* Enable more CINN tests

Tests with CINN: expand_v2, matmul, matmul_v2, mul, norm, one_hot_v2
Add target select in cinn_launch_op

* Revert test_mul_op

* Improve op unit tests

97fe79a9

fix nccl version (#53942) · 89da2f19

Committed by LiYuRio on May 23, 2023

89da2f19

[static op generation] tril_triu (#54033) · 4af0f140

Committed by gouzil on May 23, 2023

* [phi] autogen code tril_triu

* [phi][api]fix tril_triu_grad args

* [fluid] clean cmake; [phi] fix infer_meta

4af0f140

Fix typos (#53960) · d89e0367

Committed by co63oc on May 23, 2023

d89e0367

fix typos(#53967) · c36a000d

Committed by cyberslack_lee on May 23, 2023

c36a000d

Functionalize distributed_fused_lamb kernel (#53896) · 5f8e7d8f

Committed by huangjiyi on May 23, 2023

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update HostAlloc

* update param name

* update cpu kernel

* remove kernel header

* update

* update

5f8e7d8f

move fusion_group infershape to phi (#53934) · 3dc99088

Committed by huangjiyi on May 23, 2023

* update

* update

* update

* set out dtype

3dc99088

static graph autogen code support for pad3d op (#53733) · bcf67536

Committed by Wang Xin on May 23, 2023

* static graph autogen code support for pad3d op

* bug fixed

* add ut for pad3d mkldnn op

* fix coverage

* fix bug

* fix bug

* Delete test_pad3d_mkldnn_op.py

bcf67536

[CustomDevice] fix auto_paralell (#53842) · 3aa5d64e

Committed by ronnywang on May 23, 2023

* [CustomDevice] fix auto_paralell

* update

* update

* update

3aa5d64e

[static op generation] group_norm (#53489) · d0514a93

Committed by LoneRanger on May 23, 2023

* fix the static op generation for group_norm

* fix bug of mismatch

* fix bug of AssertionError

* fix setting of composite

d0514a93

[0D-Tensor] Support elementwise_add (#53955) · 26c824db

Committed by HongyuJia on May 23, 2023

* [0D-Tensor] Support elementwise_add

* support elementwise_add ZeroDim2&3

26c824db

May 22, 2023 · 3 commits

update_c++14_to_c++17_on_windows (#53958) · 6e043202

Committed by risemeup1 on May 22, 2023

* update_c++14_to_c++17_on_windows

* disable test_audio_logmel_feature and test_audio_mel_feature

6e043202

[Inference] add config.enable_low_precision_io api and remove rely on... · d1bbd900

Committed by Yuanle Liu on May 22, 2023

[Inference] add config.enable_low_precision_io api and remove rely on AnalysisConfig::Precison in trt (#52485)

d1bbd900

Add multiclass_nms3 GPU kernel (#52401) · f71c805e

Committed by Tian Zheng on May 22, 2023

* Add GPU kernel for multiclass_nms3 op

* Make multiclass_nms3 gpu kernel output consistent with cpu kernel

* Fix API incompatibility

* Fix unittests on builds without CUDA

* Fix ROCM build

* Remove fluid headers; Use default atol for unittest

* Change function and variable naming

* Add comments; Reduce redundant code

* Use paddle test framework

f71c805e

May 19, 2023 · 5 commits

add minimum grad composite rules (#52561) · 97690816

Committed by warrentdrew on May 19, 2023

* add minimum grad composite rules

* add public python api

* fix format

* fix format

* update testcase

* fix testcase

* fix format

* fix cmakelist.txt

* fix format

* fix param problem

* fix op and composite rule

* fix bf16 cpu support problem

* fix bf16 cpu issue

* fix axis error log

* add axis for maximum

* revert commit

* remove .orig

* fix generic problem

* revert max op

* fix axis error

* fix maximum axis

* fix test_check_output

* fix cinn

* fix minimum maximum axis check

97690816

Add flash attention to speedup fused_gate_attention. (#52731) · d29c1f8e

Committed by limingshu on May 19, 2023

* Reorganize the forward codes of flash-attention.

* Fix forward.

* Remove some noused codes.

* Simplify codes and fix backward.

* Change all LOG(INFO) to VLOG and fix the backward.

* add scale for AF2 flash_attn, much thanks to xreki and shaojie for debug these codes

* decrease the effect of debug print on performance

* Unify the initialize of flashattn arguments.

* Rewirte the reshape of temp_mask and temp_bias.

* API support use_flash_attn.

* Fix compiling error on CI.

* Try to crop the flash-attention lib.

* Correct the condition of whether can use flash-attn.

* Remove the softmax_out argument.

* Remove is_causal.

* Polish codes.

* Fix qkv_transpose_out's shape and scaling of Q * K.

* Update commit of flash-attention.

---------
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

d29c1f8e

test,test=develop (#53811) · 10758725

Committed by Galaxy1458 on May 19, 2023

10758725

【prim】merge branch for GradOpMaker codeGen to clear code (#53874) · 6cb53e91

Committed by xiaoguoguo626807 on May 19, 2023

* review

* modify opcompat bug

* modify pybind

6cb53e91

[CustomDevice] fix buffered reader exception (#53925) · b922e711

Committed by ronnywang on May 19, 2023

b922e711

May 18, 2023 · 6 commits

Fused elementwises kernels and ops (#51427) · fb4a6ecf

Committed by Hulek on May 18, 2023

* Fused elementwises kernels and ops

* change fuse pass name

* adjust .pbtxt files

* adjust quantization attributes

* add missing arguments and fix others, review fixed

* simplify fused kernel registration

* fix elementwise unit tests

* reuse one fused elementwise op

* adjust proto

* Add supported datatypes

* Change 'Scale' to 'scale' in tests, change some tests to onednn

* Revert breaking changes

* Fix unit tests

* Delete obsolete test cases

* Delete commented out code

* Fix codestyle

* delete temporary condition

* fix conflicts and delete duplicate fusing

* Fix code after merge

* Move tests to new directory

* fix tests volatility

* Rename test_elementwise_add_onednn_op.py to test_elementwise_add_mkldnn_op.py

* Update CMakeLists.txt add mkldnn op test

---------
Co-authored-by: Silv3S <slawomir.siwek@intel.com>

fb4a6ecf

move fusion_group kernel to phi (#53781) · 26da689d

Committed by huangjiyi on May 18, 2023

26da689d

move sequence_mask op InferShape func (#53782) · a862debf

Committed by Wang Xin on May 18, 2023

* move sequence_mask op InferShape func

* add dtype infer

a862debf

Fix typos in elementwise dir (#53907) · 2782b291

Committed by co63oc on May 18, 2023

2782b291

support auto generate for op layer_norm (#53178) · 4f07b653

Committed by RedContritio on May 18, 2023

* simplify layer_norm_op.cc

* support auto generate for op layer_norm

* update unittest for composite_layer_norm

* remove layer_norm_op.cc from scripts

* replace layer_norm_op with generated_op

* add get_expected_kernel for layer_norm

* update cmake kernel register function for layer_norm_mkldnn_op

4f07b653

Fix typos in send_v2_op.cu.cc (#53904) · 65ce6886

Committed by co63oc on May 18, 2023

65ce6886

May 17, 2023 · 1 commit

[fluid] decoupling abn op (#53826) · 38e5cd00

Committed by gouzil on May 17, 2023

38e5cd00

May 16, 2023 · 9 commits

remove some [-Wunused-parameter] warning and fix a file to pass cpplint (#53814) · 10a38b4e

Committed by Galaxy1458 on May 16, 2023

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

10a38b4e

【static】modify backward prune logic for EmptygradOpMaker (#53746) · 69161a96

Committed by xiaoguoguo626807 on May 16, 2023

* add rules

* modify no kernel yaml parse

* success op generate

* success test_silu_double

* modify bug

* modify static error

* modify silu_grad input

* modify kernel signature

* modify kernel signature

* code style

* code style

* review

* delete opinfo modify

* modify gradOpMaker

* modify gradOpMaker

* modify genarated-j2

* add approve rules

* modify aytograd_functional_static_test

69161a96

move cudnn_lstm kernel to phi (#53730) · 52889e38

Committed by huangjiyi on May 16, 2023

* update

* fix bug

* test

* test

* update

* update mutable_data

* fix bug

* update

* fix bug

* update output type reg

* update

* update

52889e38

Committed by 张春乔 on May 16, 2023

* rm npu

* rm use_npu

* rm npuid

* rm use_npu

* rm npuid

* delete npupinned

* roll back sth.

* roll back sth.

* delete npupinned

* roll back sth.

* roll back sth.

* rm npu

* rollback something

* rollback npu identity

* rollback npu identity

5b054d2f

Move fused batchnorm to Phi (#53476) · 5e5481d8

Committed by Sonder on May 16, 2023

* trans fused batch norm Compute function

* trans batch norm register info to phi

* trans fused batch norm grad Compute

* trans batch norm grad register info

* add sig file

* update sig file

* Update fused_bn_activation_kernel.cu

* Update fused_bn_activation_grad_kernel.cu

* fix

* Rename fused_bn_activation_kernel_grad.cu to fused_bn_activation_kernel.cu

* fix

* fix

* fix CudnnDataType error

* fix

* fix include

* update

* add #if

* add fused bn act to cmakelist.txt

* update  cmakelist

* fix #ifdef error

* add timeout set

* add env set

* fix

* fix

* Update fused_bn_activation_sig.cc

5e5481d8

static graph autogen code support for softmax op (#53581) · 312f0187

Committed by Wang Xin on May 16, 2023

* static graph autogen code support for softmax op

* bug fixed

* fix PR-CI-Windows error

* fix CI error

* bug fixed

* fix conflicts

312f0187

support auto generation V2 abs (#53341) · b86bbe85

Committed by cyberslack_lee on May 16, 2023

b86bbe85

[static op generation] InstanceNorm (#53340) · 7b81092b

Committed by 张春乔 on May 16, 2023

* mv InstanceNorm

* modify op_version.yaml

* modify add Operator:: in get_expected_kernel_func.cc

* rm gradexpectedkernel

* add extra

* add float epsilon=1e-5

7b81092b

[phi] move stft to phi - Step 1 (#53517) · 00c21abc

Committed by gouzil on May 16, 2023

* [phi]mv StftKernel to phi

* [phi] fix KernelSignature

* [phi]fix arr error

* [phi] Disable check_dygraph

* [phi]fix include

* [phi] rewrite mutable_data, add output register

* [phi] fix  Alloc

* [phi] fix Alloc again

* [phi] fix mutable_data

* [phi] fix onesided_out Resize

00c21abc

May 15, 2023 · 3 commits

move dequantize kernel to phi (#53739) · efd410c8

Committed by huangjiyi on May 15, 2023

* update

* fix bug

* fix output type def

efd410c8

[CustomDevice] add inference MP support, PART2 (#53701) · e04f8d4a

Committed by ronnywang on May 15, 2023

e04f8d4a

Silu double grad (#53605) · 94c38803

Committed by xiaoguoguo626807 on May 15, 2023

* add rules

* modify no kernel yaml parse

* success op generate

* success test_silu_double

* modify bug

* modify static error

* modify silu_grad input

* modify kernel signature

* modify kernel signature

* code style

* code style

* review

* delete opinfo modify

94c38803

PaddlePaddle / Paddle · synced successfully more than 1 year ago
