Commits · 669353c1e2743a3a01876278fbc030199ffbd0dc · 机器未来 / Paddle

02 Aug, 2022 13 commits

Committed by seemingwang on Aug 02, 2022

* move renorm op

* change python api

* change op class func

* alloc data

* remove comments

* fix grad arguments

* fix python argument

* fix python argument

* change unit-test

* remove shape func registration

* recover extra-arguments

* recover shape functor

669353c1

[PFCC Operator Performance Optimization] SeluKernel Optimization (#44490) · 859c4077
Committed by carryyu on Aug 02, 2022
```
* [PFCC] SeluKernel Optimization

* selu kernel optimization

* add private

Co-authored-by: carryyu <>
```
859c4077

Multihead matmul fp16 (#44792) · 0fd8ee63

Committed by Wilber on Aug 02, 2022

* multihead matmul add fp16

* fix windows error

* fix rocm error

* fix rocm error

0fd8ee63

fix gpups CUDADeviceContext to phi-GPUContext;test=develop (#44804) · 3491d183
Committed by danleifeng on Aug 02, 2022

3491d183

[Eager] use eager final state instead intermediate state (#44722) · f1873b90

Committed by Weilong Wu on Aug 02, 2022

* [Eager] call final_state_slice under eager mode

* rm useless comments

* use eager final state instead intermediate state

* update fill_constant yaml

* update fill_constant yaml

* modify wrapped_infermeta_gen logic to fix special case

* fix slice in manipulation

* use fill_constant_

* modify slice infermeta

* rm final_state_conv2d

* use final_state_slice

* use final_state_slice only

* polish slice, use final state

* add paddle_throw for SplitInferMeta

* rm fill_constant_ temporarily

* recover array_equal, not allclose

* recover original code

f1873b90

[Phi] Move QR to Phi (#44742) · 2cf2e786

Committed by Yulong Ao on Aug 02, 2022

* [Phi] Move Qr to the Phi

* [Phi] Register the cpu grad kernel for qr

* [Phi] Share the cuda kernels to lstsq

* [Phi] Remove some improper include files

* [Phi] Modify codes based on the reviews

* [Phi] Remove unnecessary files and add the cuda_only comment

* [Phi] Remove the unnecessary include file

* [Phi] Remove qr_op.cu and lstsq_op.cu

2cf2e786

[Eager]Manual fused_gemm_epilogue (#44748) · a2980169
Committed by xiaoguoguo626807 on Aug 02, 2022
```
* manuel_fused_gemm_epilogue
```
a2980169
[Phi] polish and rename, pt* -> phi* (#44697) · 942ff89f
Committed by Weilong Wu on Aug 02, 2022
```
* polish and rename, pt* -> phi*

* fix code format
```
942ff89f
[XPU] fp16 for layer_norm op (#44778) · 4c3e13de
Committed by houj04 on Aug 02, 2022
```
* [XPU] fp16 for layer_norm op. test=kunlun
```
4c3e13de
Skip inplace for coalesce_tensor_op outputs (#44795) · bb22e59c
Committed by Ruibiao Chen on Aug 02, 2022
```
* Skip inplace for coalesce_tensor_op outputs

* Fix typos

* Add UTs

* Fix typos
```
bb22e59c

[phi] add yolov3_loss yaml and unittest (#44476) · c7cf12fc

Committed by ccrrong on Aug 02, 2022

* add yaml and unittest

* update yaml

* update backward yaml and unittest

* update yaml

* add Yolov3LossGradInferMeta

* update yolov3_loss_op.cc

* fix bug

* code format

c7cf12fc

support beam_search operator on xpu. test=kunlun (#44720) · 9bf80772

Committed by mengqingchun02 on Aug 02, 2022

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

9bf80772

Refactor build_op_downstream_map for standalone executor (#44729) · 9b97ac70
Committed by Ruibiao Chen on Aug 02, 2022
```
* Refactor build_op_downstream_map for standalone executor

* Add some comments
```
9b97ac70

01 Aug, 2022 14 commits

unify gpu context (#44740) · 86763023

Committed by Leo Chen on Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

86763023

Revert for cmake static library errors on XPU KP #44762 · f15d930a
Committed by zhiboniu on Aug 01, 2022

f15d930a

GPUGraph merge to develop (#44594) · 798670bb

Committed by danleifeng on Aug 01, 2022

798670bb

[Sparse] optimize sparse attention (#44743) · 1149a378
Committed by zhouweiwei2014 on Aug 01, 2022

1149a378
[JitLayer]Polish PEFunction to speed up JitLayer and fix memory leak (#44738) · 75122319
Committed by WangZhen on Aug 01, 2022
```
* Polish PEFunction to speed up JitLayer

* Polish PEFunction code

* Fix comments
```
75122319
generate_unify_header supports excludes (#44761) · 212f015f
Committed by Aganlengzi on Aug 01, 2022

212f015f

[operator migration] Migrate unstack_op and nms_op (#44424) · 9d2e0ecb

Committed by Thomas Young on Aug 01, 2022

* update unstack_op

* update unstack_op

* update unstack_op

* fix unstack test

* update unstack

* update with remote

* fix unstack_test.py

* temp_save_change_nms_op

* add nms test

* update nms fix

* update unstack_op

* temp save change

* finish fix nms_op

* pass nms test

* fix CI

* fix ops test

* save change

* fix code style

* fix code style

* fix ci and codestyle

* fix ci

Co-authored-by: ShiningZhang <zhang_liang1991@126.com>

9d2e0ecb

infer context fix place error. (#44726) · 74e46a93
Committed by Wilber on Aug 01, 2022
```
* infer context fix place error.

* update

* update
```
74e46a93
Fix to CI (#44744) · 71f74f5c
Committed by Jacek Czaja on Aug 01, 2022
```
* - fix

* - another fix

* lint
```
71f74f5c
migrate overlap_add and overlap_add_grad op (#44739) · 2a8219c1
Committed by levi131 on Aug 01, 2022
```
* update code format

* add yaml and test

* update for comments
```
2a8219c1
[Paddle Inference] add varlen_token_prune plugin, pass, convert (#44733) · 24187fcb
Committed by Wangzheee on Aug 01, 2022
```
* add varlen_token_prune plugin, pass, convert
```
24187fcb
migrate reduce_amin,reduce_amax kernel to phi (#44698) · 8482f1ae
Committed by Xiaoxu Chen on Aug 01, 2022

8482f1ae

[PHI] Move lu_unpack to phi (#44674) · c905a9e9

Committed by Lin Manhui on Aug 01, 2022

* Add kernel declarations

* Copy kernel implementation code

* Transfer implementation code

* Register new kernels

* Remove old kernels

* Fix code style

* Fix bugs

* mutable_data->HostAlloc

* Transfer infermeta

* Add yaml and update python api

* Add PADDLE_WITH_HIP check

* Update unittests

* Add kernel declarations

* Copy kernel implementation code

* Transfer kernel implementation code

* Register new kernels

* Remove old kernels

* Add lu_unpack_sig

* Fix bugs

* Fix bugs

* Fix bugs

* Optimize directory structure

* Add output checks

* Update include files

* lu_impl.h->lu_kernel_impl.h

* Transfer infermeta

* Add yaml and update python api

* Add check_eager

Co-authored-by: Bobholamovic <linmanhui@baidu.com>

c905a9e9

ort backend support output mutable data (#44724) · 3948c243
Committed by heliqi on Jul 31, 2022

3948c243

30 Jul, 2022 1 commit

Phi prior box (#44431) · d92b2f2d
Committed by zhiboniu on Jul 30, 2022
```
* phi_prior_box

* add float[] support

* phi_prior_box_optest

* update
```
d92b2f2d

29 Jul, 2022 12 commits

unify fluid::CUDADeviceContext and phi::GpuContext (#44723) · 88490567
Committed by Leo Chen on Jul 29, 2022
```
* remove cudaDeviceContext

* remove more template

* fix rocm compile
```
88490567

[API/OP] Migrate Lstsq op into phi (#44318) · ab2aaf8b

Committed by Haohongxiang on Jul 29, 2022

* migrate lstsq op

* update

* fix bugs for CIs

* update

* fix bugs

* add uts

* update

* update

* update

* fix bugs of jip

* fix bugs of hip

* update

* update according to review

* update

* update

* update

* update

ab2aaf8b

add some fp16 op for kunlun resnet50 model (#44672) · fecbc958
Committed by QingshuChen on Jul 29, 2022
```
* add some fp16 op for kunlun resnet50 model
*test=kunlun

* tmp
*test=kunlun
```
fecbc958

phi_multiclass_nms3 (#44613) · a9919903
Committed by zhiboniu on Jul 29, 2022

a9919903

add FLAGS_enable_api_kernel_fallback (#44706) · e439d735
Committed by Aganlengzi on Jul 29, 2022
```
* add FLAGS_enable_api_kernel_fallback

* deal with more cases

* add ut for coverage
```
e439d735

[WIP] Matmul v1 & v2 unification -- part 1 (#44640) · 653885a5

Committed by Jacek Czaja on Jul 29, 2022

* - Unit tests to be debugged

- fix

- refactor

- diagnostic

- more diagnostic

- fix

- Fix number two

- fix

- fix

- fix

- alpha added

- more fixes

- compilation fix

- removed diagnostic code

- cosmetic fixes

* lint

653885a5

Phi softplus migration (#44542) · 05515662

Committed by Wang Bojun on Jul 29, 2022

* add yaml and utests of phi softplus

add yaml of softplus

fix softplus bug in phi

* update utests

* bug fix

* bug fix for test_layers

* layer api match

* match def and doc in ops.py

* doc polish

* fix unwanted modified of thresholded_relu

* style improve

05515662

skip cast trt convert when input dtype is bool (#44716) · 5d94618d
Committed by ccrrong on Jul 29, 2022
```
* skip cast trt convert when input dtype is bool
```
5d94618d

[Auto parallel] Optimization Tuning (#43782) · 72f2ed43

Committed by JZ-LIANG on Jul 29, 2022

* fixed bug for pass & engine

* fixed bug for benchmark GPT-3

* add tuner & profiler

* add algorithms & config

72f2ed43

move CUDAStream to phi (#44529) · da3743fd

Committed by Leo Chen on Jul 29, 2022

* init

* move CUDAStream to phi

* fix compilation

* merge develop

* add stream_owned_ member

* split cuda_stream.h

* fix cpu compile

* fix constructor

* fix bug

* fix windows compile

* fix inference test_levit

* fix windows tests

da3743fd

update to sdk2.6.0 (#44673) · 23ad0cc4
Committed by Allen Guo on Jul 29, 2022

23ad0cc4

Support backward final hook (#44686) · 8c43c0fe
Committed by Jiabin Yang on Jul 29, 2022

8c43c0fe

机器未来 / Paddle: in sync with the fork's source project