Commits · 291d894184d3bc714a1caec47ea1ddbba2f90640 · BaiXuePrincess / Paddle

22 Mar, 2022 · 1 commit

Adjusted CUDA arches for NEW_RELEASE_ALL (#40660) · 71b813f0

Committed by Zhanlue Yang on Mar 22, 2022

17 Mar, 2022 · 1 commit

CopyFromCpu and CopyToCpu of Onnxruntime back-end optimize (#40561) · fcbb7440

Committed by heliqi on Mar 17, 2022

* add onnxruntime predictor

* Add code comments

* support link paddle2onnx onnxruntime

* support onnxruntime with python

* support onnxruntime with python

* support onnxruntime with windows

* paddle2onnx compile with windows

* support windows compile

* support windows compile with onnxruntime

* support windows compile with paddle2onnx

* support mac compile

* compile with mac

* compile with mac

* add code comments

* fix remind word

* code optimization

* add test case

* add test case

* add inference demo_ci test case

* fix compile paddle2onnx with no python

* add inference demo_ci test case

* add inference demo_ci test case

* add inference infer_ut test case

* support c go api and test cases

* add coverage test case

* add coverage test case

* add capi test case

* add capi test case

* fix onnxruntime copyfromcpu and copytocpu

* fix goapi

* modify code

14 Mar, 2022 · 1 commit

[infrt] unify the infrt dialect. test=develop (#40451) · 481db5e9

Committed by 王明冬 on Mar 14, 2022

12 Mar, 2022 · 1 commit

fix NetBuilder API Name bug in cinn_lib_test (#40392) · 69a01c47

Committed by jiangcheng on Mar 12, 2022

* fix NetBuilder API Name bug in cinn_lib_test

* update cinn version to newest

10 Mar, 2022 · 2 commits

Inference add ONNXRuntime back-end (#39988) · 431afc39

Committed by heliqi on Mar 10, 2022

* add onnxruntime predictor

* Add code comments

* support link paddle2onnx onnxruntime

* support onnxruntime with python

* support onnxruntime with python

* support onnxruntime with windows

* paddle2onnx compile with windows

* support windows compile

* support windows compile with onnxruntime

* support windows compile with paddle2onnx

* support mac compile

* compile with mac

* compile with mac

* add code comments

* fix remind word

* code optimization

* add test case

* add test case

* add inference demo_ci test case

* fix compile paddle2onnx with no python

* add inference demo_ci test case

* add inference demo_ci test case

* add inference infer_ut test case

* support c go api and test cases

* add coverage test case

* add coverage test case

* add capi test case

* add capi test case

add tril_triu for xpu, *test=kunlun (#40246) · 1128db30

Committed by z8hanghuan on Mar 10, 2022

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

08 Mar, 2022 · 2 commits

[Phi] Remove gpudnn suffix & polish cmake (#40239) · 3a77d027

Committed by Chen Weihang on Mar 08, 2022

* remove gpudnn suffix & polish cmake

* fix typo

[Phi]Move Relu/Cos/Sin/Tan/Acos/Asin/Atan/Sinh/Cosh/Asinh/Acosh/Atanh kernels... · 975f99ab

Committed by YuanRisheng on Mar 08, 2022

[Phi]Move Relu/Cos/Sin/Tan/Acos/Asin/Atan/Sinh/Cosh/Asinh/Acosh/Atanh kernels in Activation to Phi (#40175)

* move activation op

* adjust code format

* fix compile bugs

* fix ci bugs

* code format adjust

* code format adjust2

* activate ci status

* modify according to comment

07 Mar, 2022 · 2 commits

[infrt] fold the infrt.cvtTensorOp. test=develop (#40214) · b798fb07

Committed by 王明冬 on Mar 07, 2022

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

Committed by Ming-Xu Huang on Mar 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage the cuBlasLt epilogue.
2. Support fusing Act(X*Y + bias), where X has >= 2 dims and Y must have exactly 2 dims.
3. Act currently supports only ReLU (GeLU will be added in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct to graph_pattern_detector.* to define the pattern
described in (2.).
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently supports only ReLU (GeLU will be supported in the future).

* Added FuseGemmEpiloguePass

1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only matmul_v2 from nn.Linear is supported.

* Added pybind for BuildStrategy.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_op.* to gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support for forward graphs with grad_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clarity.

* Allow DX and DBias to be dispensable in fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear(Act(x))

1. Added backward fusion pass to Linear(Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete documentation of the arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some functions pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set compile constraints for cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code review from Paddle

1. Changed argument name is_first_gemm to without_x_gradient for
clarity.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue with fusing the backward of ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_pattern_detector to fit both linear with gelu and
relu.
2. Modified data range in UTs to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache to a local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache an argument to each pass
function.

04 Mar, 2022 · 1 commit

[ROCm] fix hip test to update LD_LIBRARY_PATH, test=develop (#40153) · a7e4cdaf

Committed by Qi Li on Mar 04, 2022

02 Mar, 2022 · 2 commits

Adjust GPU Arches for next level Whl release strategy (#39910) · 3fc698fb

Committed by Zhanlue Yang on Mar 02, 2022

* Adjust GPU Arches for Whl releases

* Adjusted CUDA arches

* fixed minor issue

* adjusted gpu arches

[Pten] Gru lstm migration (#39729) · e4dba69a

Committed by Feiyu Chan on Mar 02, 2022

* move sequence2batch

* move lstm and gru

* Add phi/kernels directory into exclusion to stop using hipcc to compile non .cu files in it.

01 Mar, 2022 · 2 commits

[Phi] Support kps backend and kernel registry (#39941) · 08b43cce

Committed by Chen Weihang on Mar 01, 2022

* support kps backend and compile

* resolve conflict

* fix kps backend trans

* test in xpu2 device

* remove dummy kernel

optimize mergeadd for sparse_adam,*test=kunlun (#39966) · d4911594

Committed by z8hanghuan on Mar 01, 2022

* optimize mergeadd for sparse_adam,*test=kunlun

* optimize mergeadd for sparse_adam,*test=kunlun

* optimize mergeadd for sparse_adam, *test=kunlun

28 Feb, 2022 · 2 commits

【infrt】add TrtOpConverterPass (#39902) · 35471b1f

Committed by Shang Zhizhou on Feb 28, 2022

* add some trt layers

* trtOpConverter pass ok

* add comments

* add constraints to some attrs in the pd_lower_to_trt patterns

* update constraint

* fix code style

* update pass name

* update code style

* change .hpp.inc to .cc.inc in mlir_add_rewriter

[KP] Unify .cu and .xpu files with .kps files (#39917) · 0ff72e5d

Committed by Liu-xiandong on Feb 28, 2022

* [KP] Unify .cu and .xpu files with .kps files

* fix CI bug in GPU and modify the list

* fix conflict

* modify the date

25 Feb, 2022 · 1 commit

[Phi] Support cudnn kernel moving & move softmax kernels (#39547) · 8895379a

Committed by Chen Weihang on Feb 25, 2022

* support cudnn kernel moving

* polish cmake rules

* add unittest for coverage

* remove orig kernel

* remove softmax cudnn kernel

* fix softmax test failed

* fix npu func error

* resolve conflict

* rename gpu dnn kernels

* fix name rule error

* fix compile error

* update fp16 namespace

24 Feb, 2022 · 2 commits

[PTen->Phi PR3] Rename pten make target to phi (#39832) · f77019a0

Committed by Chen Weihang on Feb 24, 2022

* rename pten to phi

* fix infrt compile failed

* resolve conflict

[PHi] Skip kernel declare for cuda only kernel on rocm (#39869) · 76a6b88d

Committed by Chen Weihang on Feb 24, 2022

* skip kernel declare for cuda only kernel on rocm

* fix error

23 Feb, 2022 · 1 commit

[KP] Add elementwise add xpu after phi, test=develop (#39787) · 1a1a2ce8

Committed by Liu-xiandong on Feb 23, 2022

* [KP] Add elementwise add xpu, test=develop

* modify the File Permissions

* modify the copyright time

* modify code style

* modify code style

22 Feb, 2022 · 2 commits

add hard_swish in xpu2_op_list.h and update xpu.cmake,test=kunlun (#39586) · 8d1d0bdf

Committed by zhangyikun02 on Feb 22, 2022

[PTen->Phi PR2] Rename PT_REGISTER macro to PD_REGISTER (#39790) · 4a338796

Committed by Chen Weihang on Feb 22, 2022

* unify register macro

* rename declare macro

* fix infrt error

20 Feb, 2022 · 1 commit

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

Committed by Chen Weihang on Feb 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

19 Feb, 2022 · 2 commits

[Pten] Add selected_rows kernel for Full (#39465) · 79f8eeca

Committed by zyfncg on Feb 19, 2022

* Add selected_rows kernel for full

* remove fill_constant register in fluid

* fix bug without GPU

* add jit_kernel_helper dependency for fc

* do some refactor

* add unittest for ops signatures

* add coverage unittest

* fix merge conflict

* fix full selected_rows bug

[PTen] Support parse cc file in gpu (#39691) · b29c05c7

Committed by Chen Weihang on Feb 19, 2022

* support parse cc in gpu

* change file name

18 Feb, 2022 · 2 commits

[Pten] blas and lapck migration (#39587) · 8c7ee8c2

Committed by Feiyu Chan on Feb 18, 2022

* move blas related files
* move lapack related files

[IPU] Update IpuStrategy (#39644) · 46161679

Committed by Allen Guo on Feb 18, 2022

* Update IpuStrategy

* fix ci

* rerun ci

17 Feb, 2022 · 1 commit

add softplus op for kunlun2. test=kunlun (#39555) · 9f99b591

Committed by houj04 on Feb 17, 2022

* add softplus op for kunlun2. test=kunlun

* add softplus op for kunlun2. test=kunlun

* fix code style. test=kunlun

* fix code style. test=kunlun

* add more test cases. test=kunlun

15 Feb, 2022 · 3 commits

[PluggableDevice] Add custom runtime support (#38740) · 3e7825f3

Committed by ronnywang on Feb 15, 2022

* [CustomRuntime] Add DeviceManager

* [CustomRuntime] Add DeviceInterface

* [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager

* [CustomRuntime] Add plug-in device

* [CustomRuntime] Memory module support PluggableDevice

* [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option

* update

* [API] update API doc based on comments, test=develop
Co-authored-by: Nqili93 <qili93@qq.com>

fix bug when use extern_openblas and generator is ninja (#39428) · f73f5b06

Committed by Sing_chan on Feb 15, 2022

* fix bug when use extern_openblas and generator is ninja

* modify according to zhouwei's comment

new way of test case, 2nd, *test=kunlun (#39478) · 4745234f

Committed by z8hanghuan on Feb 15, 2022

* new way of test case, 2nd, *test=kunlun

* new way of test case, 2nd, *test=kunlun

* new way of test case, 2nd, *test=kunlun

14 Feb, 2022 · 1 commit

[ROCm] fix missing dcu kernel in operator.cmake, test=develop (#39480) · 55da9344

Committed by Qi Li on Feb 14, 2022

11 Feb, 2022 · 1 commit

get build time (#39368) · 72ad280b

Committed by zhangchunle on Feb 11, 2022

02 Feb, 2022 · 1 commit

[PTen] Remove kernel alias name (#39321) · 5dc20c27

Committed by Chen Weihang on Feb 02, 2022

* remove kernel alias name

* fix deprecated error

* fix deprecated failed

* fix mean error

* resolve conflict

* fix windows failed

30 Jan, 2022 · 1 commit

feat(cncl_mlu): add cncl dev for mlu distributed backend (#39294) · d28f6f7b

Committed by mhhhh1 on Jan 30, 2022

29 Jan, 2022 · 2 commits

Add xpu2 compiler (#37254) · 92da5055

Committed by Liu-xiandong on Jan 29, 2022

* Add XPU compiler for paddle, test=develop

* clean code

* clean useless code

* clean useless code

* clean useless code

* test

* add include path

* use clang compiler

* xpu2.cmake

* XPU2 compiler passed

* update

* update after pten

* combine WITH_XPU and WITH_XPU2

* update the fuse operation in WITH_XPU and WITH_XPU2

* update

* update

* update

* fix the merge error

* update

* update the code

* update the code

* add run_kp_kernel flag

* update

* update

* fix prepared type_ bug

* clean and update the code

* reset the kernel_primitives

* update

* clean the code

* delete useless comment

* fix the bug in WITH_XPU

* update

* update

* modify the abi

* delete some useless code

* Parameter automation in xpu compilation

* Parameter automation in xpu compilation

* delete kps in cmake

* delete useless comment

* clean the code

* clean the code

Update register_kernels and kernel_library function in pten.cmake (#39259) · 6b3a6a9f

Committed by Jack Zhou on Jan 29, 2022

28 Jan, 2022 · 1 commit

[PTen]Refactor scale kernel that has selected_rows input (#39278) · abfc2fe9

Committed by YuanRisheng on Jan 28, 2022

* refactor scale kernel that its input is selected_rows

* complement upload file

27 Jan, 2022 · 1 commit

compile for afs api (#39113) · 4748486e

Committed by Thunderbrook on Jan 27, 2022

* compile for afs api

* with pslib

BaiXuePrincess / Paddle is up to date with its fork source project.
