Commits · 5087fe20c6984261308d5895522c524cefbe64f8 · Crayon鑫 / Paddle

14 Apr, 2022 2 commits
- fix xpu cmake lib name. test=kunlun (#41786) · 5087fe20
  Committed by houj04 on Apr 14, 2022
  
  5087fe20
- [XPUPS]modify xpu_kp.cmake with HETERPS&PSLIB (#41760) · 7e7d2300
  Committed by zmxdream on Apr 14, 2022
```
* modify xpu_kp.cmake with HETERPS&PSLIB

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop
```
  7e7d2300
13 Apr, 2022 1 commit
- Update sign op xpu (#41685) · a4d4c116
  Committed by houj04 on Apr 13, 2022
```
* update sign op on xpu. test=kunlun

* fix typo. test=kunlun
```
  a4d4c116
12 Apr, 2022 3 commits
- Replaced cp with copy in xpu_cmake (#41542) · 18f569c3
  Committed by niuliling123 on Apr 12, 2022
  
  18f569c3
- update kunlun xdnn (#41657) · a688ae2e
  Committed by QingshuChen on Apr 12, 2022
  
  a688ae2e
- Adjusted CUDA Arches (#41628) · cade0018
  Committed by Zhanlue Yang on Apr 12, 2022
  
  cade0018
11 Apr, 2022 2 commits
- support more ops (#41421) · fc621dfe
  Committed by Allen Guo on Apr 11, 2022
  
  fc621dfe
- update lite compile cmake (#41512) · 535810ba
  Committed by shentanyue on Apr 11, 2022
  
  535810ba
08 Apr, 2022 2 commits
- modify unittest of lstm forward, *test=kunlun (#41534) · d4710dfe
  Committed by z8hanghuan on Apr 08, 2022
```
* modify unittest of lstm forward, *test=kunlun

* modify unittest of lstm forward, *test=kunlun
```
  d4710dfe
- Fix libmct.cmake tar ownership change (#41516) · 70036d5d
  Committed by Zhong Hui on Apr 08, 2022
  
  70036d5d
07 Apr, 2022 2 commits
- add send/recv to/from switch module for PrcoessGroupHeter (#41285) · 633ac4e6
  Committed by lilong12 on Apr 07, 2022
  
  633ac4e6
- momentum support l2decay for xpu. test=kunlun (#41325) · 533c649f
  Committed by houj04 on Apr 07, 2022
```
* momentum support l2decay for xpu. test=kunlun

* fix include file. test=kunlun

* fix cmake for device_worker. test=kunlun
```
  533c649f
06 Apr, 2022 1 commit
- [IPU] remove paddle_ipu shared library (#41307) · 229e91bf
  Committed by Allen Guo on Apr 06, 2022
```
* remove paddle_ipu shared library

* fix unique_name
```
  229e91bf
01 Apr, 2022 1 commit

support multi_layer of bilstm,*test=kunlun (#41151) · 00d23897

Committed by z8hanghuan on Apr 01, 2022

* support multi_layer of bilstm,*test=kunlun

* support multi_layer of bilstm, *test=kunlun

* support multi_layer of bilstm, *test=kunlun

* support multi_layer of bilstm, *test=kunlun

00d23897

31 Mar, 2022 2 commits
- [Phi] Fix kps compile failed (#41129) · 1faefc93
  Committed by Chen Weihang on Mar 31, 2022
```
* fix kps compile failed

* remove useless cond

* add xpu for xpu_kp
```
  1faefc93
- Opt the compilation of sparse kernel (#41086) · b9da48da
  Committed by zhangkaihuo on Mar 31, 2022
  
  b9da48da
30 Mar, 2022 3 commits
- Add -rf in xpu_kp.cmake when cp .kps to .xpu (#41059) · 5c1631f2
  Committed by niuliling123 on Mar 30, 2022
  
  5c1631f2
- Apply TransposeFolding & GemmRewriter passes. (#41084) · c761b48b
  Committed by Zhen Wang on Mar 29, 2022
  
  c761b48b
- swish and pow op for xpu test=kunlun (#40654) · d951f3af
  Committed by houj04 on Mar 30, 2022
```
* swish and pow op for xpu. test=kunlun

* fix code style. test=kunlun.

* use pow_grad xdnn api. test=kunlun.
```
  d951f3af
29 Mar, 2022 2 commits

[Phi] Unify kernel build targets (#41091) · 23c3d967

Committed by Chen Weihang on Mar 29, 2022

* unify_kernel_build_target

* fix dnn kernel failed

* fix dnn kernel loss target

* fix xpu compile failed

23c3d967

Update of oneDNN to 2.5 (#39426) · 35b96d48

Committed by Jacek Czaja on Mar 29, 2022

* - update of oneDNN to 2.5

* - changes to UT testing onednn verbose

* - Update of oneDNN to 2.5.3

* - update onednn to 2.5.4

35b96d48

28 Mar, 2022 1 commit

[Phi] Fix assign kernel bug (#40927) · 822a2d1f

Committed by Chen Weihang on Mar 28, 2022

* fix assign kernel bug

* fix xpu kernel select error

* add cuda pinned place

* fix copy error

* fix infrt error

822a2d1f

27 Mar, 2022 1 commit

add check of data type and support mutable_data with compiled infos (#40920) · 6a94adbe

Committed by TeFeng Chen on Mar 27, 2022

* support check data type and mutable_data with compiled infos in paddle with cinn

* update cinn_instruction_run_op_test with multi data type

6a94adbe

25 Mar, 2022 2 commits

[Phi] Migrate Adam and AdamW into Phi (#40351) · 56cd3407

Committed by Aurelius84 on Mar 25, 2022

* [Phi] Migrate Adam and Adamw into Phi

* fix compile error and unittest ok

* fix compile error and unittest ok

* fix undefined reference to fLI::FLAGS

* test depend on operator

* fix cmake

* fix xpu compile

* fix infrt

* fix amp_type_traits

* fix amp_type_traits

* modify according to reviewer

* modify according to reviewer

* fix dtype float16

* fix typo

* fix Cmake

* fix code style

56cd3407

support multi_dims for tril_triu, *test=kunlun (#40712) · 9ffedcfd

Committed by z8hanghuan on Mar 25, 2022

* support multi_dims for tril_triu, *test=kunlun

* support multi_dims for tril_triu, *test=kunlun

* support multi_dims for tril_triu, *test=kunlun

* update xpu.cmake date, support multi_dims for tril_triu, *test=kunlun

9ffedcfd

24 Mar, 2022 2 commits
- [phi] Remove useless cmake message (#40884) · 38d1fe34
  Committed by Aurelius84 on Mar 24, 2022
  
  38d1fe34
- [phi] Split selected_rows CMake compilation (#40864) · e6cbd72d
  Committed by Aurelius84 on Mar 24, 2022
```
* [phi] Split selected_rows CMake compilation

* move file back

* move file back
```
  e6cbd72d
23 Mar, 2022 3 commits
- [NPU] fix cmake for 5.1.RC1.xxx version (#40704) · 292011eb
  Committed by furnace on Mar 23, 2022
  
  292011eb
- fix inference_lib.cmake (#40765) · 3d0be938
  Committed by JingZhuangzhuang on Mar 23, 2022
  
  3d0be938
- [KP] fix compilation bug in phi (#40805) · 7a78aec7
  Committed by Liu-xiandong on Mar 23, 2022
```
* [KP] fix compilation bug in phi

* delete the comment

* delete useless comment
```
  7a78aec7
22 Mar, 2022 1 commit
- Adjusted CUDA arches for NEW_RELEASE_ALL (#40660) · 71b813f0
  Committed by Zhanlue Yang on Mar 22, 2022
  
  71b813f0
17 Mar, 2022 1 commit

CopyFromCpu and CopyToCpu of Onnxruntime back-end optimize (#40561) · fcbb7440

Committed by heliqi on Mar 17, 2022

* add onnxruntime predictor

* Add code comments

* support link paddle2onnx onnxruntime

* support onnxruntime with python

* support onnxruntime with python

* support onnxruntime with windows

* paddle2onnx compile with windows

* support windows compile

* support windows compile with onnxruntime

* support windows compile with paddle2onnx

* support mac compile

* compile with mac

* compile with mac

* add code comments

* fix remind word

* code optimization

* add test case

* add test case

* add inference demo_ci test case

* fix compile paddle2onnx with no python

* add inference demo_ci test case

* add inference demo_ci test case

* add inference infer_ut test case

* support c go api and test cases

* add coverage test case

* add coverage test case

* add capi test case

* add capi test case

* fix onnxruntime copyfromcpu and copytocpu

* fix goapi

* modify code

fcbb7440

14 Mar, 2022 1 commit
- [infrt] unify the infrt dialect. test=develop (#40451) · 481db5e9
  Committed by 王明冬 on Mar 14, 2022
  
  481db5e9
12 Mar, 2022 1 commit
- fix NetBuilder API Name bug in cinn_lib_test (#40392) · 69a01c47
  Committed by jiangcheng on Mar 12, 2022
```
* fix NetBuilder API Name bug in cinn_lib_test

* update cinn version to newest
```
  69a01c47
10 Mar, 2022 2 commits

Inference add ONNXRuntime back-end (#39988) · 431afc39

Committed by heliqi on Mar 10, 2022

* add onnxruntime predictor

* Add code comments

* support link paddle2onnx onnxruntime

* support onnxruntime with python

* support onnxruntime with python

* support onnxruntime with windows

* paddle2onnx compile with windows

* support windows compile

* support windows compile with onnxruntime

* support windows compile with paddle2onnx

* support mac compile

* compile with mac

* compile with mac

* add code comments

* fix remind word

* code optimization

* add test case

* add test case

* add inference demo_ci test case

* fix compile paddle2onnx with no python

* add inference demo_ci test case

* add inference demo_ci test case

* add inference infer_ut test case

* support c go api and test cases

* add coverage test case

* add coverage test case

* add capi test case

* add capi test case

431afc39

add tril_triu for xpu, *test=kunlun (#40246) · 1128db30

Committed by z8hanghuan on Mar 10, 2022

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

1128db30

08 Mar, 2022 2 commits

[Phi] Remove gpudnn suffix & polish cmake (#40239) · 3a77d027
Committed by Chen Weihang on Mar 08, 2022
```
* remove gpudnn suffix & polish cmake

* fix typo
```
3a77d027

[Phi]Move Relu/Cos/Sin/Tan/Acos/Asin/Atan/Sinh/Cosh/Asinh/Acosh/Atanh kernels... · 975f99ab

Committed by YuanRisheng on Mar 08, 2022

[Phi]Move Relu/Cos/Sin/Tan/Acos/Asin/Atan/Sinh/Cosh/Asinh/Acosh/Atanh kernels in Activation to Phi (#40175)

* move activation op

* adjust code format

* fix compile bugs

* fix ci bugs

* code format adjust

* code format adjust2

* activate ci status

* modify according to comment

975f99ab

07 Mar, 2022 2 commits

[infrt] fold the infrt.cvtTensorOp. test=develop (#40214) · b798fb07
Committed by 王明冬 on Mar 07, 2022

b798fb07

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

Committed by Ming-Xu Huang on Mar 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlasLt Epilogue.
2. Support fusing Act(X*Y + bias); X's dims must be >= 2 and Y's dims should be 2.
3. Act currently only supports ReLU (GeLU will be added in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently only supports ReLU (GeLU will be supported in the future).

* Added FuseGemmEpiloguePass

1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrageter.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_op to gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grap_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clarity.

* Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set compile constraints for cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Review from Paddle

1. Changed argument name is_first_gemm to without_x_gradient for
clarity.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue with fusing the backward of ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_detech_pattern to fit both linear with gelu and linear with
relu.
2. Modified data range in UTs to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache as an argument to each pass
functions.

2a3d9eca
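
The commit message above describes fusing `Act(X*Y + bias)` into a single op. As a rough illustration of what that means numerically, here is a minimal pure-Python sketch. It is not Paddle's implementation (the real op runs on the GPU through cuBlasLt); the function names `unfused_linear_act` and `fused_linear_act_reference` are hypothetical and exist only for this example.

```python
import math


def relu(v):
    """ReLU activation on a scalar."""
    return max(0.0, v)


def gelu(v):
    """Exact (erf-based) GeLU activation on a scalar."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def unfused_linear_act(x, y, bias, act):
    """Hypothetical unfused path: matmul, bias add and activation
    run as three separate passes over the output matrix."""
    rows, inner, cols = len(x), len(y), len(y[0])
    out = [[sum(x[i][k] * y[k][j] for k in range(inner))
            for j in range(cols)] for i in range(rows)]
    out = [[out[i][j] + bias[j] for j in range(cols)] for i in range(rows)]
    return [[act(v) for v in row] for row in out]


def fused_linear_act_reference(x, y, bias, act):
    """Hypothetical fused path: bias add and activation are applied as an
    'epilogue' right after each output element is accumulated."""
    rows, inner, cols = len(x), len(y), len(y[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += x[i][k] * y[k][j]
            row.append(act(acc + bias[j]))  # epilogue: bias + activation
        out.append(row)
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0]]                 # shape (1, 2)
    y = [[1.0, 0.0], [0.0, -1.0]]    # shape (2, 2), as Y must be 2-D
    bias = [0.5, 0.5]
    print(fused_linear_act_reference(x, y, bias, relu))
    print(fused_linear_act_reference(x, y, bias, gelu))
```

Both paths compute the same values; the point of the fusion is to avoid the extra passes over memory, which this sketch does not measure.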

Crayon鑫 / Paddle is up to date with the fork source project