Commits · af6ef8881438201fbf135500cc7d652a32ae583b · PaddlePaddle / Paddle

14 Mar, 2022 · 8 commits

[Phi]Add diag_v2 grad kernel (#40447) · e157f2af

Committed by Siming Dai on Mar 14, 2022

* Add diag grad kernel

* fix unittest case

* add float16, remove const &

* delete diag_grad in op_utils.h

e157f2af

Add an elementwise + activation fusion pass. (#36541) · 3f219160

Committed by Tomasz Socha on Mar 14, 2022

* Add elementwise add and activation fuse pass

* Fix copy elision

* More flexible pattern detector

* More flexible fusion pass

* Update lists for pass

* Add support for Pow operator

* Add support for more activation types

* Style

* Rename fusion pass

* First version of tests

* Dirty version of pass

* Polished version

* Update pbtxt

* Style

* Update names

* Style

* Use PADDLE_ENFORCE_EQ

* Save error message to variable

* WO for error checks

* CR

* Static style check

* Add missing 'activation_scale' attribute

* Add relu6 and sigmoid activations

* Style

* Fix fuse list formatting

* Sync filenames for fuse pass files

* Fix cmake after move

* Fix registration

* Fix pass name in tests

* Add missing activations to checker

* WIPS

* Working mul op

* Working sub

* Working Add

* Remove pten includes

* Remove some forward declarations

* Remove Includes

* Fixes

* Remove default kernels

* Add check if post_ops attributes are available

* Style

* Code adjustment

* Register default kernels

* We have year 2022 not 2021...
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com>

* Fast review fixes
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com>

* Review Fix

* Rename one_dnn -> onednn

* Style after review

* Fast and dirty fix for quantization

* Update tests

* Style

* Fix mkldnn_quantizer config

* Add Joanna's suggestion.

* Check if operator is explicitly disabled on OneDNN

* Try to use unregistered attributes

* Style

* Test new framework

* FXI

* FXII

* Update test

* Style
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com>

3f219160
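
For readers skimming this entry, the kind of pattern such a fusion pass looks for can be sketched in a few lines. This is a toy illustration only, not Paddle's implementation: the op-dictionary layout, the `fuse_elementwise_activation` function, and the `activation_type` attribute name are all hypothetical, and the op sets contain only types named in the commit notes above.

```python
# Toy graph-rewrite sketch (hypothetical data layout, not Paddle's IR).
# Each op is a dict: {"type": str, "inputs": [names], "output": name}.
# Ops are assumed to be listed in topological order.

ELEMENTWISE_OPS = {"elementwise_add", "elementwise_sub", "elementwise_mul"}
ACTIVATION_OPS = {"relu", "relu6", "sigmoid"}


def fuse_elementwise_activation(ops):
    """Merge `activation(elementwise(x, y))` pairs into one op.

    A pair is fused only when the elementwise result has exactly one
    consumer and that consumer is a supported activation.
    """
    # Map every variable name to the indices of the ops that read it.
    consumers = {}
    for index, op in enumerate(ops):
        for name in op["inputs"]:
            consumers.setdefault(name, []).append(index)

    fused_ops = []
    absorbed = set()  # indices of activation ops folded into a producer
    for index, op in enumerate(ops):
        if index in absorbed:
            continue
        users = consumers.get(op["output"], [])
        if op["type"] in ELEMENTWISE_OPS and len(users) == 1:
            act = ops[users[0]]
            if act["type"] in ACTIVATION_OPS and act["inputs"] == [op["output"]]:
                fused_ops.append({
                    "type": op["type"],
                    "inputs": list(op["inputs"]),
                    "output": act["output"],
                    # Hypothetical attribute recording the fused activation.
                    "attrs": {"activation_type": act["type"]},
                })
                absorbed.add(users[0])
                continue
        fused_ops.append(op)
    return fused_ops


graph = [
    {"type": "elementwise_add", "inputs": ["x", "y"], "output": "t"},
    {"type": "relu", "inputs": ["t"], "output": "out"},
    {"type": "elementwise_mul", "inputs": ["out", "z"], "output": "m"},
    {"type": "sigmoid", "inputs": ["m"], "output": "s"},
    {"type": "scale", "inputs": ["m"], "output": "q"},
]
result = fuse_elementwise_activation(graph)
```

In this sketch the `elementwise_mul` is left alone because its output feeds two ops, so folding the `sigmoid` into it would change what `scale` reads.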

[MLU] add merged_momentum mlu kernel (#40406) · 1f7b2516
Committed by fwenguang on Mar 14, 2022

1f7b2516

Support custom op and paddle.autograd.backward in eager (#40423) · 227fa408

Committed by Jiabin Yang on Mar 14, 2022

* eager, test=develop

* fix bug, test=develop

* eager, test=develop

* merge legacy to fluid

* eager, test=develop

* eager, test=develop

* Refactor TensorAdd func by template and remove gradient_accumulation in eager

* Remove needless target name

* eager, test=develop

* eager, test=develop

* Use overload instead of template

* Remove legacy code

* Remove legacy code

* selectedrows, test=develop

* Remove DataType test

* eager, test=develop

* eager, test=develop

* support gan, test=develop

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

* refine code

* ptb, test=develop

* Rename all EagerTensor to Tensor

* Rename some EagerTensor to Tensor

* rename EagerTensor to EagerVariable

* eager, test=develop

* eager, test=develop

* eager, test=develop

* eager, test=develop

* add more test

* eager, test=develop

* Support copiable selected rows and merge develop

* save load, eager, test=develop

* save load, eager, test=develop

* refine, test=develop

* remove useless _set_value method

* refine, test=develop

* refine, test=develop

* revert static_runner, test=develop

* EagerTensor to Tensor, test=develop

* refine, test=develop

* refine, test=develop

* clear grad, test=develop

* merge, develop

* merge, develop

* merge, test=develop

* merge, test=develop

* Support quant and part of slice

* support legacy static save

* extend slim tests time

* remove imperative on inference

* remove imperative on inference

* merge develop

* fix typo

* fix typo

* split slice related code into 2 part for imperative and eager

* split slice from inference

* split slice from inference

* fix test_tensor_register_hook

* support custom op in eager mode

* fix inference deps error

* split eager utils from custom operator

* fix type match

* fix typo
Co-authored-by: Wang Huan <wanghuan29@baidu.com>
Co-authored-by: Weilong Wu <veyron_wu@163.com>
Co-authored-by: wanghuancoder <wanghuancoder@163.com>

227fa408

adjust params order for eager.Tensor._copy_to (#40449) · c6ec8b9f
Committed by 0x45f on Mar 14, 2022

c6ec8b9f

[KP] Add unittests for... · f269ca3f

Committed by Lijunhui on Mar 14, 2022

[KP] Add unittests for brelu,ceil,celu,elu,floor,hard_shrink,hard_sigmoid,log1p,logsigmoid,relu6,silu,soft_relu,softsign,swish (#40448)

* solve unexecuted UT

* add 24 activation op UT

* append swish&thresholded_relu to kpfirst_list

* rm thresholded_relu

f269ca3f

[AutoParallel] Converter (#40434) · 3881b6cb
Committed by zhaoyingli on Mar 14, 2022
```
* [AutoParallel] Converter
Converter API
```
3881b6cb

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors ... · e553f758

Committed by Zhong Hui on Mar 14, 2022

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors  between python processes. (#37302)

* Add support for paddle.multiprocessing
* move multiprocessing to incubate.

e553f758

11 Mar, 2022 · 5 commits
- [hybrid] Support tensor parallel and cache structure for fused attention op. (#40101) · 1882c496
  Committed by Yuang Liu on Mar 11, 2022
  
  1882c496
- [MLU]add allgather_op mlu kernel (#40356) · dc773828
  Committed by zn on Mar 11, 2022
  
  dc773828
- update square & sigmoid unittest (#40404) · 807bff4a
  Committed by z8hanghuan on Mar 11, 2022
  
  807bff4a
- minor fix matmul and onehot xpu. test=kunlun (#40419) · 594e412d
  Committed by houj04 on Mar 11, 2022
  
  594e412d
- fix_import_distribute_bugs (#40396) · bd2d4fd0
  Committed by Baibaifan on Mar 11, 2022
  
  bd2d4fd0
10 Mar, 2022 · 3 commits

[Auto Parallel]Update reshard for while sub block (#40366) · 2747de2b
Committed by caozhou on Mar 10, 2022
```
* update reshard for while sub block

* fix code format error
```
2747de2b

add tril_triu for xpu, *test=kunlun (#40246) · 1128db30

Committed by z8hanghuan on Mar 10, 2022

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

1128db30

Move dropout to phi (#40148) · 99fc1b08

Committed by hong on Mar 10, 2022

* move dropout to phi; test=develop

* fix xpu, npu compile error; test=develop

99fc1b08

09 Mar, 2022 · 11 commits
- add_sharding_api (#40129) · f40ed5f4
  Committed by Baibaifan on Mar 09, 2022
  
  f40ed5f4
- change timeout for pool (#40341) · 1defc8f3
  Committed by feng_shuai on Mar 09, 2022
  
  1defc8f3
- fix the full_like with fill the value of inf (#40232) · ec582895
  Committed by wawltor on Mar 09, 2022
```
* fix the full_like with fill the value of inf

* update the test case for the fill_any_like

* update the comments for the full_like
```
  ec582895
- adapt run_program OP for eager (#40198) · 3e9601ba
  Committed by 0x45f on Mar 09, 2022
```
* adapt run_program OP for eager

* fix program_id

* refine code

* fix test
```
  3e9601ba
- Fix time of utest in distributed (#40163) · 7ea9235c
  Committed by ShenLiang on Mar 09, 2022
```
* fix time of utest
```
  7ea9235c
- [hybrid] fused_feedforward op support tensor model parallel (#40160) · e0866dc6
  Committed by WangXi on Mar 09, 2022
  
  e0866dc6
- bypass eager mode (#40245) · 05ff6cc5
  Committed by Weilong Wu on Mar 09, 2022
  
  05ff6cc5
- [optest]: fix transpose, support different parameter name between python_api... · 2037fa68
  Committed by xiongkun on Mar 09, 2022
```
[optest]: fix transpose, support  different parameter name between python_api and KernelSignature. (#40258)

* optest: fix transpose

* fix
```
  2037fa68
- add ipu uts (#40205) · 0b597754
  Committed by Allen Guo on Mar 09, 2022
  
  0b597754
- [IPU] update ipu unittests p1 (#39923) · fe765cb3
  Committed by Allen Guo on Mar 09, 2022
```
* update ipu UTs part1

* rename ut

* sync api changes

* update uts for new api

* update use_ipumodel()

* update use_ipumodel()

* split pr
```
  fe765cb3
- [IPU] update ipu unittests p3 (#40072) · 86effa0c
  Committed by Allen Guo on Mar 09, 2022
```
* update ipu UTs part3

* rename uts

* sync api changes

* update uts for new api

* update use_ipumodel()

* split pr
```
  86effa0c
08 Mar, 2022 · 10 commits

Add profiler statistic (#40249) · c1d81ec1

Committed by chenjian on Mar 08, 2022

* add python profiler package

* update according to review

* fix bug

* fix bug

* fix bug

* add unit test

* Revert "add unit test"

This reverts commit 4e69ff71b0645e069afe5dd8fea0d07717852c48.

* reduce for pr

* add unit test

* modify for pr

* fix unittest

* update for ci coverage

* modify according to review

* fix bug

* improve coverage

* add profiler code

* add statistic code

* reduce content for pr

c1d81ec1

fix yolov3 return value in dygraph mode. test=develop (#40185) · 9aa6bfc7
Committed by Kaipeng Deng on Mar 08, 2022

9aa6bfc7

Fix fold python examples (#38636) · d4a4eb9d

Committed by xiaoting on Mar 08, 2022

* fix fold python examples, test=develop

* fix size type, test=develop

* fix python example, test=develop

* fix fold shape check

* fix fold dygraph mode, test=develop

d4a4eb9d

add the implementation of process group for hccl (#40228) · 73583f86
Committed by lilong12 on Mar 08, 2022
```
* add pg_hccl
```
73583f86

add support for concat and variadic tensor list (#40229) · f1fe2ad4
Committed by xiongkun on Mar 08, 2022

f1fe2ad4

[IPU] update ipu unittests p4 (#40073) · 061044a0
Committed by Allen Guo on Mar 08, 2022
```
* update ipu UTs part4

* rename uts

* sync api changes

* update uts for new api
```
061044a0

[IPU] update ipu unittests p2 (#40069) · a279a4f8

Committed by Allen Guo on Mar 08, 2022

* update ipu UTs part2

* clean git

* rename ut

* rename ut 1

* sync api changes

* update uts for new api

* update uts for new api

* fix re-define

a279a4f8

[MLU] add fleet init api and collective api pytest for mlu (#40010) · c722ee69
Committed by mhhhh1 on Mar 08, 2022
```
* [MLU] add fleet init api and collective api pytest for mlu

* fix no value for argument 'data_type' in method call
```
c722ee69

add profiler statistic helper (#40111) · 1f857cb9
Committed by chenjian on Mar 08, 2022
```
* add profiler helper

* fix unittest

* improve test coverage rate
```
1f857cb9

add python profiler package (#40065) · 10325a82

Committed by chenjian on Mar 08, 2022

* add python profiler package

* update according to review

* fix bug

* fix bug

* fix bug

* add unit test

* Revert "add unit test"

This reverts commit 4e69ff71b0645e069afe5dd8fea0d07717852c48.

* reduce for pr

* add unit test

* modify for pr

* fix unittest

* update for ci coverage

* modify according to review

* fix bug

* improve coverage

10325a82

07 Mar, 2022 · 3 commits

[OpTest] Support to test paddle API end-to-end for check_eager (#40169) · 79a32715
Committed by xiongkun on Mar 07, 2022
```
* add python api test in TestOp

* test_python_api if self.python_api is set

* fix code by CR
```
79a32715

refactor unittest for nearest_interp_v2_op_xpu. test=kunlun (#39804) · c09adab8

Committed by houj04 on Mar 07, 2022

* refactor unittest for nearest_interp_v2_op_xpu. test=kunlun

* fix code style. test=kunlun

* fix code style. test=kunlun

c09adab8

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

Committed by Ming-Xu Huang on Mar 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlasLt Epilogue.
2. Support fusion Act(X*Y + bias), X'dims >=2 and Y'dims should be 2.
3. Act currently only supports ReLU. (Will add GeLU in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently only support ReLU (Will support GeLU in the future).

* Added FuseGemmEpiloguePass

1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrategy.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_op to gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grap_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clearing.

* Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set Compiling constraints to cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Review from Paddle

1. Changed arguments name is_first_gemm to without_x_gradient for
clearing.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue to fuse backward or ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_detech_pattern to fit with both linear with gelu or
relu.
2. Modified data range in Uts to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache as an argument to each pass
functions.

2a3d9eca
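
As a reading aid for this entry, the computation the commit notes describe, `Act(X*Y + bias)`, can be written out unfused. The sketch below is plain Python, not the cuBlasLt-backed kernel: `linear_act_reference` and the activation labels `"none"`, `"relu"`, `"gelu"` are hypothetical names, X is restricted to 2-D for brevity, and the erf-based GeLU formula is an assumption about which GeLU variant is meant.

```python
import math


def relu(value):
    return max(0.0, value)


def gelu(value):
    # Exact (erf-based) GeLU; the fused kernel may use a different variant.
    return 0.5 * value * (1.0 + math.erf(value / math.sqrt(2.0)))


# Hypothetical labels for the supported activations.
ACTIVATIONS = {"none": lambda value: value, "relu": relu, "gelu": gelu}


def linear_act_reference(x, y, bias, activation="none"):
    """Unfused reference for Act(X*Y + bias).

    x: list of rows, shape (M, K); y: list of rows, shape (K, N);
    bias: list of length N, broadcast over the rows of the product.
    """
    act = ACTIVATIONS[activation]
    inner = len(y)
    cols = len(y[0])
    out = []
    for row in x:
        out_row = []
        for j in range(cols):
            acc = sum(row[k] * y[k][j] for k in range(inner)) + bias[j]
            out_row.append(act(acc))
        out.append(out_row)
    return out


x = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
bias = [-2.0, 0.5]
relu_out = linear_act_reference(x, identity, bias, activation="relu")
plain_out = linear_act_reference(x, identity, bias)
```

A fused kernel applies the bias and activation in the epilogue of the matrix multiply rather than as separate passes over the output, which is the saving the commit is after.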

PaddlePaddle / Paddle · synced successfully about 1 year ago
