Commits · a7e0cdea61c5ec4576c609262588f19ed2430061 · 机器未来 / Paddle

23 Jun, 2022 · 1 commit

[cherry-pick] release/2.3 elementwise_mul and matmul mkldnn fix (#43725) · a7e0cdea

Committed by lidanqing on Jun 23, 2022

* Correct elementwise quantization (#43693)

* [Bug fix] Do not quantize weights Y when matmul X and Y both other ops outputs (#43297)

* fix some matmul that X and Y both other ops outputs, do not dequantize the Y.

* fix CI format

* fix according to review
Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>

30 May, 2022 · 1 commit
- [Paddle-Inference] fix_multiheadpass_int8 (#43020) · 72880279
  Committed by Wangzheee on May 30, 2022
```
* fix_multi_int8 (#42977)

* cherry-pick fix_multihead_int8
```
10 May, 2022 · 1 commit
- pdnode_compare (#42597) (#42633) · 403b503f
  Committed by JingZhuangzhuang on May 10, 2022
```
* pdnode_compare

* panode compare

* pdnode_compare
```
22 Apr, 2022 · 2 commits
- add mkldnn compute_propagate_scales int8 pass (#41592) (#42080) · 41003161
  Committed by baoachun on Apr 22, 2022
- [IPU] add mixed-precission support for ipu (#41733) (#41906) · c09b1d68
  Committed by Allen Guo on Apr 22, 2022
```
add mixed-precission support for ipu

cherry-pick from #41733
```
21 Apr, 2022 · 2 commits
- add mkldnn int8 pass [step1] (#41579) (#42045) · 04f20b83
  Committed by baoachun on Apr 21, 2022
- fix adaptive pool pass bug (#42022) · 5b9cdd9b
  Committed by JingZhuangzhuang on Apr 21, 2022
06 Apr, 2022 · 1 commit
- [IPU] remove paddle_ipu shared library (#41307) · 229e91bf
  Committed by Allen Guo on Apr 06, 2022
```
* remove paddle_ipu shared library

* fix unique_name
```
04 Apr, 2022 · 1 commit
- conv + elementwise_add refactor (#41286) · e5e0b726
  Committed by Sławomir Siwek on Apr 04, 2022
```
* DRY

* change nodes names

* add const prefix

* change asX to as_x in all files
```
02 Apr, 2022 · 1 commit
- [Paddle inference] support new quant_model (#41049) · 1b58ce14
  Committed by Wangzheee on Apr 02, 2022
```
* paddle inference support new quant_model
```
31 Mar, 2022 · 2 commits
- add flatten2,reshape2,squueze2_trt_fuse_pass test cast (#41031) · 7ef69202
  Committed by heliqi on Mar 31, 2022
```
* add flatten2,reshape2,squueze2_trt_fuse_pass  test cast

* add flatten2,reshape2,squueze2_trt_fuse_pass  test cast

* add flatten2,reshape2,squueze2_trt_fuse_pass  test cast
```
- add depend when doing fuse_all_optimizer on program (#41178) · 3b00dc92
  Committed by Leo Chen on Mar 31, 2022
```
* fix dependency of fused optimizer

* add ut
```
30 Mar, 2022 · 1 commit
- move elementwise_mul selected rows input (#41042) · 13f1641d
  Committed by YuanRisheng on Mar 30, 2022
24 Mar, 2022 · 2 commits
- Correct MultipleQuantizeSquash (#40717) · 753964a2
  Committed by joanna.wozna.intel on Mar 24, 2022
```
* Correct MultipleQuantizeSquash

* Correct logging
```
- [Phi] Move mul op kernel into phi (#40833) · 1b491818
  Committed by Chen Weihang on Mar 24, 2022
```
* add mul phi kernel

* remove mul op kernel

* remove original mul grad op

* fix cinn test

* fix dygraph test failed
```
23 Mar, 2022 · 1 commit
- Removed redundant use of declarations.h (#40703) · 2a1b4c07
  Committed by Zhanlue Yang on Mar 23, 2022
```
* Removed redundant use of declarations.h

* Fixed minor bug
```
21 Mar, 2022 · 2 commits

Move conv-transpose OPs to phi (#40675) · 1eb96eec
Committed by From00 on Mar 21, 2022
```
* Move conv-transpose OPs to phi

* Fix CI errors

* Fix CI errors
```

[IPU] update ipu_backend (#40685) · d67fe921

Committed by Allen Guo on Mar 21, 2022

* sync changes

* copy sOpNamescope

* fix UTs

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* fix code-format

* fix compile error

* add comments for feed_op
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

18 Mar, 2022 · 1 commit

[Phi] Migrate gelu/log_softmax/prelu op kernel and infershape (#40393) · aed6faf2

Committed by shentanyue on Mar 18, 2022

* add gelu

* fix gelu

* add log_softmax

* add prelu kernel and prelu/gelu/logsoftmax infershape

* fix

* fix

* fix

* fix

* fix ci

* log_softmax rewrite

* fix

* fix

* fix conflict

* fix compile error

* fix comment

* fix

* ci_fix
Co-authored-by: Yan Li <liyan665@gmail.com>

17 Mar, 2022 · 2 commits
- fix double-free bug in variables of cinn subgraph (#40609) · 7dad9f70
  Committed by TeFeng Chen on Mar 17, 2022
- support gpu mixed precision inference (#40531) · 06fee998
  Committed by baoachun on Mar 17, 2022
16 Mar, 2022 · 2 commits

Quantize elementwise mul (#40546) · 2def79bc

Committed by Zuza on Mar 16, 2022

* Quantize elementwise mul op

* Parametrize elementwise functions

* Fix code formatting

[Auto Parallel] Add the support for the auto completion of while_op (#39939) · ec6b8fbd

Committed by Yulong Ao on Mar 16, 2022

* [Auto Parallel] Support the auto completion of while_op

* [Auto Parallel] Improve the completion algorithms

* [Auto Parallel] Fix bugs for ernie inference

* [Auto Parallel] Remove attrs which cannot be pickled

* [Auto Parallel] make the dims_mappings of LodTensorArray vars empty

* [Auto Parallel] Fix bugs for the ernie inference in the pipeline parallel

* [Auto Parallel] Remove unncessary comments

* [Auto Parallel] Fix a bug of the CMakeLists

* [Auto Parallel] Use the newest APIs to write the unit test

* [Auto Parallel] Remove unnecessary statements

15 Mar, 2022 · 2 commits

oneDNN NHWC fixes (#40049) · dde9cec0

Committed by Jacek Czaja on Mar 15, 2022

* - Prototype of third solution

- fix

- compilation fixes

- fix

- fixe

- fix

- fix

- compilation fix

- comment fix

- lint

update mkldnn conv_elementwise_add_fuse_pass ut

- NHWC changes to prelu

- alhpa dims

- UT fix

- fix to UT

- lint

- Some fixes

- added to BWD of prelu NHWC support

- reverted removal of resetting cu_layout in clearing of caching

* - Small changes

* - compilation fix

* - fix

* - fix

* lint

* - fixes after internal review

* - compilation fix

* - lint

[Phi]Move Tanh/BRelu/LeakyRelu/ThresholdedRelu Kernels to Phi (#40385) · d7112180

Committed by YuanRisheng on Mar 15, 2022

* move activation op

* adjust code format

* fix compile bugs

* fix ci bugs

* code format adjust

* code format adjust2

* activate ci status

* modify according to comment

* move activation kernel

* revert relu6

* reduce add code

* perfect use_phi_functor

* completing func name

* fix bugs when run ci

* fix bugs when run infr

* modifpy infrt get kernel signature

14 Mar, 2022 · 1 commit

Add an elementwise + activation fusion pass. (#36541) · 3f219160

Committed by Tomasz Socha on Mar 14, 2022

* Add elementwise add and activation fuse pass

* Fix copy ellision

* More flexible pattern detector

* More flexible fusion pass

* Update lists for pass

* Add support for Pow operator

* Add support for more activation types

* Style

* Rename fusion pass

* First version of tests

* Dirty version of pass

* Polished version

* Update pbtxt

* Style

* Update names

* Style

* Use PADDLE_ENFORCE_EQ

* Save error message to variable

* WO for error checks

* CR

* Static style check

* Add missing 'activation_scale' attribute

* Add relu6 and sigmoid activations

* Style

* Fix fuse list formating

* Sync filenames for fuse pass files

* Fix cmake after move

* Fix registration

* Fix pass name in tests

* Add missing activations to checker

* WIPS

* Working mul op

* Working sub

* Working Add

* Remove pten includes

* Remove some forward declarations

* Remove Includes

* Fixes

* Remove default kernels

* Add check if post_ops attributes are avaliable

* Style

* Code adjustment

* Register default kernels

* We have year 2022 not 2021...
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com>

* Fast review fixes
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com>

* Review Fix

* Rename one_dnn -> onednn

* Style after review

* Fast and dirty fix for quantization

* Update tests

* Style

* Fix mkldnn_quantizer config

* Add Joanna's suggestion.

* Check if operator is explicitly disables on OneDNN

* Try to use unregistered attributes

* Style

* Test new framework

* FXI

* FXII

* Update test

* Style
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com>

11 Mar, 2022 · 2 commits

refactor conv+relementwise_add (residual) (#40005) · 47459e98
Committed by Sylwester Fraczek on Mar 11, 2022

[Phi] Remove needless deps in unittests (#40256) · 89ed57e2

Committed by Chen Weihang on Mar 11, 2022

* remove needless deps in unittests

* add gpu marco

* fix other unittests

* fix kernel name error

* fix test_prepare_op

* fix failed dygraph unittests

* fix gpu failed tests

* fix cinn test failed

* fix cinn test failed

* fix dropout tests

10 Mar, 2022 · 1 commit
- [Phi] add the infer shape meta for the graph_send_recv (#40320) · 5ae85131
  Committed by wawltor on Mar 10, 2022
```
* add the infer shape meta for the graph_send_recv

* move the infershape code to another file
```
08 Mar, 2022 · 1 commit

[Phi]Move Relu/Cos/Sin/Tan/Acos/Asin/Atan/Sinh/Cosh/Asinh/Acosh/Atanh kernels... · 975f99ab

Committed by YuanRisheng on Mar 08, 2022

[Phi]Move Relu/Cos/Sin/Tan/Acos/Asin/Atan/Sinh/Cosh/Asinh/Acosh/Atanh kernels in Activation to Phi (#40175)

* move activation op

* adjust code format

* fix compile bugs

* fix ci bugs

* code format adjust

* code format adjust2

* activate ci status

* modify according to comment

07 Mar, 2022 · 1 commit

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

Committed by Ming-Xu Huang on Mar 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlastLt Epilogue.
2. Support fusion Act(X*Y + bias), X'dims >=2 and Y'dims shoule be 2.
2. Act currently only be supported ReLU. (Will add GeLU in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently only support ReLU (Will support GeLU in the future).

* Added FuseGemmEpiloguePass

1, Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrageter.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_opto gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grap_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clearing.

* Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set Compiling constrains to cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Reivew from Paddle

1. Changed arguments name is_first_gemm to without_x_gradient for
clearing.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue to fuse backward or ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_detech_pattern to fit with both linear wiht gelu or
relu.
2. Modified data range in Uts to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache as an argument to each pass
functions.

03 Mar, 2022 · 1 commit

Move bn to pten (#39347) · ebd0f512

Committed by hong on Mar 03, 2022

* add bn cpu version; test=develop

* move batch norm to pten

* move batch norm to pten; test=develop

* fix bug; test=develop

* fix func::tranpose depend bug; test=develop

* fix compile bugs; test=develop

* fix use_op batch_norm bug; test=develop

* fix cudnn bn add relu test; test=develop

* fix pten context build and double grad bug; test= develop

* remve useless code; test=develop

* add batch norm gpu fp16 support; test=develop

* fix test bn op bug; test=develop

* remove output dtype set; test=develop

* fix bug; test=develop

* fix bug; test=develop

* fix applay pass to program bug; test=develop

* revert to develop; test=develop

* fix rocm bug; test=develop

* revert operator to develop; test=develop

* fix pre_commit; test=develop

* fix statci check error; test=develop

* resolve conflict; test=develop

* ana batch norm bug;

* revert batch norm op

* resolve conlict

* fix nan inf and speed bug; test=develop

* fix bug; test=develop

* fix error; test=develop

* test expand op; test=develop

* fix bug; test=develop

* resolve confilct

* resolve confilct; test=develop

* polish code; test=develop

* polish code; test=develop

* change mutable data to ctx alloc; test=develop

* make format same with ci; test=develop

* fix format error with ci; test=develop

01 Mar, 2022 · 1 commit
- remove conv_affine_channel_fuse_pass (#39817) · fc06be9d
  Committed by wenbin on Mar 01, 2022
```
* remove

* pass

* more pass
```
28 Feb, 2022 · 1 commit

[Pten->Phi PR4] Rename pten in funcs to phi (#39961) · eb42dd52

Committed by Chen Weihang on Feb 28, 2022

* rename pten_utils to phi_utils

* rename pten_utils target

* rename Pten to Phi

* replace pten with phi

* resolve conflict

25 Feb, 2022 · 1 commit

[Phi] Support cudnn kernel moving & move softmax kernels (#39547) · 8895379a

Committed by Chen Weihang on Feb 25, 2022

* support cudnn kernel moving

* polish cmake rules

* add unittest for coverage

* remove orig kernel

* remove softmax cudnn kernel

* fix softmax test failed

* fix npu func error

* resolve conflict

* rename gpu dnn kernels

* fix name rule error

* fix compile error

* update fp16 namespace

24 Feb, 2022 · 1 commit
- Fix for split op in BF16 inference (#39548) · 75f91ce4
  Committed by jakpiase on Feb 24, 2022
```
* Fix for split bf16 inference

* added test for pass

* changes after review
```
22 Feb, 2022 · 2 commits
- [Paddle-Inference] fix pass and convert_op for preln_ernie (#39733) · 574f3402
  Committed by Wangzheee on Feb 22, 2022
```
* fix pass and convert_op for preln_ernie and add preln_ernie'flag in pass
```
- sync recent changes (#39763) · d945e24c
  Committed by Allen Guo on Feb 22, 2022
21 Feb, 2022 · 1 commit

Update record interface using part2 (#39694) · c984cd85

Committed by chenjian on Feb 21, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update record event interface using

* update record event interface using

* update operator.cc

* update part2

* update part1

* fix include profiler.h header in ps server

* fix include profiler.h header in ps server

* fix profiler.h header

20 Feb, 2022 · 1 commit

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

Committed by Chen Weihang on Feb 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed


机器未来 / Paddle is up to date with its fork source project