Commits · 8015fbd60f4e96ffc4ad2bba46e16b9a2a4a5941 · BaiXuePrincess / Paddle

29 Dec, 2022 1 commit

[Cherry-pick]Move sum op to PHI && Fix MetaTensor's bug when run infermeta (#49342) · 8015fbd6

Committed by YuanRisheng on Dec 29, 2022

* cherry-pick 45860

* [BUG FIX]Fix MetaTensor's bug when run infermeta (#46265)

* fix sum bug

* fix ci bugs

* fix ci bugs

* update code according comment

8015fbd6

22 Dec, 2022 1 commit

Fix mixed precision bug (#49239) · 11c7f570

Committed by Yuanle Liu on Dec 22, 2022

* [Release2.4] Revert python link prs (#48573)

* Revert "Fix mac link python (#48017)"

This reverts commit 3fa7a736.

* Revert "[Cherry-pick] Fix python link error (#47811)"

This reverts commit ff642c68.

* Update config.go

* fix mixed precision inference

Co-authored-by: Chen Weihang <chenweihang@baidu.com>

11c7f570

19 Dec, 2022 1 commit

[cherry-pick][Inference] support mixed precision inference (#49077) · ddcd1b61

Committed by Yuanle Liu on Dec 19, 2022

* [Release2.4] Revert python link prs (#48573)

* Revert "Fix mac link python (#48017)"

This reverts commit 3fa7a736.

* Revert "[Cherry-pick] Fix python link error (#47811)"

This reverts commit ff642c68.

* Update config.go

* [Paddle Inference] Add float_to_half_pass to support  inference with mixed precision (#47993)

* [Inference] optimize some code and fix some bug (#48780)

* clean ir_pass_manager and fix map_depthwise_conv_to_conv_pass

* fix unitest timeout

* [Paddle Inference] clean unused code  (#48392)

* fix

* update

* update

Co-authored-by: Chen Weihang <chenweihang@baidu.com>

ddcd1b61

29 Nov, 2022 1 commit

[cherry-pick] updating mul and matmul with set_mem_desc and fix... · 9e2ba9b9

Committed by yeliang2258 on Nov 29, 2022

[cherry-pick] updating mul and matmul with set_mem_desc and fix squeeze_transpose for MKLDNN (#47951)

* Fix slice bugs in MKLDNN when input dims are zeros (#46671)

* fix slice bugs

* fix

* update code

* fix

* update code

* updating mul and matmul with set_mem_desc (#45624)

* - mul & matmul changes

- fix

- bs16 correction of strides

* - cosmetic fixes

* - lint

* - fix

* - fix

* - format -> mem_desc

* - fix

* - fix

* - fix

* - fix

* - fix

* fix squueze_transpose (#47911)

Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>

9e2ba9b9

28 Nov, 2022 1 commit

Cherrypick NV fixes to release/2.4 (#48263) · 7a0b8625

Committed by zlsh80826 on Nov 28, 2022

* Reduce squeeze2_matmul_fuse_pass, flattent tests time (#47098)

* Add missing fp32 config and reduce the testing combination

* Reduce trt matmul pass test max examples

* Loose TRT fp16 tests tolerance (#47100)

* Loose TRT half test tolerance to 1e-3 (#47101)

* Loose TRT half test tolerance to 1e-3 (#47106)

* Update distributed_strategy.proto (#46531)

* Close popen pipe after used (#47053)

* Add launch_bounds (#47285)

* Fix TRT UT failures (#47488)

* Format cherry-picked commits

* CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203)

* Skip tests that use fused_ops on H100

* Add error message to FusedOps on H100

Co-authored-by: Shijie <505749828@qq.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: Tian Zheng <tizheng@nvidia.com>

7a0b8625

10 Nov, 2022 1 commit

Fuse multi transformer layer pass (#47541) (#47830) · 3a6cc57c

Committed by RichardWooSJTU on Nov 10, 2022

* add fuse_multi_transformer_layer_pass

3a6cc57c

09 Nov, 2022 1 commit

[cherry-pick] Squeeze2 and transpose2 fuse using oneDNN(#47712) · ea5f44b8

Committed by Hui Zhang on Nov 09, 2022

* suqeeze2 + transpose2 fuse onednn cherrypick 2.4

* format

* fix merge

ea5f44b8

08 Nov, 2022 1 commit

[CHERRY-PICK] Added caching to oneDNN FC and op+unsqueeze2 and op+reshape2 fuse passes (#47690) · d0e19af3

Committed by jakpiase on Nov 08, 2022

* fc cherrypick

* another files added

* added transpose cherrypick

* reverter somebodys fc changes

* minor fix

* minor fix

* cherry-pick of fc+act changes

* minor fix

* fix

d0e19af3

03 Nov, 2022 3 commits

FC/matmul(v2) + scale fuse pass (#47420) · 99c872fa

Committed by Sławomir Siwek on Nov 03, 2022

99c872fa

Fix ComputePropagateScalesMkldnnPass of MKLDNN (#47574) (#47639) · 559b9754

Committed by yeliang2258 on Nov 03, 2022

* add constant_folding_pass pass for mkldnn int8

* update UpdateScaleOpInOutScales

559b9754

[cherry pick] fix memory copy in prepare_data of FusedMultiTransformer pass (#47308) · ba4fbe71

Committed by Kaipeng Deng on Nov 03, 2022

* fix memory copy in prepare_data. test=develop

* add cache_kv fp16 support. test=develop

* fit for simplify_with_basic_ops_pass. test=develop

ba4fbe71

01 Nov, 2022 1 commit

[cherry-pick][code-gen] Support code-gen for opmaker of sparse op (#46993) (#47417) · 601626ac

Committed by zyfncg on Nov 01, 2022

* support generating code of opmaker for backward op invoke forward op (#46912)

* [code-gen] Support code-gen for opmaker of sparse op (#46993)

* support generating code of opmaker for backward op invoke forward op

* gsupport code-gen of opmaker for sparse op

* refind logic of choose phi kernrel

* fix complie budg

* fix code_gen bug

* fix bug

* fix kernel signature code-gen

* fix complie bug of VarType

* fix complie bug of VarType

* fix test_sparse_conv_op

* fix test_sparse_norm_op

* [Phi] Refactor logic of judging whether having a phi kernrel (#46920)

* refind logic of choose phi kernrel

* fix complie budg

* update cmake

601626ac

20 Oct, 2022 3 commits

[cherry pick] Add FusedMultiTransformer fuse pass for GPT3 (#47150) · 396427a7

Committed by Kaipeng Deng on Oct 20, 2022

* add fused_attention_pass. test=develop

* support fp16. test=develop

* fix format. test=develop

396427a7

[cherry-pick] Fix quantize model deploy bug in MKLDNN (#47119) · c2d344dd

Committed by yeliang2258 on Oct 20, 2022

* Fix quantize model deploy bugs when using MKLDNN (#45920)

* fix immutable op quantize bugs

* fix

* fix build bug

* fix test

* notest,test=inference

* fix ppyoloe acc drop bugs

* fix test

* fix test

* add test

* fix

* fix

* fix test

* fix refined name bug

* fix test

* bias fix

* fix matmul weight dequant bug

* re-ci

* fix tester

* fix test

* fix tester

* update weight dequantize func

* update code

* update test for converage

* update test

* update cmake

* update cmakelist

* update code

* rerun ci

* remove useless code

* re-ci

* update code

* update code

* fix header

* update code for log

c2d344dd

[Cherry-pick] layernorm shift partation enhance (#47086) · 9ed1454a

Committed by Wang Bojun on Oct 20, 2022

* Enhance the layernorm shift partation fuse op when shift size > 0 (roll shifting)

* fix cherry-pick test

9ed1454a

19 Oct, 2022 3 commits

Add unsigned int8 scale propagation (#46378) (#47156) · 66dccd7d

Committed by yeliang2258 on Oct 19, 2022

* Add unsigned int8 propagation

* Add or modify unit tests

* Correct concat scale checking

* Apply review suggestions

* Corrections

Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>

66dccd7d

Add enable_partial_send_recv switch in pipeline_configs (#46992) (#47083) · 1d015f12

Committed by Ghost Screaming on Oct 19, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* Support allow_partial switch, which can be configure in
pipeline_configs. If sent tensor are not the same from
different hosts, they shouldn't been sent partially and
then concated as a whole tensor.

* Change name allow_partial to enable_partial_send_recv.

* Add global variable _enable_partial_send_recv

1d015f12

[Dy2St]Fix recurrent op eager deletion pass error in dy2st (#47105) (#47134) · 69515e90

Committed by WangZhen on Oct 19, 2022

[CherryPick][Dy2St]Fix recurrent op eager deletion pass error in dy2st

69515e90

17 Oct, 2022 2 commits

[cherry-pick]Sparse static graph (#46838) · 10225d22

Committed by zhangkaihuo on Oct 17, 2022

cherry-pick : #46322, #46245

Sparse API supports static graph

10225d22

[IPU] paddle-inference support custom-ops (#45235) (#46868) · bd89be12

Committed by Allen Guo on Oct 17, 2022

* paddle-inference support custom-ops

Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>

* fix tolower

Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>

bd89be12

14 Oct, 2022 1 commit

[Paddle-TRT] support new quant format from slim (#46022) (#46979) · b8677c0d

Committed by zhoutianzi666 on Oct 14, 2022

b8677c0d

13 Oct, 2022 1 commit

interpretercore thread not always spin (#46687) (#46952) · d90aaa6e

Committed by zhangbo9674 on Oct 13, 2022

d90aaa6e

11 Oct, 2022 1 commit

[cherry-pick] [PHI] relu6_grad kernel (#46501) (#46862) · 2bcbf8b0

Committed by Sławomir Siwek on Oct 11, 2022

* [PHI] Migrate gelu kernels (#45596)

* gaussian random

* mkldnn to onednn renaming

* fix merge conflicts

* remove fluid code

* onednn renaming

* gelu fwd

* sort activations

* gelu gradient

* remove unused macros

* merge conflicts

* fix merge conflicts

* remove extra contraint from gelu op

* [PHI] relu6_grad kernel (#46501)

* Relu6

* remove fluid handler

* add individual kernel signature

* coding style

* replace bounded_relu with clip

* whitespace

* code style

2bcbf8b0

27 Sep, 2022 1 commit

[cherry-pick] clear extra attrs of some ops in OpMaker (#45845, #45984, 46060) (#46218) · 0cc2251f

Committed by zyfncg on Sep 27, 2022

* Clear extra attrs of elementwise op in OpMaker (#45845)

* clear extra attrs of elementwise op in opmaker

* fix op_debug_string_test

* fix bug of grad_add

* fix sort of runtime attrs

* Clear extra attrs of scale in OpMaker (#45984)

* clear extra attr of scale in opmaker

* fix sum bug

* fix merge conflict

* fix minus

* Clear extra attributes of some Op in OpMaker (Part4) (#46060)

* clear extra attr of some ops in opmaker

* revert clear use_cudnn for pool

* fix test_operator_desc

* fix Attr interface of OperatorBase

* fix code stype

0cc2251f

23 Sep, 2022 1 commit

fix compile problem (#46354), test=kunlun (#46383) · 6a508334

Committed by zyfncg on Sep 23, 2022

6a508334

20 Sep, 2022 5 commits

[PolishComments] Polish some code comments (#46032) (#46261) · 42e56f65

Committed by HongyuJia on Sep 20, 2022

* polish code comments

* polish data_device_transform.cc

42e56f65

[cherry-pick] Refine thread pool config of interpretercore (#46219) · 1418a719

Committed by Leo Chen on Sep 20, 2022

* add config

* add config

* follow comments

* fix serial run

1418a719

[Inference] fix preln_residual_bias_fuse_pass bug in TNT_small model (#46178) (#46260) · c384b00d

Committed by zhoutianzi666 on Sep 20, 2022

* fix preln_residual_bias_fuse_pass bug in TNT_small model

c384b00d

Run_program_op add scope cache & reuse (#45813) (#46223) · 4f28a4c2

Committed by zhangbo9674 on Sep 20, 2022

* add scope cache & reuse

* add gc scope for end of each train step

* del scope reuse for jit

* refine code

* test

4f28a4c2

Fix wrong eigen header include (#46082) (#46202) · ac8cce20

Committed by zyfncg on Sep 20, 2022

* fix wrong eigen header include

* fix complie bug

* fix nan_inf_utils_detail

* fix resource_manager

* fix conv_miopen_helper

ac8cce20

19 Sep, 2022 1 commit

convfusion_cache (#46054) · f4ec1563

Committed by xiaoxiaohehe001 on Sep 19, 2022

f4ec1563

16 Sep, 2022 1 commit

[Cherry-pick] Normalize yaml name and label (#46052) · 8caaf85a

Committed by Chen Weihang on Sep 16, 2022

* normalize yaml file name (#45894)

* Clear extra attributes of activation op in OpMaker (#45772)

* clear extra attr of activation op in opmaker

* fix syntax bug

* fix mkldnn kernel

* fix merge conflict

* fix bug

* [PHI] Normalize yaml op label (#45976)

* normalize yaml op label

* revert op_compat yaml change

* fix prelu and rnn compat problem

* replace api by op

* support assign op backward refuse forward (#45879)

* normize yaml backward op label (#46028)

Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
Co-authored-by: Charles-hit <56987902+Charles-hit@users.noreply.github.com>

8caaf85a

15 Sep, 2022 2 commits

Support 0 shapes input Tensor for MKL slice (#45930) (#46072) · 903c87bd

Committed by WangZhen on Sep 15, 2022

Support 0 shapes input Tensor for MKL slice kernel

903c87bd

Delete eigen header in data_type.h (#46036) (#46066) · 2680a71e

Committed by zyfncg on Sep 15, 2022

* delete eigen header in data_type.h

* fix complie bug

* refactor

2680a71e

14 Sep, 2022 3 commits

merge python lib (#46013) · 5130b0a1

Committed by JingZhuangzhuang on Sep 14, 2022

5130b0a1

set device id before op run (#45994) · 2fac8abb

Committed by Leo Chen on Sep 14, 2022

2fac8abb

delete new executor log (#45917) · e223cf7b

Committed by pangyoki on Sep 14, 2022

e223cf7b

13 Sep, 2022 2 commits

cherry pick softmax infer kernel (#45957) · 0903020d

Committed by JingZhuangzhuang on Sep 13, 2022

0903020d

[cherry-pick] Allow manaully set py_reader name in standalone executor (#45898) (#45931) · 29c44eb2

Committed by Ruibiao Chen on Sep 13, 2022

* Allow manaully set py_reader name in standalone executor

* Fix CI errors

29c44eb2

09 Sep, 2022 1 commit

[CustomDevice] add dy2static support (#45878) · abc85c50

Committed by ronnywang on Sep 09, 2022

* [CustomDevice] add dy2static support

* update

abc85c50

BaiXuePrincess / Paddle is in sync with the fork source project
