Commits · 34f67a88a625ea3d5aeea470a75d287e2e54d4cc · PaddlePaddle / Paddle

08 Nov, 2022 2 commits
- add fuse_multi_transformer passes to fp16. test=develop (#47733) · 34f67a88
  Committed by Kaipeng Deng on Nov 08, 2022
  
  34f67a88
- [CHERRY-PICK] Added caching to oneDNN FC and op+unsqueeze2 and op+reshape2 fuse passes (#47690) · d0e19af3
  Committed by jakpiase on Nov 08, 2022
```
* fc cherrypick

* other files added

* added transpose cherrypick

* reverted somebody's fc changes

* minor fix

* minor fix

* cherry-pick of fc+act changes

* minor fix

* fix
```
  d0e19af3
07 Nov, 2022 1 commit

[cherry-pick2.4]docs fix (#47669) · cf668ab3

Committed by Ligoml on Nov 07, 2022

* #46165

* #45752

* fix some doc bug test=document_fix (#45488)

* fix some doc bug test=document_fix

* fix some docs issues, test=document_fix

* beta -> \beta in softplus

* threshold -> \varepsilon in softplus

* parameter name

* delta -> \delta in smooth_l1_loss

* fix some docs test=document_fix

* fix docs test=document_fix

* fix docs && add blank lines test=document_fix

* Update python/paddle/nn/functional/activation.py, test=document_fix

* Update python/paddle/nn/layer/activation.py, test=document_fix
Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* [docs] add ipustrategy Hyperlink (#46422)

* [docs] add ipustrategy Hyperlink

* fix ipu_shard_guard docs; test=document_fix

* [docs] add set_ipu_shard note

* [docs] fix hyperlink

* update framework.py

* fix mlu_places docs; test=document_fix

* fix put_along_axis docs; test=document_fix

* fix flake8 W293 error, test=document_fix

* fix typo in typing, test=document_fix
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Nyakku Shigure <sigure.qaq@gmail.com>

* #46659

* Update README_cn.md (#46927)

Fixed typos

* #46738

* fix paddle.get_default_dtype (#47040)

Chinese and English return values are inconsistent

* fix bug
Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com>
Co-authored-by: Infinity_lee <luhputu0815@gmail.com>
Co-authored-by: mrcangye <chenloong@88.com>
Co-authored-by: SigureMo <sigure.qaq@gmail.com>
Co-authored-by: gouzil <66515297+gouzil@users.noreply.github.com>
Co-authored-by: Hamid Zare <12127420+hamidzr@users.noreply.github.com>
Co-authored-by: Sqhttwl <61459740+Sqhttwl@users.noreply.github.com>
Co-authored-by: OccupyMars2025 <31559413+OccupyMars2025@users.noreply.github.com>
Co-authored-by: 超级码牛 <54444805+SuperCodebull@users.noreply.github.com>
Co-authored-by: jzhang533 <jzhang533@gmail.com>

cf668ab3

03 Nov, 2022 3 commits
- FC/matmul(v2) + scale fuse pass (#47420) · 99c872fa
  Committed by Sławomir Siwek on Nov 03, 2022
  
  99c872fa
- Fix ComputePropagateScalesMkldnnPass of MKLDNN (#47574) (#47639) · 559b9754
  Committed by yeliang2258 on Nov 03, 2022
```
* add constant_folding_pass pass for mkldnn int8

* update UpdateScaleOpInOutScales
```
  559b9754
- [cherry pick] fix memory copy in prepare_data of FusedMultiTransformer pass (#47308) · ba4fbe71
  Committed by Kaipeng Deng on Nov 03, 2022
```
* fix memory copy in prepare_data. test=develop

* add cache_kv fp16 support. test=develop

* fit for simplify_with_basic_ops_pass. test=develop
```
  ba4fbe71
01 Nov, 2022 2 commits

[cherry-pick][code-gen] Support code-gen for opmaker of sparse op (#46993) (#47417) · 601626ac

Committed by zyfncg on Nov 01, 2022

* support generating code of opmaker for backward op invoke forward op (#46912)

* [code-gen] Support code-gen for opmaker of sparse op (#46993)

* support generating code of opmaker for backward op invoke forward op

* support code-gen of opmaker for sparse op

* refine logic of choosing phi kernel

* fix compile bug

* fix code_gen bug

* fix bug

* fix kernel signature code-gen

* fix compile bug of VarType

* fix compile bug of VarType

* fix test_sparse_conv_op

* fix test_sparse_norm_op

* [Phi] Refactor logic of judging whether having a phi kernel (#46920)

* refine logic of choosing phi kernel

* fix compile bug

* update cmake

601626ac

fix p2p comm memory release logic (#47497) (#47517) · 0201ccc4
Committed by Yuang Liu on Nov 01, 2022

0201ccc4

29 Oct, 2022 1 commit
- [JITLayer]Enable OneDNN on CPU and Fix zero shape (#47428) (#47436) · f4788442
  Committed by Aurelius84 on Oct 29, 2022
```
* [JITLayer]Enable OneDNN on CPU and Fix zero shape
```
  f4788442
28 Oct, 2022 4 commits
- [Dy2St]Fix abnormal growth of memory in train mode and no_grad for Dy2St (#47398) (#47414) · 7618cbdc
  Committed by WangZhen on Oct 28, 2022
```
* [Dy2St]Fix abnormal growth of memory in train mode and no_grad for Dy2St 
```
  7618cbdc
- [Cherry-pick][JIT] Add Predictor for JITLayer (#47379) (#47419) · c42929c5
  Committed by Aurelius84 on Oct 28, 2022
```
* [JIT] Add Predictor for JITLayer (#47379)

* add predictor_engine

* add predictor_engine

* fix zero shape

* fix lodTensor

* fix unittest

* fix code style

* update CmakeList

* fix new executor
```
  c42929c5
- [cherry-pick]add sync_batch_norm_bn and deliver indices_dict (#47407) · 0fa8309a
  Committed by zhangkaihuo on Oct 28, 2022
```
add sync_batch_norm_bn and deliver indices_dict 
```
  0fa8309a
- support multiclass_nms in int8 (#47337) · eec93bda
  Committed by zhoutianzi666 on Oct 28, 2022
  
  eec93bda
27 Oct, 2022 2 commits
- [cherry-pick] add batch_norm_kernel (#47394) · b143e008
  Committed by zhangkaihuo on Oct 27, 2022
```
* cherry-pick #46359 and resolve conflict
```
  b143e008
- fix slice bug (#47349) (#47376) · 99cec1a6
  Committed by wanghuancoder on Oct 27, 2022
```
Fix a bug in Slice
```
  99cec1a6
26 Oct, 2022 3 commits

Fix inference performance problem caused by selecting cudnn kernel of softmax (#47338) (#47367) · 0369cd0f
Committed by zyfncg on Oct 26, 2022
```
* fix inference performance problem caused by selecting cudnn kernel for softmax

* recover use_cudnn in opmaker of softmax
```
0369cd0f
Added workaround for elementwise oneDNN kernel (#47080) (#47342) · 7c6550a6
Committed by yeliang2258 on Oct 26, 2022
```
* return proper state

* fix for dims

* fix
Co-authored-by: jakpiase <jakpia21@gmail.com>
```
7c6550a6

[Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and... · 9a6dd8f8

Committed by sneaxiy on Oct 26, 2022

[Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and fused_feedforward ops (#47235)

* fix fused_attention fused_feedforward

* fix ci

* fix ci

* fix ci PADDLE_GET_CONST

* fix ci ut

9a6dd8f8

24 Oct, 2022 1 commit

Support BF16 training for sharding (#46846) (#47246) · 5c85f1a7

Committed by Ghost Screaming on Oct 24, 2022

* Fix bug of reduce_sum op: when input.numel() > INT32_MAX, its result is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bugs.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>

5c85f1a7

21 Oct, 2022 3 commits

[Paddle-TRT] Fix conv2d (#47034) · d42a1dc3
Committed by zhoutianzi666 on Oct 21, 2022
```
* forbid Conv2D from entering Paddle-TensorRT when weight is not persistable.
```
d42a1dc3
Add infer prune function (#47047) · 8739497c
Committed by JingZhuangzhuang on Oct 21, 2022
```
* Add infer prune function

* add fusion op
```
8739497c

Add paddle audio dataset & backend (#45939) (#47230) · 29c9f027

Committed by YangZhou on Oct 21, 2022

* add audio feature dataset

* fix coding style

* fix coding style2

* rm librosa

* rm voxceleb

* rm librosa in test

* add scipy fftpack

* add functional

* fix setup

* fix setup2

* rm colorlog

* refactor dataset __init__.py

* fix coverage

* fix librosa import error

* fix windows test

* fix windows ci

* rm datasets

* fix setup

* remove testdata

* add librosa in requirement

* add librosa in requirement2

* change librosa to 0.8.1

* update ci docker

* fix ci error

* fix ci error2

* fix ci coverage

* fix coverage

* fix coverage

* rm audio_base in test, notest,test=coverage

* fix copyright

* rm backend

* add dataset in __init__

* rm compliance&&add function test

* fix setup

* fix windows

* fix windows2

* fix test timeout

* add backend & datasets

* fix bugs

* fix ci time issue

* add dataset test

* rm test_audio_feature

* avoid windows issue, tmp

* note windows issue

* skip windows issue

* refactor dataset test

* add dataset.py

* fix dtype in layers.mfcc

* fix ci-static-check

* fix dtype in layers.mfcc && fix ci-static-check

* add relative accuracy

* modify API.spec

* skip cuda11.2 test

* skip cuda11.2 test2

* skip cuda11.2

* change dataset name

* fix format

* update api.spec

* update api.spec2

* fix coverage

* add dataset test

* rm download load dict

* rm download load dict in init

* update api.spec3

* fix dataset coverage

* fix coverage

* fix coverage2

* restore api.spec

* restore api.spec2

* fix api-spec 3

* fix api-spec 4

* fix api.spec

* fix api.spec6

* refactor init_backend

* fix typo

* change paddleaudio backend set

* fix get_current_audio_backend()

* fix format

* fix format2

* remove format in parameters

* fix format2

* add warning message in wave_backend && remove redundant audio util

* rm audio util in print_signatures

* fix format3

* add tess dataset license

* format warning

* add more info in warning msg

* add paddleaudio version check

* replace dataset esc50 with tess

* add tess dataset && rm numpy transform in dataset.py

* fix set audio backend bug

* fix equal error

* fix format && coverage error

* add api example

* fix format

* fix error

* fix typo

* add noqa in __init__

* fix backend doc example error

* rm seed in dataset

* update backend example

* fix typo

* fix typo

* fix example err

* fix typo

* fix ci dataset test

* fix example file

* try to fix ci

* clean dataset doc

* change get_current_audio_backend to get_current_backend

* replace paddle.audio.backends.info with paddle.audio.info, same with load, save

* fix ci error

* replace api in test_audio_backend

* fix save&&set_backend example

29c9f027

20 Oct, 2022 8 commits

[Cherry-pick] Simplify conv codes and fix cache and autotune bugs. (#47197) · c0ed8729

Committed by Yiqun Liu on Oct 20, 2022

* Simplify the codes of conv. (#45966)

* Record whether the conv algo was obtained by exhaustive search, to fix the autotune cache bug. (#47065)

c0ed8729

[cherry pick] Add FusedMultiTransformer fuse pass for GPT3 (#47150) · 396427a7
Committed by Kaipeng Deng on Oct 20, 2022
```
* add fused_attention_pass. test=develop

* support fp16. test=develop

* fix format. test=develop
```
396427a7

[cherry-pick] Fix quantize model deploy bug in MKLDNN (#47119) · c2d344dd

Committed by yeliang2258 on Oct 20, 2022

* Fix quantize model deploy bugs when using MKLDNN (#45920)

* fix immutable op quantize bugs

* fix

* fix build bug

* fix test

* notest,test=inference

* fix ppyoloe acc drop bugs

* fix test

* fix test

* add test

* fix

* fix

* fix test

* fix refined name bug

* fix test

* bias fix

* fix matmul weight dequant bug

* re-ci

* fix tester

* fix test

* fix tester

* update weight dequantize func

* update code

* update test for coverage

* update test

* update cmake

* update cmakelist

* update code

* rerun ci

* remove useless code

* re-ci

* update code

* update code

* fix header

* update code for log

c2d344dd

[Paddle-TRT][Cherry-Pick]Rewrite strided_slice converter using shape tensor (#47153) · 68c4ac31
Committed by zhoutianzi666 on Oct 20, 2022
```
* stride_to_24

* fix CI failing
```
68c4ac31
[Cherry-pick] layernorm shift partition enhance (#47086) · 9ed1454a
Committed by Wang Bojun on Oct 20, 2022
```
* Enhance the layernorm shift partition fuse op when shift size > 0 (roll shifting)
* fix cherry-pick test
```
9ed1454a
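
For readers unfamiliar with the term, "roll shifting" can be pictured as a cyclic shift of a 2D feature map along both axes before it is split into windows. The sketch below is a plain-Python illustration under that assumption; `roll_2d` is a hypothetical helper, not the fused op or any Paddle API, and the shift direction used by the actual kernel is not stated in the commit message.

```python
def roll_2d(grid, shift):
    """Cyclically shift a 2D list by `shift` positions along rows and columns.

    Element (r, c) of the result comes from ((r - shift) % H, (c - shift) % W),
    so a positive shift moves content toward higher indices and wraps around.
    """
    height, width = len(grid), len(grid[0])
    return [
        [grid[(r - shift) % height][(c - shift) % width] for c in range(width)]
        for r in range(height)
    ]


# Hypothetical 3x3 feature map.
feature_map = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
shifted = roll_2d(feature_map, 1)
restored = roll_2d(shifted, -1)  # rolling back by the same amount undoes the shift
```

A shift size of 0 leaves the map unchanged, which is why the case with shift size > 0 needs separate handling.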
add _get_phi_kernel_name interface (#47033) · 4c925242
Committed by JingZhuangzhuang on Oct 20, 2022

4c925242
[Cherry-pick][Release/2.4] Fix some operators when the tensor.numel() > INT32_MAX (#47191) · c74bf018
Committed by sneaxiy on Oct 20, 2022
```
Fix some operators when the tensor.numel() > INT32_MAX
```
c74bf018
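
As background, this class of bug usually appears when an element count or flat index is stored in a 32-bit signed integer. The sketch below is an illustration only: it uses `ctypes.c_int32` to emulate 32-bit wraparound in plain Python, the shape is hypothetical, and none of it is taken from the Paddle kernels touched by this commit.

```python
import ctypes

INT32_MAX = 2**31 - 1


def numel(shape):
    """Return the number of elements implied by a shape tuple."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def as_int32(value):
    """Emulate storing `value` in a 32-bit signed integer (wraps on overflow)."""
    return ctypes.c_int32(value).value


# Hypothetical shape whose element count just exceeds INT32_MAX.
shape = (65536, 32769)
total = numel(shape)        # exact count as a Python integer
wrapped = as_int32(total)   # what a 32-bit index variable would hold
```

Here `total` is larger than `INT32_MAX`, so `wrapped` becomes negative; a kernel that uses such a value as a loop bound or offset would read or write the wrong elements, which is why 64-bit indexing is needed for tensors of this size.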
[Cherry-pick][Release/2.4] support pure bfloat16 for more ops · da7d2f29
Committed by sneaxiy on Oct 20, 2022
```
support pure bfloat16 for more ops
```
da7d2f29

19 Oct, 2022 4 commits

Add unsigned int8 scale propagation (#46378) (#47156) · 66dccd7d

Committed by yeliang2258 on Oct 19, 2022

* Add unsigned int8 propagation

* Add or modify unit tests

* Correct concat scale checking

* Apply review suggestions

* Corrections
Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>

66dccd7d

Add enable_partial_send_recv switch in pipeline_configs (#46992) (#47083) · 1d015f12

Committed by Ghost Screaming on Oct 19, 2022

* Fix bug of reduce_sum op: when input.numel() > INT32_MAX, its result is wrong.

* Support the allow_partial switch, which can be configured in pipeline_configs. If the tensors to be sent are not identical across hosts, they should not be sent partially and then concatenated into a whole tensor.

* Change name allow_partial to enable_partial_send_recv.

* Add global variable _enable_partial_send_recv

1d015f12
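
The reasoning behind the switch can be sketched as follows. This is one reading of the commit message, written in plain Python with lists standing in for tensors; `split_even` and `partial_send_recv` are hypothetical names, not Paddle functions, and the real implementation uses collective communication rather than a loop.

```python
def split_even(values, parts):
    """Split a flat list into `parts` equal contiguous chunks."""
    assert len(values) % parts == 0, "length must be divisible by parts"
    step = len(values) // parts
    return [values[i * step:(i + 1) * step] for i in range(parts)]


def partial_send_recv(per_rank_tensors):
    """Simulate partial sending: rank r sends only chunk r of its own tensor,
    and the receiver concatenates the chunks in rank order."""
    parts = len(per_rank_tensors)
    received = []
    for rank, tensor in enumerate(per_rank_tensors):
        received.extend(split_even(tensor, parts)[rank])
    return received


# Every rank holds the same tensor: the concatenation reproduces it.
identical = [[1, 2, 3, 4], [1, 2, 3, 4]]
ok = partial_send_recv(identical)

# Ranks hold different tensors: the result mixes pieces of both.
different = [[1, 2, 3, 4], [5, 6, 7, 8]]
mixed = partial_send_recv(different)
```

With identical inputs the receiver recovers the original tensor while each rank transmits only a fraction of it. With differing inputs the receiver ends up with a tensor that matches neither sender, which is the situation in which partial send/recv would need to be disabled.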

[Dy2St]Fix recurrent op eager deletion pass error in dy2st (#47105) (#47134) · 69515e90
Committed by WangZhen on Oct 19, 2022
```
[CherryPick][Dy2St]Fix recurrent op eager deletion pass error in dy2st
```
69515e90
[ cherrypick] Construct exec and ctx only once in cond op to speed up (#47012) · fcb9c0b5
Committed by Hui Zhang on Oct 19, 2022
```
Construct exec and ctx only once in cond op to speed up
```
fcb9c0b5

18 Oct, 2022 6 commits
- reconstruct code for convert_fp16 (#46428) (#47087) · de6f15b6
  Committed by Wilber on Oct 18, 2022
  
  de6f15b6
- Add symbolic shape deduction function for unfold, scatter_nd_add, p_norm,... · 2cc8797e
  Committed by weishengying on Oct 18, 2022
```
Add symbolic shape deduction function for unfold, scatter_nd_add, p_norm, grid_sampler, pad3d, etc (#46291) (#47003)
```
  2cc8797e
- [cherry-pick 2.4] add sparse api transpose/reshape/is_same_shape (#47076) · 5fef043d
  Committed by zhouweiwei2014 on Oct 18, 2022
```
Add three new APIs: sparse.is_same_shape, sparse.reshape, and sparse.transpose
```
  5fef043d
- support shape tensor is the input of trt-subgraph (#47066) · 5a44c124
  Committed by zhoutianzi666 on Oct 18, 2022
  
  5a44c124
- [cherry-pick] Fix perf issues of mp/pp/fuse in eager mode (#47071) · b84edd90
  Committed by Haohongxiang on Oct 18, 2022
```
* [Dygraph] Fix performance of pp+mp by using send/recv_calc_stream instead of send/recv (#46116)

* [Dygraph] Fix Perf of FusedFeedForward and FusedAttention with AllReduce (#46780)

* update
```
  b84edd90
- [Cherry pick] trt pool2d adaptive fix (#47069) · 5f6b9f1b
  Committed by Wang Bojun on Oct 18, 2022
```
* draft with debug print
* remove debug print
* bug fix for ci
```
  5f6b9f1b
