Commits · 0fa8309aed4f47d52f39ab808bf51242a2c7e016 · PaddlePaddle / Paddle

28 Oct, 2022 2 commits
- [cherry-pick]add sync_batch_norm_bn and deliver indices_dict (#47407) · 0fa8309a
  Committed by zhangkaihuo on Oct 28, 2022
```
add sync_batch_norm_bn and deliver indices_dict 
```
  0fa8309a
- support multiclass_nms in int8 (#47337) · eec93bda
  Committed by zhoutianzi666 on Oct 28, 2022
  
  eec93bda
27 Oct, 2022 2 commits
- [cherry-pick] add batch_norm_kernel (#47394) · b143e008
  Committed by zhangkaihuo on Oct 27, 2022
```
* cherry-pick #46359 and resolve conflict
```
  b143e008
- fix slice bug (#47349) (#47376) · 99cec1a6
  Committed by wanghuancoder on Oct 27, 2022
```
Fix a bug in Slice
```
  99cec1a6
26 Oct, 2022 3 commits

Fix inference performance problem caused by selecting cudnn kernel of softmax (#47338) (#47367) · 0369cd0f
Committed by zyfncg on Oct 26, 2022
```
* fix inference performance problem caused by selecting cudnn kernel for softmax

* recover use_cudnn in opmaker of softmax
```
0369cd0f
Added workaround for elementwise oneDNN kernel (#47080) (#47342) · 7c6550a6
Committed by yeliang2258 on Oct 26, 2022
```
* return proper state

* fix for dims

* fix
Co-authored-by: jakpiase <jakpia21@gmail.com>
```
7c6550a6

[Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and... · 9a6dd8f8

Committed by sneaxiy on Oct 26, 2022

[Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and fused_feedforward ops (#47235)

* fix fused_attention fused_feedforward

* fix ci

* fix ci

* fix ci PADDLE_GET_CONST

* fix ci ut

9a6dd8f8

24 Oct, 2022 1 commit

Support BF16 training for sharding (#46846) (#47246) · 5c85f1a7

Committed by Ghost Screaming on Oct 24, 2022

* Fix bug of reduce_sum op: when input.numel() > INT32_MAX, its result is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bugs.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>

5c85f1a7

21 Oct, 2022 3 commits

[Paddle-TRT] Fix conv2d (#47034) · d42a1dc3
Committed by zhoutianzi666 on Oct 21, 2022
```
* forbid Conv2D from entering Paddle-TensorRT when its weight is not persistable.
```
d42a1dc3
Add infer prune function (#47047) · 8739497c
Committed by JingZhuangzhuang on Oct 21, 2022
```
* Add infer prune function

* add fusion op
```
8739497c

Add paddle audio dataset & backend (#45939) (#47230) · 29c9f027

Committed by YangZhou on Oct 21, 2022

* add audio feature dataset

* fix coding style

* fix coding style2

* rm librosa

* rm voxceleb

* rm librosa in test

* add scipy fftpack

* add functional

* fix setup

* fix setup2

* rm colorlog

* refactor dataset __init__.py

* fix coverage

* fix librosa import error

* fix windows test

* fix windows ci

* rm datasets

* fix setup

* remove testdata

* add librosa in requirement

* add librosa in requirement2

* change librosa to 0.8.1

* update ci docker

* fix ci error

* fix ci error2

* fix ci coverage

* fix coverage

* fix coverage

* rm audio_base in test, notest,test=coverage

* fix copyright

* rm backend

* add dataset in __init__

* rm compliance&&add function test

* fix setup

* fix windows

* fix windows2

* fix test timeout

* add backend & datasets

* fix bugs

* fix ci time issue

* add dataset test

* rm test_audio_feature

* avoid windows issue, tmp

* note windows issue

* skip windows issue

* refactor dataset test

* add dataset.py

* fix dtype in layers.mfcc

* fix ci-static-check

* fix dtype in layers.mfcc && fix ci-static-check

* add relative accuracy

* modify API.spec

* skip cuda11.2 test

* skip cuda11.2 test2

* skip cuda11.2

* change dataset name

* fix format

* update api.spec

* update api.spec2

* fix coverage

* add dataset test

* rm download load dict

* rm download load dict in init

* update api.spec3

* fix dataset coverage

* fix coverage

* fix coverage2

* restore api.spec

* restore api.spec2

* fix api-spec 3

* fix api-spec 4

* fix api.spec

* fix api.spec6

* refactor init_backend

* fix typo

* change paddleaudio backend set

* fix get_current_audio_backend()

* fix format

* fix format2

* remove format in parameters

* fix format2

* add warning message in wave_backend && remove redundant audio util

* rm audio util in print_signatures

* fix format3

* add tess dataset license

* format warning

* add more info in warning msg

* add paddleaudio version check

* replace dataset esc50 with tess

* add tess dataset && rm numpy transform in dataset.py

* fix set audio backend bug

* fix equal error

* fix format && coverage error

* add api example

* fix format

* fix error

* fix typo

* add noqa in __init__

* fix backend doc example error

* rm seed in dataset

* update backend example

* fix typo

* fix typo

* fix example err

* fix typo

* fix ci dataset test

* fix example fil

* try to fix ci

* clean dataset doc

* change get_current_audio_backend to get_current_backend

* replace paddle.audio.backends.info with paddle.audio.info, same with load, save

* fix ci error

* replace api in test_audio_backend

* fix save&&set_backend example

29c9f027

20 Oct, 2022 8 commits

[Cherry-pick] Simplify conv codes and fix cache and autotune bugs. (#47197) · c0ed8729

Committed by Yiqun Liu on Oct 20, 2022

* Simplify the codes of conv. (#45966)

* Record whether the conv algo was obtained by exhaustive search, to fix the autotune cache bug. (#47065)

c0ed8729

[cherry pick] Add FusedMultiTransformer fuse pass for GPT3 (#47150) · 396427a7
Committed by Kaipeng Deng on Oct 20, 2022
```
* add fused_attention_pass. test=develop

* support fp16. test=develop

* fix format. test=develop
```
396427a7

[cherry-pick] Fix quantize model deploy bug in MKLDNN (#47119) · c2d344dd

Committed by yeliang2258 on Oct 20, 2022

* Fix quantize model deploy bugs when using MKLDNN (#45920)

* fix immutable op quantize bugs

* fix

* fix build bug

* fix test

* notest,test=inference

* fix ppyoloe acc drop bugs

* fix test

* fix test

* add test

* fix

* fix

* fix test

* fix refined name bug

* fix test

* bias fix

* fix matmul weight dequant bug

* re-ci

* fix tester

* fix test

* fix tester

* update weight dequantize func

* update code

* update test for coverage

* update test

* update cmake

* update cmakelist

* update code

* rerun ci

* remove useless code

* re-ci

* update code

* update code

* fix header

* update code for log

c2d344dd

[Paddle-TRT][Cherry-Pick]Rewrite strided_slice converter using shape tensor (#47153) · 68c4ac31
Committed by zhoutianzi666 on Oct 20, 2022
```
* stride_to_24

* fix CI failing
```
68c4ac31
[Cherry-pick] layernorm shift partition enhance (#47086) · 9ed1454a
Committed by Wang Bojun on Oct 20, 2022
```
* Enhance the layernorm shift partition fuse op when shift size > 0 (roll shifting)
* fix cherry-pick test
```
9ed1454a
add _get_phi_kernel_name interface (#47033) · 4c925242
Committed by JingZhuangzhuang on Oct 20, 2022

4c925242
[Cherry-pick][Release/2.4] Fix some operators when the tensor.numel() > INT32_MAX (#47191) · c74bf018
Committed by sneaxiy on Oct 20, 2022
```
Fix some operators when the tensor.numel() > INT32_MAX
```
c74bf018
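
The `numel() > INT32_MAX` fixes listed in this log concern element counts that no longer fit in a signed 32-bit integer. The following is only an illustrative sketch of that overflow, not code from the PR; the helper `as_int32` and the example shape are hypothetical.

```python
import ctypes
from functools import reduce
from operator import mul

INT32_MAX = 2**31 - 1


def as_int32(value: int) -> int:
    """Wrap a Python int the way a signed 32-bit integer would (illustrative helper)."""
    return ctypes.c_int32(value).value


# Hypothetical tensor shape with 3 * 2**30 elements.
shape = (3, 1024, 1024, 1024)
numel = reduce(mul, shape, 1)

# A 64-bit count holds the value; a 32-bit index or counter wraps around.
numel_as_int32 = as_int32(numel)

print(numel > INT32_MAX)   # the count exceeds the int32 range
print(numel_as_int32)      # wrapped value, negative in this case
```

A kernel that stores the element count or a flat index in a 32-bit integer would see the wrapped value, which is why such operators typically need 64-bit indexing for large tensors.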
[Cherry-pick][Release/2.4] support pure bfloat16 for more ops · da7d2f29
Committed by sneaxiy on Oct 20, 2022
```
support pure bfloat16 for more ops
```
da7d2f29
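
For readers unfamiliar with the bfloat16 format mentioned in these commits: it keeps the sign and 8 exponent bits of float32 but only 7 mantissa bits. The sketch below converts a float32 value to bfloat16 bits and back using only the standard library. It is a conceptual illustration, not Paddle's implementation; the function names are hypothetical and NaN/infinity handling is deliberately omitted.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (round-to-nearest-even, no NaN handling)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a bfloat16 bit pattern back to a float by zero-filling the low 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def round_trip(x: float) -> float:
    return bf16_bits_to_float(float_to_bf16_bits(x))


print(round_trip(1.0))      # exactly representable
print(round_trip(3.14159))  # loses precision: only about 3 significant decimal digits survive
```

The reduced mantissa is the reason "pure bfloat16" support has to be added per operator: each kernel must accept the type and, where needed, accumulate in higher precision.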

19 Oct, 2022 4 commits

Add unsigned int8 scale propagation (#46378) (#47156) · 66dccd7d

Committed by yeliang2258 on Oct 19, 2022

* Add unsigned int8 propagation

* Add or modify unit tests

* Correct concat scale checking

* Apply review suggestions

* Corrections
Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>

66dccd7d

Add enable_partial_send_recv switch in pipeline_configs (#46992) (#47083) · 1d015f12

Committed by Ghost Screaming on Oct 19, 2022

* Fix bug of reduce_sum op: when input.numel() > INT32_MAX, its result is wrong.

* Support allow_partial switch, which can be configured in pipeline_configs. If the tensors sent from different hosts are not the same, they shouldn't be sent partially and then concatenated as a whole tensor.

* Change name allow_partial to enable_partial_send_recv.

* Add global variable _enable_partial_send_recv

1d015f12
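
The reasoning behind the `enable_partial_send_recv` switch can be sketched in plain Python. This is a conceptual model only, not the Paddle implementation: the function names are hypothetical, tensors are modeled as lists, and the length is assumed to be divisible by the number of ranks.

```python
def split_for_partial_send(tensor, num_parts, part_id):
    """Return the contiguous slice that rank `part_id` would send (hypothetical helper)."""
    chunk = len(tensor) // num_parts
    return tensor[part_id * chunk:(part_id + 1) * chunk]


def partial_send_recv(tensors_per_rank):
    """Each rank sends only its own slice; the receiver concatenates the slices."""
    num_parts = len(tensors_per_rank)
    received = []
    for part_id, tensor in enumerate(tensors_per_rank):
        received.extend(split_for_partial_send(tensor, num_parts, part_id))
    return received


# Case 1: every rank holds the same tensor, so the concatenation is lossless.
identical = [[1, 2, 3, 4], [1, 2, 3, 4]]
same_result = partial_send_recv(identical)

# Case 2: ranks hold different tensors, so the result matches neither of them.
different = [[1, 2, 3, 4], [5, 6, 7, 8]]
mixed_result = partial_send_recv(different)

print(same_result)
print(mixed_result)
```

In the second case the receiver ends up with a mixture of the two inputs, which is the situation the commit message says partial sending must be disabled for.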

[Dy2St]Fix recurrent op eager deletion pass error in dy2st (#47105) (#47134) · 69515e90
Committed by WangZhen on Oct 19, 2022
```
[CherryPick][Dy2St]Fix recurrent op eager deletion pass error in dy2st
```
69515e90
[ cherrypick] Construct exec and ctx only once in cond op to speed up (#47012) · fcb9c0b5
Committed by Hui Zhang on Oct 19, 2022
```
Construct exec and ctx only once in cond op to speed up
```
fcb9c0b5

18 Oct, 2022 6 commits
- reconstruct code for convert_fp16 (#46428) (#47087) · de6f15b6
  Committed by Wilber on Oct 18, 2022
  
  de6f15b6
- Add symbolic shape deduction function for unfold, scatter_nd_add, p_norm,... · 2cc8797e
  Committed by weishengying on Oct 18, 2022
```
Add symbolic shape deduction function for unfold, scatter_nd_add, p_norm, grid_sampler, pad3d, etc (#46291) (#47003)
```
  2cc8797e
- [cherry-pick 2.4] add sparse api transpose/reshape/is_same_shape (#47076) · 5fef043d
  Committed by zhouweiwei2014 on Oct 18, 2022
```
Add three APIs: sparse.is_same_shape, sparse.reshape, and sparse.transpose
```
  5fef043d
- support shape tensor is the input of trt-subgraph (#47066) · 5a44c124
  Committed by zhoutianzi666 on Oct 18, 2022
  
  5a44c124
- [cherry-pick] Fix perf issues of mp/pp/fuse in eager mode (#47071) · b84edd90
  Committed by Haohongxiang on Oct 18, 2022
```
* [Dygraph] Fix performance of pp+mp by using send/recv_calc_stream instead of send/recv (#46116)

* [Dygraph] Fix Perf of FusedFeedForward and FusedAttention with AllReduce (#46780)

* update
```
  b84edd90
- [Cherry pick] trt pool2d adaptive fix (#47069) · 5f6b9f1b
  Committed by Wang Bojun on Oct 18, 2022
```
* draft with debug print
* remove debug print
* bug fix for ci
```
  5f6b9f1b
17 Oct, 2022 3 commits

[Cherry-pick] Collective communication APIs (#46922) · 5fba2a98

Committed by Wen Sun on Oct 17, 2022

* Support both use_calc_stream and sync_op in send recv APIs (#46023)

* Support both use_calc_stream and sync_op in allgather API (#46295)

* Support both use_calc_stream and sync_op in collective communication API (#46761)

* Move group and all reduce from collective to communication (#45848)

* Completes bfloat16 dtype for collective api in eager mode (#45844)

* Fix collective APIs cannot be recognized when building docs (#46962)
Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>

5fba2a98

[cherry-pick]Sparse static graph (#46838) · 10225d22
Committed by zhangkaihuo on Oct 17, 2022
```
cherry-pick : #46322, #46245
Sparse API supports static graph
```
10225d22

[IPU] paddle-inference support custom-ops (#45235) (#46868) · bd89be12

Committed by Allen Guo on Oct 17, 2022

* paddle-inference support custom-ops
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>

* fix tolower
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>

bd89be12

14 Oct, 2022 5 commits
- cherry-pick 46942 (#47015) · 82db4993
  Committed by Wilber on Oct 14, 2022
  
  82db4993
- Add bmm convert (#47011) · 8f1ac7cf
  Committed by xiaoxiaohehe001 on Oct 14, 2022
  
  8f1ac7cf
- [BUG]Fix expand_as_v2 bug while X and Y with different dtype (#46950) (#46999) · 4b472656
  Committed by Aurelius84 on Oct 14, 2022
```
* [BUG]Fix expand_as_v2 bug while X and Y with different dtype

* fix commit
```
  4b472656
- [cherry-pick 2.4][inference] fix reshape2 opteller (#46871) · 535d7574
  Committed by Zhang Jun on Oct 14, 2022
```
* fix reshape2 opteller;
add elementwise min/max register for tensorrt
```
  535d7574
- [Paddle-TRT] support new quant format from slim (#46022) (#46979) · b8677c0d
  Committed by zhoutianzi666 on Oct 14, 2022
  
  b8677c0d
13 Oct, 2022 3 commits

interpretercore thread not always spin (#46687) (#46952) · d90aaa6e
Committed by zhangbo9674 on Oct 13, 2022

d90aaa6e
[Cherry-pick] Add fp16 dtype support for set_value op (#46906) · 100a0750
Committed by 傅剑寒 on Oct 13, 2022
```
Fix set_value failure when the source tensor has fp16 dtype and the destination value is a number
(dev PR link:#46801)
```
100a0750

[cherry-pick] [PHI] transpose2_grad op migration (#46139) (#46873) · 0280c0b9

Committed by Sławomir Siwek on Oct 13, 2022

* Revert pool+grad oneDNN kernel conversion (#45989)

* [PHI] transpose2_grad op migration (#46139)

* op migrated, Copy(OneDNNContext, ...) added

* mutable_data & op registration in fluid removed

* refactoring

* OneDNNGetDataType to uppercase

* missing cpu check added, handler moved to .h file

* name changed to transpose_grad

* Copy changed back to TensorCopy

* Resizing corrected, Copy(OneDNNContext) removed
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
Co-authored-by: Paulina Gacek <paulina.gacek@intel.com>

0280c0b9
