Commits · 559b9754b7e401dbf679715b4352b0baaef7d14f · PaddlePaddle / Paddle

03 Nov, 2022 3 commits
- Fix ComputePropagateScalesMkldnnPass of MKLDNN (#47574) (#47639) · 559b9754
  Committed by yeliang2258 on Nov 03, 2022
```
* add constant_folding_pass pass for mkldnn int8

* update UpdateScaleOpInOutScales
```
  559b9754
- [Sparse] Unified api args name (#47529) (#47627) · 75088bbf
  Committed by zhangkaihuo on Nov 03, 2022
```
Unified api args name
```
  75088bbf
- [cherry pick] fix memory copy in prepare_data of FusedMultiTransformer pass (#47308) · ba4fbe71
  Committed by Kaipeng Deng on Nov 03, 2022
```
* fix memory copy in prepare_data. test=develop

* add cache_kv fp16 support. test=develop

* fit for simplify_with_basic_ops_pass. test=develop
```
  ba4fbe71
02 Nov, 2022 1 commit
- [geometric] Optimize graph sample speed (#47531) (#47548) · 7a1cf277
  Committed by Siming Dai on Nov 02, 2022
  
  7a1cf277
01 Nov, 2022 2 commits

[cherry-pick][code-gen] Support code-gen for opmaker of sparse op (#46993) (#47417) · 601626ac

Committed by zyfncg on Nov 01, 2022

* support generating code of opmaker for backward op invoke forward op (#46912)

* [code-gen] Support code-gen for opmaker of sparse op (#46993)

* support generating code of opmaker for backward op invoke forward op

* support code-gen of opmaker for sparse op

* refine logic of choosing phi kernel

* fix compile bug

* fix code_gen bug

* fix bug

* fix kernel signature code-gen

* fix compile bug of VarType

* fix compile bug of VarType

* fix test_sparse_conv_op

* fix test_sparse_norm_op

* [Phi] Refactor logic of judging whether having a phi kernel (#46920)

* refine logic of choosing phi kernel

* fix compile bug

* update cmake

601626ac

fix p2p comm memory release logic (#47497) (#47517) · 0201ccc4

Committed by Yuang Liu on Nov 01, 2022

0201ccc4

29 Oct, 2022 1 commit
- [JITLayer]Enable OneDNN on CPU and Fix zero shape (#47428) (#47436) · f4788442
  Committed by Aurelius84 on Oct 29, 2022
```
* [JITLayer]Enable OneDNN on CPU and Fix zero shape
```
  f4788442
28 Oct, 2022 4 commits
- [Dy2St]Fix abnormal growth of memory in train mode and no_grad for Dy2St (#47398) (#47414) · 7618cbdc
  Committed by WangZhen on Oct 28, 2022
```
* [Dy2St]Fix abnormal growth of memory in train mode and no_grad for Dy2St 
```
  7618cbdc
- [Cherry-pick][JIT] Add Predictor for JITLayer (#47379) (#47419) · c42929c5
  Committed by Aurelius84 on Oct 28, 2022
```
* [JIT] Add Predictor for JITLayer (#47379)

* add predictor_engine

* add predictor_engine

* fix zero shape

* fix lodTensor

* fix unittest

* fix code style

* update CmakeList

* fix new executor
```
  c42929c5
- [cherry-pick]add sync_batch_norm_bn and deliver indices_dict (#47407) · 0fa8309a
  Committed by zhangkaihuo on Oct 28, 2022
```
add sync_batch_norm_bn and deliver indices_dict 
```
  0fa8309a
- support multiclass_nms in int8 (#47337) · eec93bda
  Committed by zhoutianzi666 on Oct 28, 2022
  
  eec93bda
27 Oct, 2022 2 commits
- [cherry-pick] add batch_norm_kernel (#47394) · b143e008
  Committed by zhangkaihuo on Oct 27, 2022
```
* cherry-pick #46359 and resolve conflict
```
  b143e008
- fix slice bug (#47349) (#47376) · 99cec1a6
  Committed by wanghuancoder on Oct 27, 2022
```
Fix a bug in Slice
```
  99cec1a6
26 Oct, 2022 3 commits

Fix inference performance problem caused by selecting cudnn kernel of softmax (#47338) (#47367) · 0369cd0f
Committed by zyfncg on Oct 26, 2022
```
* fix inference performance problem caused by selecting cudnn kernel for softmax

* recover use_cudnn in opmaker of softmax
```
0369cd0f
Added workaround for elementwise oneDNN kernel (#47080) (#47342) · 7c6550a6
Committed by yeliang2258 on Oct 26, 2022
```
* return proper state

* fix for dims

* fix
Co-authored-by: jakpiase <jakpia21@gmail.com>
```
7c6550a6

[Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and... · 9a6dd8f8

Committed by sneaxiy on Oct 26, 2022

[Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and fused_feedforward ops (#47235)

* fix fused_attention fused_feedforward

* fix ci

* fix ci

* fix ci PADDLE_GET_CONST

* fix ci ut

9a6dd8f8

25 Oct, 2022 1 commit

[Sparse] Fix indices (#47190) (#47226) · 942ab42f

Committed by zhangkaihuo on Oct 25, 2022

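Currently the sparse_dim of a SparseTensor cannot be obtained from the Tensor, so the shape of indices cannot be inferred accurately. For now the focus is on 3D point-cloud models, where the input SparseTensor is 5-D and each non-zero element is a 1-D vector, so indices is [4, -1].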

942ab42f
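
The shape reasoning in the message above can be sketched in a few lines. This is only an illustration of the general COO layout (indices of shape `[sparse_dim, nnz]`), not code from the commit; the helper name `infer_indices_shape` and its parameters are hypothetical.

```python
def infer_indices_shape(dense_ndim, value_ndim):
    """Return the COO indices shape as [sparse_dim, -1].

    Hypothetical helper for illustration only.
    dense_ndim: number of dimensions of the whole sparse tensor.
    value_ndim: number of dimensions of each stored non-zero element.
    -1 stands for the number of non-zero elements, unknown at inference time.
    """
    sparse_dim = dense_ndim - value_ndim
    if sparse_dim <= 0:
        raise ValueError("value_ndim must be smaller than dense_ndim")
    return [sparse_dim, -1]


# 3D point-cloud case from the commit message: a 5-D tensor whose
# non-zero elements are 1-D vectors has 4 sparse dimensions.
print(infer_indices_shape(5, 1))
```

When sparse_dim is not available, as the message describes, the 5-D / 1-D case has to be assumed rather than computed, which is why the shape is fixed to [4, -1] for now.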

24 Oct, 2022 1 commit

Support BF16 training for sharding (#46846) (#47246) · 5c85f1a7

Committed by Ghost Screaming on Oct 24, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bug.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>

5c85f1a7
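
The commit message does not state why reduce_sum went wrong once `input.numel()` exceeded INT32_MAX. A common cause of this class of bug is an element count or index held in a signed 32-bit integer, and the sketch below illustrates only that general failure mode. The helper names are hypothetical and nothing here is taken from the Paddle kernel.

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_int32(n):
    """Reinterpret n as a signed 32-bit integer, wrapping on overflow."""
    return ctypes.c_int32(n).value


def needs_64bit_index(numel):
    """True when an element count no longer fits in a signed 32-bit integer."""
    return numel > INT32_MAX


numel = INT32_MAX + 11          # slightly more elements than int32 can count
print(needs_64bit_index(numel))  # such a tensor needs 64-bit indexing
print(as_int32(numel))           # the wrapped value is negative
```

A loop bound or offset that wraps like this skips or misaddresses elements, which is consistent with a wrong reduction result for very large inputs.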

21 Oct, 2022 3 commits

[Paddle-TRT] Fix conv2d (#47034) · d42a1dc3
Committed by zhoutianzi666 on Oct 21, 2022
```
* forbid Conv2D into Paddle-TensorRT when weight is not persistable.
```
d42a1dc3
Add infer prune function (#47047) · 8739497c
Committed by JingZhuangzhuang on Oct 21, 2022
```
* Add infer prune function

* add fusion op
```
8739497c

Add paddle audio dataset & backend (#45939) (#47230) · 29c9f027

Committed by YangZhou on Oct 21, 2022

* add audio feature dataset

* fix coding style

* fix coding style2

* rm librosa

* rm voxceleb

* rm librosa in test

* add scipy fftpack

* add functional

* fix setup

* fix setup2

* rm colorlog

* refactor dataset __init__.py

* fix coverage

* fix librosa import error

* fix windows test

* fix windows ci

* rm datasets

* fix setup

* remove testdata

* add librosa in requirement

* add librosa in requirement2

* change librosa to 0.8.1

* update ci docker

* fix ci error

* fix ci error2

* fix ci coverage

* fix coverage

* fix coverage

* rm audio_base in test, notest,test=coverage

* fix copyright

* rm backend

* add dataset in __init__

* rm compliance&&add function test

* fix setup

* fix windows

* fix windows2

* fix test timeout

* add backend & datasets

* fix bugs

* fix ci time issue

* add dataset test

* rm test_audio_feature

* avoid windows issue, tmp

* note windows issue

* skip windows issue

* refactor dataset test

* add dataset.py

* fix dtype in layers.mfcc

* fix ci-static-check

* fix dtype in layers.mfcc && fix ci-static-check

* add relative accuracy

* modify API.spec

* skip cuda11.2 test

* skip cuda11.2 test2

* skip cuda11.2

* change dataset name

* fix format

* update api.spec

* update api.spec2

* fix coverage

* add dataset test

* rm download load dict

* rm download load dict in init

* update api.spec3

* fix dataset coverage

* fix coverage

* fix coverage2

* restore api.spec

* restore api.spec2

* fix api-spec 3

* fix api-spec 4

* fix api.spec

* fix api.spec6

* refactor init_backend

* fix typo

* change paddleaudio backend set

* fix get_current_audio_backend()

* fix format

* fix format2

* remove format in parameters

* fix format2

* add warning message in wave_backend && remove redundant audio util

* rm audio util in print_signatures

* fix format3

* add tess dataset license

* format warning

* add more info in warning msg

* add paddleaudio version check

* replace dataset esc50 with tess

* add tess dataset && rm numpy transform in dataset.py

* fix set audio backend bug

* fix equal error

* fix format && coverage error

* add api example

* fix format

* fix error

* fix typo

* add noqa in __init__

* fix backend doc example error

* rm seed in dataset

* update backend example

* fix typo

* fix typo

* fix example err

* fix typo

* fix ci dataset test

* fix example fil

* try to fix ci

* clean dataset doc

* change get_current_audio_backend to get_current_backend

* replace paddle.audio.backends.info with paddle.audio.info; same for load and save

* fix ci error

* replace api in test_audio_backend

* fix save&&set_backend example

29c9f027

20 Oct, 2022 9 commits

[Cherry-pick] Simplify conv codes and fix cache and autotune bugs. (#47197) · c0ed8729

Committed by Yiqun Liu on Oct 20, 2022

* Simplify the codes of conv. (#45966)

* Enable to record whether the conv algo is got by exhaustive search to fix autotune cache bug. (#47065)

c0ed8729

[cherry pick] Add FusedMultiTransformer fuse pass for GPT3 (#47150) · 396427a7
Committed by Kaipeng Deng on Oct 20, 2022
```
* add fused_attention_pass. test=develop

* support fp16. test=develop

* fix format. test=develop
```
396427a7
Add value check & error message for gather_tree (#47051) (#47221) · 6712e262
Committed by liu zhengxi on Oct 20, 2022
```
Add value check & error message for gather_tree
cherry-pick #47051
```
6712e262

[cherry-pick] Fix quantize model deploy bug in MKLDNN (#47119) · c2d344dd

Committed by yeliang2258 on Oct 20, 2022

* Fix quantize model deploy bugs when using MKLDNN (#45920)

* fix immutable op quantize bugs

* fix

* fix build bug

* fix test

* notest,test=inference

* fix ppyoloe acc drop bugs

* fix test

* fix test

* add test

* fix

* fix

* fix test

* fix refined name bug

* fix test

* bias fix

* fix matmul weight dequant bug

* re-ci

* fix tester

* fix test

* fix tester

* update weight dequantize func

* update code

* update test for coverage

* update test

* update cmake

* update cmakelist

* update code

* rerun ci

* remove useless code

* re-ci

* update code

* update code

* fix header

* update code for log

c2d344dd

[Paddle-TRT][Cherry-Pick]Rewrite strided_slice converter using shape tensor (#47153) · 68c4ac31
Committed by zhoutianzi666 on Oct 20, 2022
```
* stride_to_24

* fix CI failing
```
68c4ac31
[Cherry-pick] layernorm shift partition enhance (#47086) · 9ed1454a
Committed by Wang Bojun on Oct 20, 2022
```
* Enhance the layernorm shift partition fuse op when shift size > 0 (roll shifting)
* fix cherry-pick test
```
9ed1454a
add _get_phi_kernel_name interface (#47033) · 4c925242
Committed by JingZhuangzhuang on Oct 20, 2022

4c925242
[Cherry-pick][Release/2.4] Fix some operators when the tensor.numel() > INT32_MAX (#47191) · c74bf018
Committed by sneaxiy on Oct 20, 2022
```
Fix some operators when the tensor.numel() > INT32_MAX
```
c74bf018
[Cherry-pick][Release/2.4] support pure bfloat16 for more ops · da7d2f29
Committed by sneaxiy on Oct 20, 2022
```
support pure bfloat16 for more ops
```
da7d2f29

19 Oct, 2022 6 commits

[cherry-pick] strided_slice grad add fp16 support (#47159) · 23f2a4ea
Committed by Zhang Ting on Oct 19, 2022
```
* strided_slice grad add fp16 support
```
23f2a4ea

Add unsigned int8 scale propagation (#46378) (#47156) · 66dccd7d

Committed by yeliang2258 on Oct 19, 2022

* Add unsigned int8 propagation

* Add or modify unit tests

* Correct concat scale checking

* Apply review suggestions

* Corrections
Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>

66dccd7d

[CherryPick] Support TypeHint for function decorated by @to_static (#47147) · 247ef477

Committed by xiongkun on Oct 19, 2022

* [Dy2Static] Support TypeHint for function decorated by @to_static (#47121)

* Add TypeHint Transformer

* add unittest for typehint transformer

* [Dy2Static] Remove GradTransformer (#47063)

* [Dy2Static] Remove GradTransformer
1. fix einsum infershape bugs.
2. remove grad_transformer and unify paddle.grad and paddle.static.gradient.
3. add dygraph_and_dy2static_only decorator for dy2static.

* fix bugs

* rename

247ef477

Add enable_partial_send_recv switch in pipeline_configs (#46992) (#47083) · 1d015f12

Committed by Ghost Screaming on Oct 19, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* Support allow_partial switch, which can be configured in
pipeline_configs. If the tensors sent from different hosts are not
the same, they shouldn't be sent partially and then concatenated
into a whole tensor.

* Change name allow_partial to enable_partial_send_recv.

* Add global variable _enable_partial_send_recv

1d015f12
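
The reason for the switch can be shown with a toy model. The sketch below assumes a scheme in which each sender transmits only its own slice of a tensor and the receiver concatenates the slices. That scheme is an assumption drawn from the wording of the message, and every function name here is hypothetical rather than part of Paddle.

```python
def partial_slice(tensor, rank, world_size):
    """Slice of `tensor` that the sender with this rank would transmit."""
    chunk = len(tensor) // world_size
    return tensor[rank * chunk:(rank + 1) * chunk]


def send_recv(tensors_per_rank, enable_partial_send_recv=True):
    """Toy model: return what each receiver ends up with.

    tensors_per_rank[i] is the tensor held by sender i (a flat list).
    """
    world_size = len(tensors_per_rank)
    if not enable_partial_send_recv:
        # Every sender transmits its whole tensor to its peer.
        return [list(t) for t in tensors_per_rank]
    # Every sender transmits one slice; receivers concatenate all slices.
    whole = []
    for rank, tensor in enumerate(tensors_per_rank):
        whole.extend(partial_slice(tensor, rank, world_size))
    return [list(whole) for _ in range(world_size)]


same = [[1, 2, 3, 4], [1, 2, 3, 4]]
different = [[1, 2, 3, 4], [5, 6, 7, 8]]

print(send_recv(same))                                       # slices reassemble the original
print(send_recv(different))                                  # mixes data from two senders
print(send_recv(different, enable_partial_send_recv=False))  # each tensor arrives intact
```

In this model partial sending is only safe when all senders hold identical data, which is the situation the switch lets a user opt out of.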

[Dy2St]Fix recurrent op eager deletion pass error in dy2st (#47105) (#47134) · 69515e90
Committed by WangZhen on Oct 19, 2022
```
[CherryPick][Dy2St]Fix recurrent op eager deletion pass error in dy2st
```
69515e90
[ cherrypick] Construct exec and ctx only once in cond op to speed up (#47012) · fcb9c0b5
Committed by Hui Zhang on Oct 19, 2022
```
Construct exec and ctx only once in cond op to speed up
```
fcb9c0b5

18 Oct, 2022 4 commits
- reconstruct code for convert_fp16 (#46428) (#47087) · de6f15b6
  Committed by Wilber on Oct 18, 2022
  
  de6f15b6
- Add symbolic shape deduction function for unfold, scatter_nd_add, p_norm,... · 2cc8797e
  Committed by weishengying on Oct 18, 2022
```
Add symbolic shape deduction function for unfold, scatter_nd_add, p_norm, grid_sampler, pad3d, etc (#46291) (#47003)
```
  2cc8797e
- [cherry-pick 2.4] add sparse api transpose/reshape/is_same_shape (#47076) · 5fef043d
  Committed by zhouweiwei2014 on Oct 18, 2022
```
Add three new APIs: sparse.is_same_shape, sparse.reshape, and sparse.transpose
```
  5fef043d
- support shape tensor is the input of trt-subgraph (#47066) · 5a44c124
  Committed by zhoutianzi666 on Oct 18, 2022
  
  5a44c124

PaddlePaddle / Paddle synced successfully more than 1 year ago