Commits · 0369cd0f0a655a98a7cd6e18a062e8c8fb6fc965 · BaiXuePrincess / Paddle

26 Oct, 2022 · 6 commits
- Fix inference performance problem caused by selecting cudnn kernel of softmax (#47338) (#47367) · 0369cd0f
  Committed by zyfncg 2 years ago
```
* fix inference performance problem caused by selecting cudnn kernel for softmax

* recover use_cudnn in opmaker of softmax
```
  0369cd0f
- fix a bug that print log twice (#47336) (#47343) · a16ef9f1
  Committed by Roc 2 years ago
  
  a16ef9f1
- [Cherry-pick][audio] fix tess split fold (#47350) · 85094bce
  Committed by YangZhou 2 years ago
```
* fix tess split fold

* format
```
  85094bce
- [Cherry-Pick][Dy2Stat]Fix module loading OSError in multiprocess (#47302) · 12e6dfcf
  Committed by Aurelius84 2 years ago
```
[Dy2Stat]Fix module loading OSError in multiprocess
```
  12e6dfcf
- Added workaround for elementwise oneDNN kernel (#47080) (#47342) · 7c6550a6
  Committed by yeliang2258 2 years ago
```
* return proper state

* fix for dims

* fix
Co-authored-by: jakpiase <jakpia21@gmail.com>
```
  7c6550a6
- [Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and... · 9a6dd8f8
  Committed by sneaxiy 2 years ago
```
[Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and fused_feedforward ops (#47235)

* fix fused_attention fused_feedforward

* fix ci

* fix ci

* fix ci PADDLE_GET_CONST

* fix ci ut
```
  9a6dd8f8
25 Oct, 2022 · 3 commits

[Sparse] Fix indices (#47190) (#47226) · 942ab42f

Committed by zhangkaihuo 2 years ago

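The sparse_dim of a SparseTensor cannot currently be obtained from the Tensor, so the shape of indices cannot be inferred exactly. For now the fix targets 3D point-cloud models: the input SparseTensor is 5D and each non-zero element is a 1-D vector, so indices has shape [4, -1].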

942ab42f
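
As a rough illustration of the shape stated in this commit note, the following NumPy sketch builds COO-style indices for a 5D tensor whose last axis stays dense. It is not Paddle code: the helper `coo_from_dense`, the `[batch, depth, height, width, channels]` layout, and the toy data are assumptions made for this example only.

```python
import numpy as np


def coo_from_dense(dense, sparse_dim):
    """Hypothetical helper: COO indices/values with trailing axes kept dense."""
    # A site counts as non-zero if any entry of its trailing dense block is non-zero.
    dense_axes = tuple(range(sparse_dim, dense.ndim))
    mask = np.any(dense != 0, axis=dense_axes)
    indices = np.stack(np.nonzero(mask))  # shape [sparse_dim, nnz]
    values = dense[mask]                  # shape [nnz, *dense_shape]
    return indices, values


# Toy 5D input, assumed layout [batch, depth, height, width, channels].
x = np.zeros((1, 4, 4, 4, 3), dtype=np.float32)
x[0, 1, 2, 3] = [1.0, 2.0, 3.0]
x[0, 0, 0, 1] = [4.0, 0.0, 0.0]

indices, values = coo_from_dense(x, sparse_dim=4)
print(indices.shape, values.shape)
```

With four sparse axes, `indices` always has four rows, while the number of columns equals the number of non-zero sites and is only known at run time; that unknown is what the `-1` in `[4, -1]` stands for.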

[geometric] fix english doc (#46485) (#47317) · 99d8ba47
Committed by Siming Dai 2 years ago
```
* fix geometric doc
```
99d8ba47

[cherry-pick] add prior_box and box_coder for paddle.vision.ops (#46786) · d5c6386c

Committed by Feng Ni 2 years ago

* add prior_box and box_coder for paddle.vision.ops

* fix UT change assertTrue to assert_allclose

* fix formula format

d5c6386c
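
The switch from `assertTrue` to `assert_allclose` mentioned in the list above can be illustrated with plain NumPy. This is a generic sketch, not the unit test from the PR; the arrays and the tolerance are made up for the example.

```python
import numpy as np

expected = np.array([0.1, 0.2, 0.3])
actual = expected + 1e-3  # deliberately outside the tolerance

# Old style: wrapping np.allclose in assertTrue only reports that a boolean was False.
old_style_ok = bool(np.allclose(actual, expected, rtol=1e-5))

# New style: assert_allclose raises with a report of the mismatching elements.
try:
    np.testing.assert_allclose(actual, expected, rtol=1e-5)
    message = ""
except AssertionError as err:
    message = str(err)

print(old_style_ok)
print(message)
```

The practical difference is in the failure output: the second form names the tolerance and the size of the mismatch, which tends to make a failing numeric test easier to diagnose.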

24 Oct, 2022 · 6 commits
- [CodeStyle] add black config to release2.4 (#47146) · 6454133f
  Committed by Nyakku Shigure 2 years ago
```
* [CodeStyle] add black config to release2.4

* empty commit, test=document_fix
```
  6454133f
- Fix hAPI bug of not compatible with LayerHook (#47001) (#47283) · e8d63399
  Committed by parap1uie-s 2 years ago
```
* Fix hAPI bug of not compatible with LayerHook
```
  e8d63399
- fix import in python3.6 (#47275) · caf27519
  Committed by zhaoyingli 2 years ago
  
  caf27519
- Fix virtualpp with mp/recompute bugs (#47242) (#47249) · 9780eb72
  Committed by Yuang Liu 2 years ago
  
  9780eb72
- Support BF16 training for sharding (#46846) (#47246) · 5c85f1a7
  Committed by Ghost Screaming 2 years ago
```
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bug.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>
```
  5c85f1a7
- fix send for old dygraph mode by passing use_calc_stream to the send op (#47110) (#47201) · 82f1e1b7
  Committed by Roc 2 years ago
  
  82f1e1b7
21 Oct, 2022 · 6 commits

[Paddle-TRT] Fix conv2d (#47034) · d42a1dc3
Committed by zhoutianzi666 2 years ago
```
* forbid Conv2D into Paddle-TensorRT when weight is not persistable.
```
d42a1dc3

support qat in sharding stage2 (#47169) (#47240) · 281891c5
Committed by Haohongxiang 2 years ago

281891c5

[CustomDevice] turn on WITH_CUSTOM_DEVICE when WITH_PYTHON=ON (#47165) · d1fedc54

Committed by ronnywang 2 years ago

cherry pick #47108

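Previously, the default on/off state of WITH_CUSTOM_DEVICE was tied to the ON_INFER switch. Because training and inference are now shipped in a single package, the training package is built with ON_INFER turned on, which left WITH_CUSTOM_DEVICE off by default and made the custom device feature unavailable.

The default on/off state of WITH_CUSTOM_DEVICE now follows the WITH_PYTHON switch instead.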

d1fedc54

add pdsa-2022-001, test=document_fix (#47228) · 001c8a6a
Committed by Vigi Zhang 2 years ago
```
Add PDSA-2022-001 security advisory
```
001c8a6a

Add infer prune function (#47047) · 8739497c
Committed by JingZhuangzhuang 2 years ago
```
* Add infer prune function

* add fusion op
```
8739497c

Add paddle audio dataset & backend (#45939) (#47230) · 29c9f027

Committed by YangZhou 2 years ago

* add audio feature dataset

* fix coding style

* fix coding style2

* rm librosa

* rm voxceleb

* rm librosa in test

* add scipy fftpack

* add functional

* fix setup

* fix setup2

* rm colorlog

* refactor dataset __init__.py

* fix coverage

* fix librosa import error

* fix windows test

* fix windows ci

* rm datasets

* fix setup

* remove testdata

* add librosa in requirement

* add librosa in requirement2

* change librosa to 0.8.1

* update ci docker

* fix ci error

* fix ci error2

* fix ci coverage

* fix coverage

* fix coverage

* rm audio_base in test, notest,test=coverage

* fix copyright

* rm backend

* add dataset in __init__

* rm compliance&&add function test

* fix setup

* fix windows

* fix windows2

* fix test timeout

* add backend & datasets

* fix bugs

* fix ci time issue

* add dataset test

* rm test_audio_feature

* avoid windows issue, tmp

* note windows issue

* skip windows issue

* refactor dataset test

* add dataset.py

* fix dtype in layers.mfcc

* fix ci-static-check

* fix dtype in layers.mfcc && fix ci-static-check

* add relative accuracy

* modify API.spec

* skip cuda11.2 test

* skip cuda11.2 test2

* skip cuda11.2

* change dataset name

* fix format

* update api.spec

* update api.spec2

* fix coverage

* add dataset test

* rm download load dict

* rm download load dict in init

* update api.spec3

* fix dataset coverage

* fix coverage

* fix coverage2

* restore api.spec

* restore api.spec2

* fix api-spec 3

* fix api-spec 4

* fix api.spec

* fix api.spec6

* refactor init_backend

* fix typo

* change paddleaudio backend set

* fix get_current_audio_backend()

* fix format

* fix format2

* remove format in parameters

* fix format2

* add warning message in wave_backend && remove redundant audio util

* rm audio util in print_signatures

* fix format3

* add tess dataset license

* format warning

* add more info in warning msg

* add paddleaudio version check

* replace dataset esc50 with tess

* add tess dataset && rm numpy transform in dataset.py

* fix set audio backend bug

* fix equal error

* fix format && coverage error

* add api example

* fix format

* fix error

* fix typo

* add noqa in __init__

* fix backend doc example error

* rm seed in dataset

* update backend example

* fix typo

* fix typo

* fix example err

* fix typo

* fix ci dataset test

* fix example fil

* try to fix ci

* clean dataset doc

* change get_current_audio_backend to get_current_backend

* replace paddle.audio.backends.info with paddle.audio.info, same with load, save

* fix ci error

* replace api in test_audio_backend

* fix save&&set_backend example

29c9f027

20 Oct, 2022 · 13 commits
- [Cherry-pick] Simplify conv codes and fix cache and autotune bugs. (#47197) · c0ed8729
  Committed by Yiqun Liu 2 years ago
```
* Simplify the codes of conv. (#45966)

* Enable to record whether the conv algo is got by exhaustive search to fix autotune cache bug. (#47065)
```
  c0ed8729
- [cherry-pick 2.4] remove incubate of all paddle sparse api (#47183) · 50d4fa54
  Committed by zhouweiwei2014 2 years ago
  
  50d4fa54
- [cherry pick] Add FusedMultiTransformer fuse pass for GPT3 (#47150) · 396427a7
  Committed by Kaipeng Deng 2 years ago
```
* add fused_attention_pass. test=develop

* support fp16. test=develop

* fix format. test=develop
```
  396427a7
- Add value check & error message for gather_tree (#47051) (#47221) · 6712e262
  Committed by liu zhengxi 2 years ago
```
Add value check & error message for gather_tree
cherry-pick #47051
```
  6712e262
- fix problem of persistable var saving in QAT (#47203) · 3d647b1c
  Committed by Guanghua Yu 2 years ago
  
  3d647b1c
- [cherry-pick] Fix quantize model deploy bug in MKLDNN (#47119) · c2d344dd
  Committed by yeliang2258 2 years ago
```
* Fix quantize model deploy bugs when using MKLDNN (#45920)

* fix immutable op quantize bugs

* fix

* fix build bug

* fix test

* notest,test=inference

* fix ppyoloe acc drop bugs

* fix test

* fix test

* add test

* fix

* fix

* fix test

* fix refined name bug

* fix test

* bias fix

* fix matmul weight dequant bug

* re-ci

* fix tester

* fix test

* fix tester

* update weight dequantize func

* update code

* update test for coverage

* update test

* update cmake

* update cmakelist

* update code

* rerun ci

* remove useless code

* re-ci

* update code

* update code

* fix header

* update code for log
```
  c2d344dd
- [Paddle-TRT][Cherry-Pick]Rewrite strided_slice converter using shape tensor (#47153) · 68c4ac31
  Committed by zhoutianzi666 2 years ago
```
* stride_to_24

* fix CI failing
```
  68c4ac31
- add get ops scripts (#47049) · 09b19233
  Committed by JingZhuangzhuang 2 years ago
  
  09b19233
- [Cherry-pick] layernorm shift partition enhance (#47086) · 9ed1454a
  Committed by Wang Bojun 2 years ago
```
* Enhance the layernorm shift partition fuse op when shift size > 0 (roll shifting)
* fix cherry-pick test
```
  9ed1454a
- add _get_phi_kernel_name interface (#47033) · 4c925242
  Committed by JingZhuangzhuang 2 years ago
  
  4c925242
- [Cherry-pick][Release/2.4] Fix some operators when the tensor.numel() > INT32_MAX (#47191) · c74bf018
  Committed by sneaxiy 2 years ago
```
Fix some operators when the tensor.numel() > INT32_MAX
```
  c74bf018
- [Cherry-pick][Release/2.4] support pure bfloat16 for more ops · da7d2f29
  Committed by sneaxiy 2 years ago
```
support pure bfloat16 for more ops
```
  da7d2f29
- Fix cannot import `paddle.distributed` in python 3.6 on release/2.4 (#47141) · c894d91d
  Committed by Wen Sun 2 years ago
```
* fix: fix incorrect import

* fix: fix incorrect usage
```
  c894d91d
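
Several commits in this log (for example c74bf018) fix operators for tensors with `numel() > INT32_MAX`. The commit messages do not say what went wrong internally. One common cause of this class of bug is index or size arithmetic done in 32-bit integers, and the NumPy sketch below shows only that general effect, under that assumption; the `numel` helper is hypothetical and unrelated to Paddle's kernels.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2147483647


def numel(shape, index_dtype):
    """Element count computed entirely in index_dtype (array reductions wrap silently)."""
    dims = np.asarray(shape, dtype=index_dtype)
    return int(np.prod(dims, dtype=index_dtype))


shape = (65536, 32768)              # 2**31 elements, one more than INT32_MAX
numel_32 = numel(shape, np.int32)   # wraps around to a negative value
numel_64 = numel(shape, np.int64)   # correct count

print(numel_32, numel_64, numel_64 > INT32_MAX)
```

Any loop bound or offset derived from the wrapped value would be wrong, which is why such fixes usually widen the index type to 64 bits.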
19 Oct, 2022 · 6 commits

[Cherry-Pick][AutoParallel] auto_parallel cherry-pick to release2.4 (#47145) · 90b31790

Committed by zhaoyingli 2 years ago

* [Auto Parallel] Make Engine class callable (#46416)

* [Auto Parallel] Improve the user-defined fetches and logging

* [Auto Parallel] Make Engine class callable

* [Auto Parallel] Update the data loading of tuner

* Print IPS in auto parallel Engine (#46554)

* [AutoParallel] fix dist_split (#46505)

* [AutoParallel] fix dist_split

* add unittest

* update cmakelist

* [AutoParallel] fix sharding (#46572)

* [AutoParallel] fix process_mesh (#46583)

* [AutoParallel] fix reshard when train with eval (#46605)

* [AutoParallel] fix reshard when train with eval

* fix mppp

* [AutoParallel] fix amp when predict (#46637)

* [Auto Parallel]Update comp cost and completion for gpt auto search (#46387)

* update comp cost and completion for gpt auto search

* add unittest

* [Auto Parallel] Fix bugs caused by the inconsistent outputs of Engine API (#46633)

* [Auto Parallel] Unify the logger and outputs of Engine API

* [Auto Parallel] Fix the bugs of to_static

* [Auto Parallel] Adjust the test_to_static.py

* [Auto Parallel] Improve the fine-grained APIs (#46552)

* [Auto Parallel] Support different dataloaders

* [Auto Parallel] Add num_shards config for dataset

* [Auto Parallel] Unify the logger and outputs of Engine API

* [Auto Parallel] Fix the bugs of to_static

* [Auto Parallel] Adjust the test_to_static.py

* [Auto Parallel] Add the prepare API and replace __call__ with run

* [Auto Parallel] Improve the private implementations of Engine

* [Auto Parallel] Set capacity of dataloader for opt tuning

* [Auto Parallel] [WIP] Change the fine-grained API

* [Auto Parallel] Improve APIs to support different user cases

* [Auto Parallel] Add removed config

* [Auto Parallel] Add imports

* [Auto Parallel] Fix bugs for to_static

* [Auto Parallel] Remove unnecessary imports

* bugfix (#46921)

* [Auto Parallel] Fix the bug for None labels (#46987)

* [AutoParallel] adapt for gpt-gen (#46771)

* for gpt-gen

* fix reshard

* adapt assign and shape op

* add dist_assign & unittest

* add conditional block unittest

* rename unittest

* [Auto Parallel] Fix the bug of completion (#47056)

* [Auto Parallel] Fix the bug for None labels

* [Auto Parallel] Fix the completion bug

* [AutoParallel] add callbacks (#47014)

* [AutoParallel] add callbacks

* fix unittest

* fix dist_context

* fix engine

* fix cmakelist

* fix unittest's returns

* fix cmakelist

* [Auto Parallel] Add cost interface (#47043)

* add cost interface

* update interface and add unittest

* update unittest

* update interface

* [Auto Parallel]Add parallel tuner (#46189)

* add parallel tuner

* add unittest

* fix unittest

* set timeout of unittest

* set unittest timeout

* fix auto_mode setting

* update unittest

* sync from develop and update unittest

* remove unused import

* update unittest

* update cmakelist

* add unittests
Co-authored-by: Yulong Ao <aoyulong@baidu.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>

90b31790

[cherry-pick] strided_slice grad add fp16 support (#47159) · 23f2a4ea
Committed by Zhang Ting 2 years ago
```
* strided_slice grad add fp16 support
```
23f2a4ea

Add unsigned int8 scale propagation (#46378) (#47156) · 66dccd7d

Committed by yeliang2258 2 years ago

* Add unsigned int8 propagation

* Add or modify unit tests

* Correct concat scale checking

* Apply review suggestions

* Corrections
Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>

66dccd7d

[Dy2Stat]Polish @to_static temporary file directory to speed up transformation (#47102) (#47144) · 5a9befea
Committed by Aurelius84 2 years ago
```
Polish @to_static temporary file directory to speed up transformation
```
5a9befea

[CherryPick] Support TypeHint for function decorated by @to_static (#47147) · 247ef477

Committed by xiongkun 2 years ago

* [Dy2Static] Support TypeHint for function decorated by @to_static (#47121)

* Add TypeHint Transformer

* add unittest for typehint transformer

* [Dy2Static] Remove GradTransformer (#47063)

* [Dy2Static] Remove GradTransformer
1. fix einsum infershape bugs.
2. remove grad_transformer and unify paddle.grad and paddle.static.gradient.
3. add dygraph_and_dy2static_only decorator for dy2static.

* fix bugs

* rename

247ef477

Add enable_partial_send_recv switch in pipeline_configs (#46992) (#47083) · 1d015f12

Committed by Ghost Screaming 2 years ago

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* Support the allow_partial switch, which can be configured in
pipeline_configs. If the tensors sent from different hosts
are not the same, they should not be sent partially and
then concatenated into a whole tensor.

* Change name allow_partial to enable_partial_send_recv.

* Add global variable _enable_partial_send_recv

1d015f12
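
The reasoning behind the `enable_partial_send_recv` switch can be sketched with NumPy, with no real communication involved. The helpers `partial_send` and `gather_partials` and the two-rank setup are hypothetical and only mimic the idea described in the commit message: each sender ships one slice, and the receiver concatenates the slices.

```python
import numpy as np


def partial_send(tensor, rank, nranks):
    """Hypothetical sender: ship only this rank's 1/nranks slice of the flattened tensor."""
    chunks = np.array_split(tensor.reshape(-1), nranks)
    return chunks[rank]


def gather_partials(parts, shape):
    """Hypothetical receiver: concatenate the slices back into a whole tensor."""
    return np.concatenate(parts).reshape(shape)


nranks = 2
shape = (2, 4)

# Case 1: every rank holds the same tensor, so stitching slices is lossless.
same = [np.arange(8.0).reshape(shape) for _ in range(nranks)]
rebuilt_same = gather_partials(
    [partial_send(t, r, nranks) for r, t in enumerate(same)], shape
)

# Case 2: ranks hold different tensors, so the stitched result matches none of them.
differ = [np.arange(8.0).reshape(shape) + 100 * r for r in range(nranks)]
rebuilt_differ = gather_partials(
    [partial_send(t, r, nranks) for r, t in enumerate(differ)], shape
)

print(np.array_equal(rebuilt_same, same[0]))
print(any(np.array_equal(rebuilt_differ, t) for t in differ))
```

In the second case the rebuilt tensor mixes values from both ranks, which is the situation the switch lets users avoid by turning partial send/recv off.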

BaiXuePrincess / Paddle is in sync with its fork source
