- 29 Dec 2022, 1 commit
-
-
Submitted by zhouweiwei2014
-
- 28 Dec 2022, 1 commit
-
-
Submitted by Huihuang Zheng
Fix CUDA11.8 Unittest Accuracy
-
- 21 Dec 2022, 1 commit
-
-
Submitted by zhangkaihuo
-
- 28 Nov 2022, 1 commit
-
-
Submitted by zlsh80826
* Reduce squeeze2_matmul_fuse_pass and flatten tests time (#47098)
* Add missing fp32 config and reduce the testing combinations
* Reduce trt matmul pass test max examples
* Loosen TRT fp16 tests tolerance (#47100)
* Loosen TRT half test tolerance to 1e-3 (#47101)
* Loosen TRT half test tolerance to 1e-3 (#47106)
* Update distributed_strategy.proto (#46531)
* Close popen pipe after use (#47053)
* Add launch_bounds (#47285)
* Fix TRT UT failures (#47488)
* Format cherry-picked commits
* CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203)
* Skip tests that use fused_ops on H100
* Add error message to FusedOps on H100
Co-authored-by: Shijie <505749828@qq.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: Tian Zheng <tizheng@nvidia.com>
-
- 04 Nov 2022, 1 commit
-
-
Submitted by xiongkun
* [Dy2Static] Fix bugs when select_input meets inputs with different shapes or an undefined var (#45916)
  * select_input_with_buildin_type directly returns the non-undefined-var branch when it meets an undefined var
  * the output shape of select_input is inferred from its inputs
  * reverse the logic in select_input
* [warning] Add a warning message in the cond block when one branch returns a variable and the other returns None (#46031)
* [cherry-pick] Allow manually setting the py_reader name in the standalone executor (#45898) (#45931)
* [BugFix] Fix while/cond when it receives a dict as input (#47299): add a unittest, change flatten -> _is_sequence_except_dict, code format
Co-authored-by: feifei-111 <wuzhanfei@baidu.com>
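For orientation, a rough sketch of a dynamic-to-static loop that carries a dict among its loop variables; treating this as the exact scenario covered by #47299 is an assumption, and the function name and values are illustrative.
    import paddle

    @paddle.jit.to_static
    def loop_with_dict(x):
        # The loop state includes a dict, which the while transformer must not flatten away.
        state = {"step": paddle.zeros([1], dtype="int64"), "acc": x}
        while state["step"] < 3:
            state["acc"] = state["acc"] + 1
            state["step"] = state["step"] + 1
        return state["acc"]

    print(loop_with_dict(paddle.ones([1])))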
-
- 03 Nov 2022, 2 commits
-
-
Submitted by Sławomir Siwek
-
Submitted by ShenLiang
* add unbalanced data
* fix utest
-
- 31 Oct 2022, 1 commit
-
-
Submitted by zhaoyingli
* update codestyle
* [AutoParallel] fix fp16 for subblock (#47189)
  * fix engine
  * fix comment
* [AutoParallel] fix engine _build and cost method (#47263)
  * fix engine build method
  * fix import
  * update engine cost
  * update raise error
  * update cmakelist
  * revert optimizer
  * fix unittest
Co-authored-by: caozhou <caozhou@radi.ac.cn>
-
- 27 Oct 2022, 2 commits
-
-
Submitted by sneaxiy
[Cherry-pick Release/2.4] Fix the multi_tensor adam and momentum bug when the parameters are a list of dict (#47372)
* reformat file by black
* fix multi_tensor adam/momentum bug
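For context, a minimal sketch (not taken from the PR) of the configuration this fix targets: a multi-tensor optimizer whose parameters argument is a list of dicts. The layer sizes and learning rates are illustrative.
    import paddle

    linear1 = paddle.nn.Linear(10, 10)
    linear2 = paddle.nn.Linear(10, 1)

    opt = paddle.optimizer.Adam(
        learning_rate=0.001,
        parameters=[
            {"params": linear1.parameters()},
            {"params": linear2.parameters(), "learning_rate": 0.01},
        ],
        use_multi_tensor=True,  # fused multi-tensor update path
    )

    x = paddle.randn([4, 10])
    loss = linear2(linear1(x)).mean()
    loss.backward()
    opt.step()
    opt.clear_grad()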
-
Submitted by zhangkaihuo
* cherry-pick #46359 and resolve conflict
-
- 25 Oct 2022, 1 commit
-
-
Submitted by Feng Ni
* add prior_box and box_coder for paddle.vision.ops
* fix UT: change assertTrue to assert_allclose
* fix formula format
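A hedged usage sketch of the two new ops; the argument names, shapes, and keyword choices follow my reading of the API and are not taken from the PR.
    import paddle

    feat = paddle.rand([1, 32, 19, 19])     # feature map, NCHW
    image = paddle.rand([1, 3, 300, 300])   # input image, NCHW

    # prior_box generates candidate boxes and their variances per feature-map cell.
    boxes, box_vars = paddle.vision.ops.prior_box(
        feat, image, min_sizes=[100.0], clip=True, flip=True)

    # box_coder encodes target boxes against the flattened priors.
    targets = paddle.rand([8, 4])
    encoded = paddle.vision.ops.box_coder(
        boxes.reshape([-1, 4]), box_vars.reshape([-1, 4]), targets)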
-
- 20 Oct 2022, 5 commits
-
-
Submitted by zhouweiwei2014
-
Submitted by liu zhengxi
Add value check & error message for gather_tree (cherry-pick #47051)
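For reference, the kind of call the new check guards: parent indices must stay inside [0, beam_size). The example values mirror the public gather_tree documentation; treating them as representative is my assumption.
    import paddle

    # Layout is [max_time, batch_size, beam_size].
    ids = paddle.to_tensor(
        [[[2, 2], [6, 1]], [[3, 9], [6, 1]], [[0, 1], [9, 0]]], dtype="int64")
    parents = paddle.to_tensor(
        [[[0, 0], [1, 1]], [[1, 0], [1, 0]], [[0, 0], [0, 1]]], dtype="int64")
    final = paddle.nn.functional.gather_tree(ids, parents)
    # Out-of-range parent indices now raise a descriptive error instead of
    # reading past the beam dimension.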
-
Submitted by yeliang2258
* Fix quantized model deploy bugs when using MKLDNN (#45920)
  * fix immutable op quantize bugs
  * fix build bug
  * fix ppyoloe accuracy drop bugs
  * fix refined name bug
  * fix bias handling
  * fix matmul weight dequantize bug
  * update weight dequantize func
  * update test for coverage
  * update cmake / cmakelist
  * remove useless code
  * fix header
  * update code for log
  * assorted test fixes and CI reruns
-
Submitted by zhoutianzi666
* stride_to_24
* fix CI failing
-
Submitted by Wang Bojun
* Enhance the layernorm shift partition fuse op when shift size > 0 (roll shifting)
* fix cherry-pick test
-
- 19 Oct 2022, 3 commits
-
-
Submitted by zhaoyingli
* [Auto Parallel] Make Engine class callable (#46416)
  * Improve the user-defined fetches and logging
  * Update the data loading of tuner
* Print IPS in auto parallel Engine (#46554)
* [AutoParallel] fix dist_split (#46505): add unittest, update cmakelist
* [AutoParallel] fix sharding (#46572)
* [AutoParallel] fix process_mesh (#46583)
* [AutoParallel] fix reshard when train with eval (#46605): fix mppp
* [AutoParallel] fix amp when predict (#46637)
* [Auto Parallel] Update comp cost and completion for gpt auto search (#46387): add unittest
* [Auto Parallel] Fix bugs caused by the inconsistent outputs of Engine API (#46633)
  * Unify the logger and outputs of Engine API
  * Fix the bugs of to_static and adjust test_to_static.py
* [Auto Parallel] Improve the fine-grained APIs (#46552)
  * Support different dataloaders
  * Add num_shards config for dataset
  * Add the prepare API and replace __call__ with run
  * Improve the private implementations of Engine
  * Set capacity of dataloader for opt tuning
  * Improve APIs to support different user cases
  * Add removed config and missing imports, fix bugs for to_static, remove unnecessary imports
* bugfix (#46921)
* [Auto Parallel] Fix the bug for None labels (#46987)
* [AutoParallel] adapt for gpt-gen (#46771): fix reshard, adapt assign and shape op, add dist_assign & unittest, add conditional block unittest, rename unittest
* [Auto Parallel] Fix the bug of completion (#47056)
* [AutoParallel] add callbacks (#47014): fix unittest, fix dist_context, fix engine, fix cmakelist, fix unittest's returns
* [Auto Parallel] Add cost interface (#47043): update interface and add unittest
* [Auto Parallel] Add parallel tuner (#46189): add unittests, set unittest timeout, fix auto_mode setting, sync from develop, update cmakelist
Co-authored-by: Yulong Ao <aoyulong@baidu.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
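A rough sketch of the Engine workflow touched by the commits above; the import path, constructor arguments, and the commented-out dataset placeholders are assumptions based on the 2.4-era auto-parallel API, not an exact reproduction of these PRs.
    import paddle
    from paddle.distributed.fleet import auto

    # Illustrative model, loss and optimizer; dataset creation and the
    # paddle.distributed.launch setup are omitted.
    model = paddle.nn.Linear(10, 2)
    loss = paddle.nn.CrossEntropyLoss()
    optimizer = paddle.optimizer.Adam(learning_rate=1e-3, parameters=model.parameters())

    engine = auto.Engine(model, loss, optimizer, metrics=paddle.metric.Accuracy())

    # High-level entry points (train_dataset/valid_dataset are user-provided):
    # engine.fit(train_dataset, epochs=2, batch_size=64)
    # engine.evaluate(valid_dataset, batch_size=64)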
-
Submitted by xiongkun
* [Dy2Static] Support TypeHint for functions decorated by @to_static (#47121)
  * Add TypeHint Transformer
  * add unittest for the TypeHint transformer
* [Dy2Static] Remove GradTransformer (#47063)
  1. fix einsum infershape bugs
  2. remove grad_transformer and unify paddle.grad and paddle.static.gradient
  3. add dygraph_and_dy2static_only decorator for dy2static
  * fix bugs, rename
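A minimal sketch of what the TypeHint support allows: annotations on a @to_static function no longer break transcription. The function name and values are illustrative.
    import paddle
    from paddle import Tensor

    @paddle.jit.to_static
    def scaled_sum(x: Tensor, scale: float = 2.0) -> Tensor:
        y: Tensor = x * scale  # annotated assignment handled by the new transformer
        return y.sum()

    print(scaled_sum(paddle.ones([2, 3])))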
-
Submitted by WangZhen
[CherryPick][Dy2St]Fix recurrent op eager deletion pass error in dy2st
-
- 18 Oct 2022, 3 commits
-
-
Submitted by weishengying
Add symbolic shape deduction function for unfold, scatter_nd_add, p_norm, grid_sampler, pad3d, etc (#46291) (#47003)
-
Submitted by zhouweiwei2014
Add three new APIs: sparse.is_same_shape, sparse.reshape and sparse.transpose
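A hedged sketch of the three APIs on a small COO tensor; the indices and values are illustrative, and the exact namespace placement is assumed from the 2.4 paddle.sparse module.
    import paddle

    indices = [[0, 1, 2], [1, 2, 0]]
    values = [1.0, 2.0, 3.0]
    sp = paddle.sparse.sparse_coo_tensor(indices, values, shape=[3, 3])

    reshaped = paddle.sparse.reshape(sp, [1, 9])
    transposed = paddle.sparse.transpose(sp, [1, 0])
    # True here: transposing a [3, 3] tensor keeps the same shape.
    print(paddle.sparse.is_same_shape(sp, transposed))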
-
Submitted by Yuang Liu
* [dygraph sharding] Overlap the reduce and the calculation for sharding stage 2 (#46495)
* [dygraph sharding stage 2] sharding broadcast overlap (#46656)
* Multi groups for broadcast of sharding stage 2 (#46894)
-
- 17 Oct 2022, 4 commits
-
-
Submitted by Wen Sun
* Support both use_calc_stream and sync_op in send/recv APIs (#46023)
* Support both use_calc_stream and sync_op in the allgather API (#46295)
* Support both use_calc_stream and sync_op in collective communication APIs (#46761)
* Move group and all_reduce from collective to communication (#45848)
* Complete bfloat16 dtype for collective APIs in eager mode (#45844)
* Fix collective APIs not being recognized when building docs (#46962)
Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>
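A hedged sketch of the two knobs: sync_op on the public collective APIs and use_calc_stream on the stream-level APIs. The snippet assumes a multi-process launch and illustrative tensor values.
    import paddle
    import paddle.distributed as dist

    # Run under `python -m paddle.distributed.launch ...`.
    dist.init_parallel_env()
    data = paddle.ones([2, 3]) * dist.get_rank()

    # sync_op=False returns a task handle instead of blocking.
    task = dist.all_reduce(data, sync_op=False)
    task.wait()

    # The stream API additionally exposes use_calc_stream to run the
    # communication on the calculation stream without an extra sync event.
    dist.stream.all_reduce(data, sync_op=True, use_calc_stream=True)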
-
Submitted by zhangkaihuo
cherry-pick #46322, #46245: Sparse API supports static graph
-
Submitted by Allen Guo
-
Submitted by Allen Guo
-
- 14 Oct 2022, 6 commits
-
-
Submitted by Wilber
-
Submitted by Guanghua Yu
-
Submitted by xiaoxiaohehe001
-
Submitted by Aurelius84
* [BUG] Fix expand_as_v2 bug when X and Y have different dtypes
* fix commit
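A small sketch of the case the fix covers: x and y deliberately use different dtypes, and y only supplies the target shape. The values are illustrative.
    import paddle

    x = paddle.to_tensor([1.0, 2.0, 3.0], dtype="float32")
    y = paddle.ones([2, 3], dtype="int32")

    out = paddle.expand_as(x, y)  # result keeps x's dtype
    print(out.shape)              # [2, 3]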
-
Submitted by Zhang Jun
* fix reshape2 opteller; add elementwise min/max register for tensorrt
-
Submitted by zhoutianzi666
-
- 13 Oct 2022, 2 commits
-
-
Submitted by 傅剑寒
Fix set_value failure when the source tensor is fp16 dtype and the assigned value is a number (dev PR: #46801)
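For context, a minimal sketch of the pattern involved: slice assignment lowers to the set_value op, with an fp16 tensor and a plain Python number. Run on a device with fp16 kernel support; the shapes are illustrative.
    import paddle

    x = paddle.zeros([2, 3], dtype="float16")
    x[0, :] = 1.5   # triggers set_value with a scalar source
    print(x)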
-
Submitted by Sławomir Siwek
* Revert pool+grad oneDNN kernel conversion (#45989)
* [PHI] transpose2_grad op migration (#46139)
  * op migrated, Copy(OneDNNContext, ...) added
  * mutable_data & op registration in fluid removed
  * refactoring
  * OneDNNGetDataType to uppercase
  * missing cpu check added, handler moved to .h file
  * name changed to transpose_grad
  * Copy changed back to TensorCopy
  * resizing corrected, Copy(OneDNNContext) removed
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
Co-authored-by: Paulina Gacek <paulina.gacek@intel.com>
-
- 12 Oct 2022, 1 commit
-
-
Submitted by niuliling123
Cherry-pick #46541: enable automatic layout tuning for the ResNet50, TSM and deeplabv3 models with zero model code changes
-
- 11 Oct 2022, 1 commit
-
-
Submitted by Sławomir Siwek
-
- 10 Oct 2022, 1 commit
-
-
Submitted by feng_shuai
* fix gather op convert to only support int32 index as input
* add ut
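The int32 constraint applies to the Paddle-TRT conversion path; the snippet below only illustrates the index dtype on the Paddle side, with illustrative values, and assumes you cast int64 indices before exporting a model intended for TensorRT.
    import paddle

    x = paddle.to_tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    # Cast indices to int32 so the gather op stays convertible by Paddle-TRT.
    index = paddle.to_tensor([0, 2], dtype="int32")
    out = paddle.gather(x, index)
    print(out)  # rows 0 and 2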
-
- 09 Oct 2022, 1 commit
-
-
Submitted by xiongkun
* 1. refactor the return transformer; 2. fix some bugs in the return transformer
* support raising an error when a return stmt's parent is a For or While
* fix CI errors and add some unittests
* code format
-
- 29 Sep 2022, 2 commits
-
-
Submitted by 傅剑寒
Add FP16 support for uniform in dygraph mode on NVIDIA GPU (dev PR: #46212)
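A minimal usage sketch, assuming an NVIDIA GPU build of Paddle; the shape and range are illustrative.
    import paddle

    paddle.set_device("gpu")  # the fp16 kernel targets NVIDIA GPUs
    x = paddle.uniform([2, 3], dtype="float16", min=0.0, max=1.0)
    print(x.dtype)  # paddle.float16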
-
Submitted by weishengying
-