1. 11 Apr 2023, 1 commit
    • Cherry-pick fixes for operator precision. (#52705) · d1e8b1e2
      Yiqun Liu committed
      * Fix scale kernel for low precision, cherry pick #50998.
      
      * Fix the FP16 precision problem of add_n. (#50129)
      
      * Change squared_l2_norm to reuse ReduceKernel, and register fp16 and bf16 kernel, which is cherry pick #48315.
      
      * Cherry-pick the fix of MPTypeTrait in KP, which is implemented in #50993.
      
      * Cherry-pick the multi-precision support of AdamW for bf16, #48041.
      
      * Fix compiling error.
      
      * Cherry-pick the fix of CubTensorReduceImpl for bfloat16 in #50993.
      
      * Fix unittest.
      
      ---------
      Co-authored-by: liuruyan <44316842+liuruyan@users.noreply.github.com>
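The multi-precision (master-weight) support cherry-picked for AdamW above rests on a simple idea: keep a full-precision master copy of each parameter and cast down only for compute. The sketch below is a hypothetical toy illustration of that idea in plain Python (using `struct`'s half-precision format), not Paddle's actual kernel code:

```python
import struct

def to_fp16(x: float) -> float:
    # Round a Python float to IEEE half precision and back (struct format 'e').
    return struct.unpack('<e', struct.pack('<e', x))[0]

def step_fp16_only(param, grad, lr):
    # Naive fp16 update: a step smaller than fp16's spacing is rounded away.
    return to_fp16(to_fp16(param) - to_fp16(lr * grad))

def step_master_weight(master, grad, lr):
    # Multi-precision update: accumulate in a full-precision "master" copy,
    # casting to fp16 only for the forward/backward pass.
    master = master - lr * grad
    return master, to_fp16(master)

param, grad, lr = 1.0, 1e-4, 0.01
lost = step_fp16_only(param, grad, lr)           # stays exactly 1.0
master, cast = step_master_weight(param, grad, lr)
```

With a pure fp16 update, the step `lr * grad = 1e-6` is far below fp16's spacing around 1.0 (about 4.9e-4), so the parameter never moves; the full-precision master copy retains it across steps.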
  2. 09 Apr 2023, 1 commit
    • Add bfloat16 support for several operators and APIs. (#52696) · ba9a22db
      Yiqun Liu committed
      * Cherry-pick the register of bfloat16 for amp_kernel, pull request #45541.
      
      * Cherry-pick the master_grad support of adamw, pull request #51141.
      
      * add bf16 for some ops in static mode (#51582)
      
      * Add bfloat16 support for some api in static mode.
      
      * Fix codestyle.
      
      * Revert the change of layer_function_generator.py.
      
      ---------
      Co-authored-by: Shaojie WANG <wsjmessi@163.com>
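Several commits in this entry register bfloat16 kernels. As a reminder of what the format trades away, here is a minimal framework-free sketch of fp32↔bf16 conversion: bf16 keeps fp32's 8-bit exponent (so the same dynamic range) but only 7 mantissa bits. The round-to-nearest-even trick shown is a common hardware convention, an assumption rather than Paddle's exact kernel code (NaN handling is omitted):

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    # Reinterpret the fp32 bit pattern and keep only the top 16 bits,
    # rounding the dropped half to nearest-even.
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return (bits + rounding_bias) >> 16

def bf16_bits_to_float(b: int) -> float:
    # Re-expand to fp32: the low 16 mantissa bits are simply zero.
    return struct.unpack('<f', struct.pack('<I', (b << 16) & 0xFFFFFFFF))[0]

x = 3.141592653589793
bf = bf16_bits_to_float(float_to_bf16_bits(x))   # coarser than fp32, same range
```

Pi survives only to about two decimal digits (3.140625), which is why bf16 training typically still needs fp32 master weights, as the AdamW commits above provide.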
  3. 20 Mar 2023, 1 commit
  4. 09 Mar 2023, 1 commit
  5. 17 Feb 2023, 1 commit
  6. 04 Jan 2023, 1 commit
  7. 03 Jan 2023, 2 commits
  8. 30 Dec 2022, 1 commit
  9. 29 Dec 2022, 1 commit
  10. 28 Dec 2022, 1 commit
  11. 27 Dec 2022, 1 commit
    • [Cherry-pick] Fix custom operator backward=None (#48656) (#48715) · 39eb77a6
      HongyuJia committed
      * [Release2.4] Revert python link prs (#48573)
      
      * Revert "Fix mac link python (#48017)"
      
      This reverts commit 3fa7a736.
      
      * Revert "[Cherry-pick] Fix python link error (#47811)"
      
      This reverts commit ff642c68.
      
      * Update config.go
      
      * fix custom operator backward=None (#48656)
      
      * [Custom Extension] Fix custom double_grad backward=None (#49224)
      
      * fix custom double_grad backward=None
      
      * fix custom_relu.cu bug && polish testcase of double_grad
      
      * remove old dynamic graph test
      
      * add import fluid
      
      * add import fluid
      Co-authored-by: Chen Weihang <chenweihang@baidu.com>
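The fix above concerns custom operators whose backward returns None for some inputs. A toy, hypothetical illustration of the pattern being fixed — treating a None gradient as "no gradient flows to this input" rather than an error (this is not Paddle's custom-operator machinery, just the general idea):

```python
def accumulate_grads(backward_fn, upstream, num_inputs):
    """A custom backward may return None for inputs that need no
    gradient; skip those entries instead of failing on them."""
    grads = backward_fn(upstream)
    accumulated = [0.0] * num_inputs
    for i, g in enumerate(grads):
        if g is None:          # the backward=None case the fix handles
            continue
        accumulated[i] += g
    return accumulated

# A backward that produces a gradient for input 0 only.
bw = lambda up: (2.0 * up, None)
out = accumulate_grads(bw, 3.0, 2)   # → [6.0, 0.0]
```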
  12. 22 Dec 2022, 1 commit
  13. 21 Dec 2022, 2 commits
  14. 28 Nov 2022, 1 commit
    • Cherry-pick NV fixes to release/2.4 (#48263) · 7a0b8625
      zlsh80826 committed
      * Reduce squeeze2_matmul_fuse_pass, flatten tests time (#47098)
      
      * Add missing fp32 config and reduce the testing combination
      
      * Reduce trt matmul pass test max examples
      
      * Loose TRT fp16 tests tolerance (#47100)
      
      * Loose TRT half test tolerance to 1e-3 (#47101)
      
      * Loose TRT half test tolerance to 1e-3 (#47106)
      
      * Update distributed_strategy.proto (#46531)
      
      * Close popen pipe after used (#47053)
      
      * Add launch_bounds (#47285)
      
      * Fix TRT UT failures (#47488)
      
      * Format cherry-picked commits
      
      * CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203)
      
      * Skip tests that use fused_ops on H100
      
      * Add error message to FusedOps on H100
      Co-authored-by: Shijie <505749828@qq.com>
      Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
      Co-authored-by: Tian Zheng <tizheng@nvidia.com>
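Several of these commits loosen TRT fp16 test tolerances to 1e-3. The rationale can be sketched in plain Python (a hypothetical helper, not the actual test harness): half precision carries roughly three decimal digits, so comparing fp16 outputs against fp32 references with a tighter tolerance flags ordinary rounding drift as failure.

```python
import math

def allclose_fp16(actual, expected, tol=1e-3):
    # fp16 has ~3 decimal digits of precision, so a relative/absolute
    # tolerance of 1e-3 matches the format rather than the test framework's
    # tight default.
    return all(math.isclose(a, e, rel_tol=tol, abs_tol=tol)
               for a, e in zip(actual, expected))

reference = [1.0000, 0.3333, 2.7183]
fp16_out  = [0.9995, 0.3335, 2.7190]      # typical half-precision drift

ok_loose = allclose_fp16(fp16_out, reference)             # passes at 1e-3
ok_tight = allclose_fp16(fp16_out, reference, tol=1e-4)   # flaky failure
```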
  15. 24 Nov 2022, 1 commit
  16. 07 Nov 2022, 2 commits
  17. 04 Nov 2022, 2 commits
    • [CherryPick] Cherry pick #45916 #46031 #47299 (#47610) · 72e1eb6b
      xiongkun committed
      * [ Dy2Static ] Fix bugs when select inputs meeting different shape or undefined-var (#45916)
      
      * Fix select_input errors with different shapes:
        1. select_input_with_buildin_type directly returns the non-undefined-var branch when it meets an undefined var.
        2. The output shape of select_input is inferred from its inputs.
      
      * reverse the logic in select_input
      
      * [warning] added warning message in cond block when one branch returns variable and another returns None (#46031)
      
      * [cherry-pick] Allow manually setting py_reader name in standalone executor (#45898) (#45931)
      
      * Allow manually setting py_reader name in standalone executor
      
      * [BugFix] while cond receives dict as input (#47299)
      
      * fix bugs while cond receives dict as input
      
      * add unittest
      
      * change flatten -> _is_sequence_except_dict
      
      * code format
      Co-authored-by: feifei-111 <wuzhanfei@baidu.com>
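The `flatten -> _is_sequence_except_dict` change above can be sketched as a flatten helper that unpacks lists and tuples but leaves dicts whole: a dict passed to a `while` cond is one structured argument, not a sequence to unpack. This is a simplified, hypothetical version of the utility, not Paddle's code:

```python
def is_sequence_except_dict(x):
    # Treat lists/tuples as flattenable containers; dicts stay whole.
    return isinstance(x, (list, tuple))

def flatten_except_dict(x):
    if not is_sequence_except_dict(x):
        return [x]
    out = []
    for item in x:
        out.extend(flatten_except_dict(item))
    return out

args = [1, {"state": 2}, (3, 4)]
flat = flatten_except_dict(args)   # → [1, {'state': 2}, 3, 4]
```

A plain recursive flatten would try to iterate the dict's keys (or reject it outright), which is the class of bug the cherry-picked fix addresses.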
    • [cherry-pick2.4] for CodeStyle (#47608) · cfee9c13
      Ligoml committed
      * only run pre-commit
      
      * only run pre-commit
  18. 03 Nov 2022, 2 commits
  19. 31 Oct 2022, 2 commits
  20. 28 Oct 2022, 1 commit
  21. 27 Oct 2022, 2 commits
  22. 26 Oct 2022, 1 commit
  23. 25 Oct 2022, 1 commit
  24. 20 Oct 2022, 8 commits
  25. 19 Oct 2022, 2 commits
    • [Cherry-Pick][AutoParallel] auto_parallel cherry-pick to release2.4 (#47145) · 90b31790
      zhaoyingli committed
      * [Auto Parallel] Make Engine class callable (#46416)
      
      * [Auto Parallel] Improve the user-defined fetches and logging
      
      * [Auto Parallel] Make Engine class callable
      
      * [Auto Parallel] Update the data loading of tuner
      
      * Print IPS in auto parallel Engine (#46554)
      
      * [AutoParallel] fix dist_split (#46505)
      
      * [AutoParallel] fix dist_split
      
      * add unittest
      
      * update cmakelist
      
      * [AutoParallel] fix sharding (#46572)
      
      * [AutoParallel] fix process_mesh (#46583)
      
      * [AutoParallel] fix reshard when train with eval (#46605)
      
      * [AutoParallel] fix reshard when train with eval
      
      * fix mppp
      
      * [AutoParallel] fix amp when predict (#46637)
      
      * [Auto Parallel]Update comp cost and completion for gpt auto search (#46387)
      
      * update comp cost and completion for gpt auto search
      
      * add unittest
      
      * [Auto Parallel] Fix bugs caused by the inconsistent outputs of Engine API (#46633)
      
      * [Auto Parallel] Unify the logger and outputs of Engine API
      
      * [Auto Parallel] Fix the bugs of to_static
      
      * [Auto Parallel] Adjust the test_to_static.py
      
      * [Auto Parallel] Improve the fine-grained APIs (#46552)
      
      * [Auto Parallel] Support different dataloaders
      
      * [Auto Parallel] Add num_shards config for dataset
      
      * [Auto Parallel] Unify the logger and outputs of Engine API
      
      * [Auto Parallel] Fix the bugs of to_static
      
      * [Auto Parallel] Adjust the test_to_static.py
      
      * [Auto Parallel] Add the prepare API and replace __call__ with run
      
      * [Auto Parallel] Improve the private implementations of Engine
      
      * [Auto Parallel] Set capacity of dataloader for opt tuning
      
      * [Auto Parallel] [WIP] Change the fine-grained API
      
      * [Auto Parallel] Improve APIs to support different user cases
      
      * [Auto Parallel] Add removed config
      
      * [Auto Parallel] Add imports
      
      * [Auto Parallel] Fix bugs for to_static
      
      * [Auto Parallel] Remove unnecessary imports
      
      * bugfix (#46921)
      
      * [Auto Parallel] Fix the bug for None labels (#46987)
      
      * [AutoParallel] adapt for gpt-gen (#46771)
      
      * for gpt-gen
      
      * fix reshard
      
      * adapt assign and shape op
      
      * add dist_assign & unittest
      
      * add conditional block unittest
      
      * rename unittest
      
      * [Auto Parallel] Fix the bug of completion (#47056)
      
      * [Auto Parallel] Fix the bug for None labels
      
      * [Auto Parallel] Fix the completion bug
      
      * [AutoParallel] add callbacks (#47014)
      
      * [AutoParallel] add callbacks
      
      * fix unittest
      
      * fix dist_context
      
      * fix engine
      
      * fix cmakelist
      
      * fix unittest's returns
      
      * fix cmakelist
      
      * [Auto Parallel] Add cost interface (#47043)
      
      * add cost interface
      
      * update interface and add unittest
      
      * update unittest
      
      * update interface
      
      * [Auto Parallel]Add parallel tuner (#46189)
      
      * add parallel tuner
      
      * add unittest
      
      * fix unittest
      
      * set timeout of unittest
      
      * set unittest timeout
      
      * fix auto_mode setting
      
      * update unittest
      
      * sync from develop and update unittest
      
      * remove unused import
      
      * update unittest
      
      * update cmakelist
      
      * add unittests
      Co-authored-by: Yulong Ao <aoyulong@baidu.com>
      Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
      Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
      Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
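The Engine API evolution traced in this entry — first made callable, then `__call__` replaced with `run` plus a separate `prepare` step — follows a common pattern. A toy, hypothetical sketch of that pattern (names only loosely mirror the real auto-parallel Engine):

```python
class Engine:
    """Sketch of the API change: explicit prepare()/run(), with __call__
    kept as an alias for backward compatibility with the earlier
    callable interface."""
    def __init__(self, model):
        self.model = model
        self._prepared = False

    def prepare(self):
        # One-time setup (in the real Engine: graph construction,
        # placement, dataloader wiring, etc.).
        self._prepared = True

    def run(self, data):
        if not self._prepared:
            self.prepare()
        return self.model(data)

    # Earlier commits made the Engine directly callable; keep that working.
    __call__ = run

engine = Engine(lambda x: x * 2)
result = engine(3)       # callable form
result2 = engine.run(4)  # explicit form
```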
    • [Dy2Stat] Polish @to_static temporary file directory to speed up transformation (#47102) (#47144) · 5a9befea
      Aurelius84 committed
      Polish @to_static temporary file directory to speed up transformation
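The temporary-directory polish above suggests reusing one per-process directory for the code that @to_static generates, instead of paying filesystem setup cost on every transformation. A hypothetical sketch of that kind of optimization (not the actual Paddle change):

```python
import os
import tempfile

_TRANSFORM_DIR = None

def get_transform_dir():
    # Create the temporary directory for generated source files once per
    # process and reuse it, rather than making a fresh one per call.
    global _TRANSFORM_DIR
    if _TRANSFORM_DIR is None:
        _TRANSFORM_DIR = tempfile.mkdtemp(prefix="to_static_")
    return _TRANSFORM_DIR

d = get_transform_dir()   # subsequent calls return the same path
```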