Commits · 6959eae53a58b29ffca4efc848c55238734456e2 · PaddlePaddle / Paddle

12 Apr, 2023 · 1 commit

Unify the static amp codes of fp16 and bf16. Reimplement #52694 in release/2.4. (#52697) · 6959eae5

Committed by Yiqun Liu on Apr 12, 2023

6959eae5

29 Aug, 2022 · 1 commit

add interpolate op to default black lists (#45393) · 9a560f7c

Committed by Zhang Ting on Aug 29, 2022

9a560f7c

23 Aug, 2022 · 2 commits

bugfix (#45332) · 257438f3

Committed by JZ-LIANG on Aug 23, 2022

257438f3

[Auto Parallel] Data Parallel Comm & Calc Overlap Optimization (#45173) · 229befc8

Committed by JZ-LIANG on Aug 23, 2022

```
* bugfix

* remove scaling

* support rescale_grad opt

* add unittest
```

229befc8

05 Jun, 2022 · 1 commit

【code format check upgrade】 step2：yapf (#42944) · a072fca8

Committed by Sing_chan on Jun 05, 2022

* use yapf to format all Python files

* exclude two unit test files from yapf, because they rely on writing and reading files and formatting would break them

* disable diff_py_file, because too many diff files cause the following command to fail

a072fca8

28 Apr, 2022 · 1 commit

Add gradient merge for DistributedFusedLamb optimizer (#40177) · 108aeb28

Committed by sneaxiy on Apr 28, 2022

* add gradient merge for DistributedFusedLamb

* use master acc gradient

* fix CI ut

* polish

* remove math_function_impl.h change

* fix test_update_loss_scaling_op.py

* try to fix XPU/NPU CI

* add gm ut

108aeb28

26 Apr, 2022 · 1 commit

Add fused_multi_transformer op to optimize transformer generation performance (#41814) · 9dadf7df

Committed by WangXi on Apr 26, 2022

9dadf7df

15 Apr, 2022 · 1 commit

[IPU] add mixed-precision support for ipu (#41733) · d7224482

Committed by Allen Guo on Apr 15, 2022

```
* add mixed-precision support for ipu

* restore cast_model_to_fp16 api

* update UTs
```

d7224482

16 Mar, 2022 · 1 commit

[MLU] support amp O1 of mlu (#40461) · ad81f22c

Committed by qipengh on Mar 16, 2022

ad81f22c

19 Feb, 2022 · 1 commit

Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61

Committed by sneaxiy on Feb 19, 2022

* add DistributedFusedLamb op

* polish code

* fix compile error

* make compatible with pten changes

* fix rocm compile error

* improve coverage

* update upstream/develop

* fix cast_with_ptr.h

* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1

* fix clip before allreduce

* add use_master_param_norm

* code polish

* fix bug

* fix ROCM ci

5df3cd61

07 Feb, 2022 · 1 commit

Update BF16 amp list (#39304) · 0c43ce22

Committed by arlesniak on Feb 07, 2022

* amp list updated

* tests updated

* gray list updated

* amp list updated

* test updated

0c43ce22

13 Jan, 2022 · 1 commit

Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b

Committed by jakpiase on Jan 13, 2022

* base changes for mul reimplementation

* empty commit

* tmp save

* full implementation of mul bf16/fp32 fwd bwd

* CI fix

* CI rerun

* changed unity build cmake to avoid gpu issues

* removed mul mkldnn from unity build

* added skipping tests if not cpu_bf16

* CI fix

* CI fix

* CI fix

fc6eed5b

28 Dec, 2021 · 1 commit

Fix scatter_op fp16 perf problem. (#38499) · 33ce249f

Committed by Li Min on Dec 28, 2021

* Fix scatter_op fp16 perf problem.

* Add scatter into black list.

* Add scatter into black list for dygraph.

33ce249f

20 Dec, 2021 · 1 commit

Support FP16 for more ops (#38123) · 1f445bf3

Committed by sneaxiy on Dec 20, 2021

* support FP16 for more ops

* add amp list tests

* refine reduce_mean_grad

* fix OP benchmark ci

* fix fp16 reduce_mean

* update ut, but still has some problems

* remove mean/reduce_mean fp16 kernel

1f445bf3

17 Dec, 2021 · 1 commit

Refine some AMP operators for BERT (#37923) · d80fe268

Committed by sneaxiy on Dec 17, 2021

* support multi precision update for LAMB

* hide some api

* fix ci uts

* fix lamb output of dygraph

* remove some changes to some PR

* try to fix Py3 CI compile error

* fix test_imperative_optimizer, add lars ut, add layer_norm ut

* fix ut, fix format

* fix ut

* fix windows ci

d80fe268

27 Oct, 2021 · 1 commit

Fused transformer encoder layer and fused feedforward layer (#36604) · 9f3613f3

Committed by zhangkaihuo on Oct 27, 2021

This PR contains the layer-level code for fused_transformer, including the layer code for FusedFeedForward and FusedTransformerEncoderLayer.

9f3613f3

14 Oct, 2021 · 1 commit

Add the complete code and related files of resnet_unit_op (#36366) · 12e6dbbc

Committed by Zhang Zheng on Oct 14, 2021

12e6dbbc

21 Sep, 2021 · 1 commit

Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (#35510) · 799f3861

Committed by Adam Osewski on Sep 20, 2021

* Create stateful OneDNNAXPYHandler object.

This makes it possible to call it multiple times without recreating the
oneDNN primitives every time.

* Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.

* OneDNN SGD kernel.

* Update call to use new OneDNNAXPYHandler object api.

* Setup seed in proper place.

* Enable OneDNN kernel only for single case.

* For dense param and sparse grad.

* Small refactor.

* Enable oneDNN by op attr or by cmd line flag.

* Use int64_t type for number of elements.

* Support dense param and grad from OneDNN kernel.

* Enable SGD OneDNN kernel when using MP BF16 optimizer.

* Force non-copyable/movable OneDNNAXPYHandler.

* Reuse OneDNNAXPYHandler for sparse tensors in SUM op.

* Fix SFINAE rules.

* Remove recording event inside AXPY.

* Get rid of internal primitive caching.

* Stop using the PP cache mechanism to store mem and primitive obj.
* Handler obj stores and reuses needed desc & prim

* Do not derive from MKLDNNHandlerT

799f3861

10 Sep, 2021 · 1 commit

fix bug of recompute in hybridparallel (#35588) · d53e567a

Committed by ShenLiang on Sep 10, 2021

d53e567a

24 Aug, 2021 · 1 commit

Update LearningRate for test fit a line BF16 (#34653) · 36f7e751

Committed by Adam Osewski on Aug 24, 2021

```
* Small corrections.

* Fix lr for bf16.

* Revert some changes.
```

36f7e751

17 Aug, 2021 · 1 commit

[NPU]Adamw skip update for npu (#34897) · b4474fb4

Committed by Roc on Aug 17, 2021

b4474fb4

05 Aug, 2021 · 1 commit

optimize pipeline performance with recompute and amp, test=allcase (#34519) · 911c8593

Committed by WangXi on Aug 05, 2021

911c8593

22 Jul, 2021 · 2 commits

copy found_inf to cpu in advance to improve performance (#34274) · 781f4028

Committed by Leo Chen on Jul 22, 2021

```
* copy found_inf to cpu in advance to improve performance

* add npu test

* add npu test

* refine code

* refine memcpy op

* fix adam
```

781f4028

enable amp unsupported_fp16_list for npu (#34314) · b0a2f005

Committed by Leo Chen on Jul 22, 2021

b0a2f005

19 Jul, 2021 · 1 commit

[amp] pass found_inf to adam to support skip_update (#34176) · 9bc59673

Committed by Leo Chen on Jul 19, 2021

* pass found_inf to adam

* add unittest

* fix bug

* refine unittest

* change unit test's directory

* disable unittest on cpu

9bc59673
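
The skip_update behaviour named in this commit can be illustrated with a minimal sketch. It is written in plain Python and is not Paddle's implementation: `unscale_and_check` and `sgd_step` are hypothetical helpers, plain SGD stands in for Adam, and `found_inf` is an ordinary boolean rather than a tensor. The idea shown is only that an overflow detected while unscaling gradients causes the optimizer to leave the parameters untouched for that step.

```python
import math


def unscale_and_check(scaled_grads, loss_scale):
    """Divide gradients by the loss scale and report whether any is inf/nan.

    Hypothetical helper for illustration; not a Paddle API.
    """
    grads = [g / loss_scale for g in scaled_grads]
    found_inf = any(not math.isfinite(g) for g in grads)
    return grads, found_inf


def sgd_step(params, grads, lr, found_inf):
    """Plain SGD standing in for Adam; leaves params untouched on overflow."""
    if found_inf:
        return list(params)  # skip_update: keep the previous parameters
    return [p - lr * g for p, g in zip(params, grads)]


params = [1.0, 2.0]
loss_scale = 1024.0

# Normal step: gradients are finite, so the update is applied.
grads, found_inf = unscale_and_check([512.0, -1024.0], loss_scale)
params = sgd_step(params, grads, lr=0.1, found_inf=found_inf)
print(params, found_inf)

# Overflowed step: one gradient is inf, so the update is skipped.
grads, found_inf = unscale_and_check([math.inf, 1024.0], loss_scale)
params = sgd_step(params, grads, lr=0.1, found_inf=found_inf)
print(params, found_inf)
```

Passing the flag into the optimizer, rather than branching outside it, keeps the decision next to the update itself, which is the design the commit title describes.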

16 Jul, 2021 · 1 commit

[NPU] add clear_float_status op (#34190) · 0e4bcede

Committed by Leo Chen on Jul 16, 2021

* add clear_float_status op

* refine infershape

* fix typo

* refine check_finite_and_scale

* refine code

0e4bcede

05 Jul, 2021 · 1 commit

add `reduce_sum` op into amp black list (#33960) · aa9fdd0d

Committed by jiangcheng on Jul 05, 2021

* make the reduce_sum op default to fp32 and add it to the amp black list

* keeping reduce_sum in fp32 by default avoids returning inf when the sum is larger than 65504

aa9fdd0d
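
The 65504 limit mentioned in this commit is the largest finite fp16 value. The sketch below illustrates the overflow in plain Python, under stated assumptions: `to_fp16`, `sum_fp16`, and `sum_wide` are hypothetical helpers, fp16 rounding is emulated with the standard `struct` module's half-precision format, and Python floats stand in for the wider fp32 accumulator. It is not Paddle code.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value.

    struct's "e" format raises OverflowError for finite values that do not
    fit in fp16; map that case to +/-inf, as fp16 arithmetic would.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def sum_fp16(values):
    """Accumulate in fp16: every partial sum is rounded back to fp16."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


def sum_wide(values):
    """Accumulate fp16 inputs in a wider type (Python floats stand in for fp32)."""
    return sum(to_fp16(v) for v in values)


values = [1000.0] * 70        # each element fits comfortably in fp16
print(to_fp16(65504.0))       # largest finite fp16 value
print(sum_fp16(values))       # a partial sum passes the fp16 range and becomes inf
print(sum_wide(values))       # 70000.0, representable in the wider type
```

Every input here is a valid fp16 number; only the accumulated total leaves the fp16 range, which is why the reduction itself is the part kept in fp32.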

01 Jul, 2021 · 1 commit

fix bug DLTP-31078 (#33877) · 3e82a794

Committed by taixiurong on Jul 01, 2021

3e82a794

29 Jun, 2021 · 1 commit

xpu support amp (#33809) · 4d4fb660

Committed by taixiurong on Jun 29, 2021

4d4fb660

21 Jun, 2021 · 1 commit

update fp16 gray_list for tensor parallel (#33660) · 1681a2dd

Committed by WangXi on Jun 21, 2021

1681a2dd

16 Jun, 2021 · 1 commit

fix new ci check errors (#33561) · 16099abf

Committed by zhiboniu on Jun 16, 2021

16099abf

10 Jun, 2021 · 1 commit

dp c_allreduce_sum_fusion op (#33169) · 003b4616

Committed by Baibaifan on Jun 10, 2021

003b4616

26 May, 2021 · 1 commit

[Tensor Parallelism] split fix bug (#33015) · 20b9be65

Committed by JZ-LIANG on May 26, 2021

20b9be65

07 May, 2021 · 1 commit

Mechanism that converts startup_program initializers to BF16 (#32720) · ce2bdb0a

Committed by joanna.wozna.intel on May 07, 2021

```
* Add casting initializers for bf16 training

* Changes after review

* Correct test and add comment
```

ce2bdb0a

28 Apr, 2021 · 1 commit

Added pure_bf16 mode (#32281) · bc379ca3

Committed by arlesniak on Apr 28, 2021

bc379ca3

23 Apr, 2021 · 1 commit

[NPU] refactor check_finite_and_scale npu kernel (#32407) · 39a59dcf

Committed by Leo Chen on Apr 23, 2021

* refactor_check_finite_and_scale_npu_kernel

* fix compile

* add alloc_float_status op

* add alloc_float_status op

* add FloatStatus for check_finite_and_unscale

* refine code

* remove unnecessary logic

* refine for fleet

39a59dcf

22 Apr, 2021 · 1 commit

Add fleet get_loss_scaling doc and update alert message (#32419) · d03b0b16

Committed by Yuang Liu on Apr 22, 2021

d03b0b16

21 Apr, 2021 · 2 commits

fix bug in amp O2 (#32343) · 4be3b057

Committed by huangxu96 on Apr 21, 2021

4be3b057

add get_loss_scaling to fleet (#32401) · 37bb3342

Committed by Yuang Liu on Apr 21, 2021

37bb3342

15 Apr, 2021 · 1 commit

fix test sync_with_cpp (#32212) · 0c037d2d

Committed by fangshuixun007 on Apr 15, 2021

```
fix test sync_with_cpp (#32212)
```

0c037d2d
