Commits · a4d07bb9522c084e04e8764de03ae09f019bf5cf · OPTHREE / Paddle

24 Dec, 2021 1 commit
- [AMP] Add multi_precision for sgd (#38231) · a4d07bb9
  Committed by zhangbo9674 on Dec 24, 2021
  
  a4d07bb9
17 Dec, 2021 1 commit

Refine some AMP operators for BERT (#37923) · d80fe268

Committed by sneaxiy on Dec 17, 2021

* support multi precision update for LAMB

* hide some api

* fix ci uts

* fix lamb output of dygraph

* remove some changes to some PR

* try to fix Py3 CI compile error

* fix test_imperative_optimizer, add lars ut, add layer_norm ut

* fix ut, fix format

* fix ut

* fix windows ci

d80fe268

08 Dec, 2021 1 commit
- bug fix for adamw (#37905) · 9a2d327c
  Committed by Yuang Liu on Dec 08, 2021
  
  9a2d327c
28 Oct, 2021 1 commit
- first commit (#36778) · 6edbdbfa
  Committed by limingshu on Oct 28, 2021
  
  6edbdbfa
14 Oct, 2021 2 commits
- refine lars (#36409) · eb722e34
  Committed by Zeng Jinle on Oct 14, 2021
  
  eb722e34
- [hybrid enhance] add flag to control the avg position for grad merge under pipeline mode (#36384) · 03d8304f
  Committed by Yuang Liu on Oct 14, 2021
  
  03d8304f
13 Oct, 2021 1 commit

Merge lars op (#35476) · 0c31579c

Committed by limingshu on Oct 13, 2021

* A leap of try for cudaLaunchCooperativeKernel

* fix bugs

* Totally replace the lar cuda kernel

* Fix bugs

* a test for lars merge

* Adding las_op_momentum infer_shape

* Fix codes

* use avg_numel instead of max_numel to acquire grid num

* modify unittest files about lars op

* Finally converge when merged-lars works

* fix ctest files

* add merged_operation kernel when cuda version is older than 11

* Fix code style

* fix ctest failure

* fix error

* fix all ctest error and change lars compute code of cpu

* fix bugs on v100.

* revert python modification about lars

* revert python modification codes

0c31579c

12 Oct, 2021 2 commits
- Revert "refine LarsOptimizer (#36351)" (#36369) · 033a73c3
  Committed by Zeng Jinle on Oct 12, 2021
```
This reverts commit b3f6eedb.
```
  033a73c3
- refine LarsOptimizer (#36351) · b3f6eedb
  Committed by Zeng Jinle on Oct 12, 2021
  
  b3f6eedb
21 Sep, 2021 1 commit

Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (#35510) · 799f3861

Committed by Adam Osewski on Sep 20, 2021

* Create stateful OneDNNAXPYHandler object.

This makes it possible to call it multiple times without recreating the
oneDNN primitives every time.

* Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.

* OneDNN SGD kernel.

* Update call to use new OneDNNAXPYHandler object api.

* Setup seed in proper place.

* Enable OneDNN kernel only for single case.

* For dense param and sparse grad.

* Small refactor.

* Enable oneDNN by op attr or by cmd line flag.

* Use int64_t type for number of elements.

* Support dense param and grad from OneDNN kernel.

* Enable SGD OneDNN kernel when use MP BF16 optimizer.

* Force non-copyable/movable OneDNNAXPYHandler.

* Reuse OneDNNAXPYHandler for sparse tensors in SUM op.

* Fix SFINAE rules.

* Remove recording event inside AXPY.

* Get rid of internal primitive caching.

* Stop using the PP cache mechanism to store mem and primitive obj.
* Handler obj store and reuse needed desc & prim

* Do not derive from MKLDNNHandlerT

799f3861

18 Sep, 2021 1 commit
- [hybird] fix pipeline section program Parameter (#35847) · 67c63639
  Committed by WangXi on Sep 18, 2021
  
  67c63639
17 Sep, 2021 2 commits

[AMP] Support pure fp16 training mode for dygraph (#35521) · adaeee4d

Committed by zhangbo9674 on Sep 17, 2021

* add pure fp16 major function in auto_cast & tracer

* support master weight in dygraph for pure fp16

* check mix dtype of fp16&fp32 for check_finite_and_unscale op

* change pure fp16 function name

* refine some bug in auto_cast

* refine auto_cast interface logic

* add param _casted_by_pure_fp16 for class Layer

* support state_dict hook for save model by user appointed dtype in pure_fp16_decorator

* refine pure_fp16_decorator as decorator

* add unittest

* add comment

* add comment

* support recompute

* add comment for auto_cast and decorator

* support to_static_state_dict for paddle.jit.save

* unlimite models num and optimizers num

* add lookup_table in black_list

* fix momentum and layer state_dict

* fix bug in layer state_dict

* fix bug in layer state_dict_helper

* refine unittest

* refine test_momentun_op

* refine interface and some code

* refine amp_decorator interface

* refine pure fp16 interface

* refine master weight interface

adaeee4d

Support EMA in Paddle2.x and Fleet (#35673) · fb4d5689

Committed by Haohongxiang on Sep 17, 2021

* Support EMA in Paddle2.x and Fleet

* update

* update

* update

* modify ut of ema

* modify docs

* modify bugs

* update

* update

* update

* modify ut

fb4d5689

16 Sep, 2021 2 commits
- [hybrid] Fix mp multi gradient clip prob (#35713) · a4eadd15
  Committed by Yuang Liu on Sep 16, 2021
  
  a4eadd15
- [hybrid] remove scale op in insert_scale_loss_grad_ops (#35775) · 02b0be08
  Committed by WangXi on Sep 16, 2021
  
  02b0be08
15 Sep, 2021 1 commit
- [hybrid] out data parallel as optimizer sharding parallel (#35593) · 78465703
  Committed by WangXi on Sep 15, 2021
  
  78465703
10 Sep, 2021 1 commit
- set gradient_merge_cond persistable to false (#35578) · 47d15a30
  Committed by Zhong Hui on Sep 10, 2021
  
  47d15a30
08 Sep, 2021 3 commits

[hybrid] check pipeline persist var which changed in forward and used in backward (#35453) · a2dbb0c2
Committed by WangXi on Sep 08, 2021

a2dbb0c2

Enable program passes on Fleet APIs (#34955) · 5f369881

Committed by Zeng Jinle on Sep 08, 2021

* add fleet api for program pass

* turn on apply pass for CI test

* fix disable fuse_all_optimizer bug

* try to test ci

* fix CI

* fill unspecified op role

* fix fuse_allreduce

* add ut to improve coverage

* remove useless change

* improve c++ coverage

* follow some comments

* test ir pass pipeline

* update doc

* reduce ut time again

5f369881

support weight sharing for pipeline (#35351) · 5199c744
Committed by lilong12 on Sep 08, 2021
```
* support weight sharing
```
5199c744

02 Sep, 2021 1 commit
- [hybrid] [npu] fit npu nan/inf check (#35171) · 67ed7e12
  Committed by Yuang Liu on Sep 02, 2021
  
  67ed7e12
27 Aug, 2021 1 commit
- [hybrid][npu] fix npu clear float status in pipeline (#35165) · 73321264
  Committed by WangXi on Aug 27, 2021
  
  73321264
23 Aug, 2021 1 commit
- [hybrid performance] optim the grad fuse for pipeline mode by sorting the grad by dtype (#35070) · fad4b3b4
  Committed by Yuang Liu on Aug 23, 2021
  
  fad4b3b4
20 Aug, 2021 1 commit
- [hybrid performance] Grad fuse for gradient merge under pipeline mode (#35004) · 4d9b2d6d
  Committed by Yuang Liu on Aug 20, 2021
  
  4d9b2d6d
17 Aug, 2021 1 commit
- [NPU]Adamw skip update for npu (#34897) · b4474fb4
  Committed by Roc on Aug 17, 2021
  
  b4474fb4
14 Aug, 2021 1 commit
- [hybrid] refine pipeline stage and mp send/recv check (#34870) · 2cd05d5d
  Committed by WangXi on Aug 14, 2021
  
  2cd05d5d
11 Aug, 2021 1 commit
- [hybrid] pp+dp support fp16 allreduce (#34762) · 4d7af372
  Committed by WangXi on Aug 11, 2021
  
  4d7af372
09 Aug, 2021 1 commit
- bugfix remove fluid (#34680) · a3cc2d0b
  Committed by JZ-LIANG on Aug 09, 2021
  
  a3cc2d0b
28 Jul, 2021 2 commits
- [NPU] Support ScaleTensor for scale npu kernel (#34418) · f17ba93b
  Committed by Leo Chen on Jul 28, 2021
```
* support ScaleTensor for scale npu kernel

* add more tests for adam npu

* fix compile

* fix unittest

* refine adam optimizer
```
  f17ba93b
- fix optimizer.py (#34431) · 0fb15d9f
  Committed by Roc on Jul 28, 2021
  
  0fb15d9f
27 Jul, 2021 1 commit
- [hybrid parallel] pipeline support adamw and LRScheduler (#34402) · 6ab0a6a8
  Committed by WangXi on Jul 27, 2021
  
  6ab0a6a8
20 Jul, 2021 2 commits
- [hybird optim] reduce rend/recv times for recompute, test=develop (#34248) · 3a5f1f22
  Committed by Yuang Liu on Jul 20, 2021
  
  3a5f1f22
- [hybrid parallel] Optimize pipeline memory (#34230) · a74208c1
  Committed by WangXi on Jul 20, 2021
  
  a74208c1
19 Jul, 2021 3 commits
- [amp] pass found_inf to adam to suppport skip_update (#34176) · 9bc59673
  Committed by Leo Chen on Jul 19, 2021
```
* pass found_inf to adam

* add unittest

* fix bug

* refine unittest

* change unit test's directory

* disable unittest on cpu
```
  9bc59673
- move the recv op the beginning of the forward/backward phase for pipeline (#34197) · cc007dce
  Committed by lilong12 on Jul 19, 2021
```
* mv recv to head, test=develop
```
  cc007dce
- [NPU hybrid] Partial send /recv/ allgather for npu (#34189) · 0cd21fac
  Committed by Roc on Jul 19, 2021
  
  0cd21fac
16 Jul, 2021 1 commit
- [hybrid check] improve pipeline stage check (#34193) · 5ce58d57
  Committed by WangXi on Jul 16, 2021
  
  5ce58d57
15 Jul, 2021 1 commit
- cache core.ops (#34058) · f05098b5
  Committed by wanghuancoder on Jul 15, 2021
```
* cache core.ops, test=develop

* refine, test=develop
```
  f05098b5
13 Jul, 2021 1 commit
- [hybrid performance] Optimize tensor parallel plus pipeline parallel send recv size (#34110) · 348d043e
  Committed by WangXi on Jul 13, 2021
  
  348d043e
12 Jul, 2021 1 commit
- [hybrid performance] Optimize pipeline send wait (#34086) · 5f65ff91
  Committed by WangXi on Jul 12, 2021
  
  5f65ff91
