1. 14 Oct 2021: 1 commit
  2. 21 Sep 2021: 1 commit
    • Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (#35510) · 799f3861
      Authored by Adam Osewski
      * Create a stateful OneDNNAXPYHandler object.
      
      This makes it possible to call it multiple times without recreating the
      oneDNN primitives on every call (a hedged sketch of this pattern follows
      this commit's notes).
      
      * Prepare SGDOpKernel so its implementation can be reused by the OneDNN kernel.
      
      * OneDNN SGD kernel.
      
      * Update call sites to use the new OneDNNAXPYHandler object API.
      
      * Set up the seed in the proper place.
      
      * Enable the OneDNN kernel only for a single case: dense param and sparse grad.
      
      * Small refactor.
      
      * Enable oneDNN by op attribute or by command-line flag.
      
      * Use the int64_t type for the number of elements.
      
      * Support dense param and grad in the OneDNN kernel.
      
      * Enable the SGD OneDNN kernel when using the MP BF16 optimizer.
      
      * Make OneDNNAXPYHandler non-copyable and non-movable.
      
      * Reuse OneDNNAXPYHandler for sparse tensors in the SUM op.
      
      * Fix SFINAE rules.
      
      * Remove recording event inside AXPY.
      
      * Get rid of internal primitive caching.
      
      * Stop using the PP cache mechanism to store memory and primitive objects;
        the handler object now stores and reuses the needed descriptors and primitives.
      
      * Do not derive from MKLDNNHandlerT.
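A minimal sketch of the stateful-handler pattern this commit describes: build the oneDNN primitive once in the constructor, reuse it across calls, and delete copy/move. This is not the actual Paddle class; the dnnl::sum-based formulation of AXPY (y = 1*y + alpha*x, in place on the first source) and all names here are assumptions for illustration.

```cpp
#include <cstdint>
#include <vector>
#include "dnnl.hpp"

// Stateful AXPY handler: primitives are created once, then reused.
class AxpyHandler {
 public:
  AxpyHandler(float alpha, std::int64_t n)
      : engine_(dnnl::engine::kind::cpu, 0), stream_(engine_) {
    md_ = dnnl::memory::desc({n}, dnnl::memory::data_type::f32,
                             dnnl::memory::format_tag::x);
    // y = 1.0 * y + alpha * x expressed as a two-input scaled sum,
    // executed in place on the first source.
    auto pd =
        dnnl::sum::primitive_desc(md_, {1.0f, alpha}, {md_, md_}, engine_);
    prim_ = dnnl::sum(pd);
  }

  // Non-copyable and non-movable, as the commit enforces.
  AxpyHandler(const AxpyHandler&) = delete;
  AxpyHandler& operator=(const AxpyHandler&) = delete;

  // Callable many times; only the data pointers change between calls.
  void operator()(const float* x, float* y) {
    dnnl::memory y_mem(md_, engine_, y);
    dnnl::memory x_mem(md_, engine_, const_cast<float*>(x));
    prim_.execute(stream_, {{DNNL_ARG_MULTIPLE_SRC + 0, y_mem},
                            {DNNL_ARG_MULTIPLE_SRC + 1, x_mem},
                            {DNNL_ARG_DST, y_mem}});
    stream_.wait();
  }

 private:
  dnnl::engine engine_;
  dnnl::stream stream_;
  dnnl::memory::desc md_;
  dnnl::sum prim_;
};
```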
  3. 10 Sep 2021: 1 commit
  4. 24 Aug 2021: 1 commit
  5. 17 Aug 2021: 1 commit
  6. 05 Aug 2021: 1 commit
  7. 22 Jul 2021: 2 commits
  8. 19 Jul 2021: 1 commit
  9. 16 Jul 2021: 1 commit
  10. 05 Jul 2021: 1 commit
  11. 01 Jul 2021: 1 commit
  12. 29 Jun 2021: 1 commit
  13. 21 Jun 2021: 1 commit
  14. 16 Jun 2021: 1 commit
  15. 10 Jun 2021: 1 commit
  16. 26 May 2021: 1 commit
  17. 07 May 2021: 1 commit
  18. 28 Apr 2021: 1 commit
  19. 23 Apr 2021: 1 commit
    • [NPU] refactor check_finite_and_scale npu kernel (#32407) · 39a59dcf
      Authored by Leo Chen
      * Refactor the check_finite_and_scale NPU kernel.
      
      * Fix compilation.
      
      * Add the alloc_float_status op.
      
      * Add FloatStatus for check_finite_and_unscale (a hedged CPU reference
        of the op's semantics follows this commit's notes).
      
      * Refine code.
      
      * Remove unnecessary logic.
      
      * Refine for fleet.
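A hedged CPU reference for what check_finite_and_unscale computes. The real NPU kernel does not scan elements; it queries the device's FloatStatus (allocated by alloc_float_status), which accumulates float exceptions in hardware. Names here are illustrative, not Paddle's API.

```cpp
#include <cmath>
#include <cstddef>

// Unscale grads in place and report whether any value was Inf/NaN.
bool check_finite_and_unscale(float* grad, std::size_t n, float scale) {
  bool found_inf_or_nan = false;
  const float inv_scale = 1.0f / scale;
  for (std::size_t i = 0; i < n; ++i) {
    if (!std::isfinite(grad[i])) {
      found_inf_or_nan = true;  // AMP will skip this step and shrink the scale.
    }
    grad[i] *= inv_scale;  // Undo the loss scaling applied in the forward pass.
  }
  return found_inf_or_nan;
}
```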
  20. 22 Apr 2021: 1 commit
  21. 21 Apr 2021: 2 commits
  22. 15 Apr 2021: 1 commit
  23. 08 Apr 2021: 1 commit
  24. 26 Mar 2021: 1 commit
  25. 22 Mar 2021: 1 commit
  26. 20 Jan 2021: 1 commit
  27. 13 Jan 2021: 1 commit
  28. 08 Jan 2021: 1 commit
    • Support pure fp16 training for AMP API. (#29544) · 7f7dfccf
      Authored by Zhen Wang
      * Add cast ops before and after unsupported fp16 ops.
      
      * Keep a partial net in the FP32 pattern.
      
      * Support check_finite_and_unscale and update_loss_scaling for the FP16
        calculation mode (a hedged sketch of the dynamic loss-scaling update
        follows this commit's notes).
      
      * Add fp16 support for the adam op.
      
      * Add a multi-precision attr for adam.
      
      * Fix the bug in the test_multi_precision_fp16_train UT.
      
      * Code format for CI.
      
      * Fix the redefinition error of MPTypeTrait on Windows.
      
      * Fix bugs in the _create_accumulators func in Momentum.
      
      * Fix a bug when inserting the post cast op.
      
      * Add the update_loss_scaling op to the allow_set of UnusedVarCheck.
      
      * Update for CI coverage.
      
      * Add some docs for OptimizerWithMixedPrecision.
      
      * Fix the code style.
      
      * Improve the doc of `amp_init`.
      
      * Adapt fp16 testing for users who define the inference program in a
        separate way.
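A hedged sketch of the dynamic loss-scaling policy behind update_loss_scaling: shrink the scale on overflow, grow it after a streak of clean steps. The field names (incr_every_n, incr_ratio, decr_ratio) follow the common AMP convention and are assumptions, not necessarily Paddle's exact attribute names.

```cpp
// Tracks and updates the loss scale used for fp16 training.
struct LossScaler {
  float scale = 65536.0f;   // current loss scale
  int good_steps = 0;       // consecutive overflow-free steps
  int incr_every_n = 1000;  // grow the scale after this many clean steps
  float incr_ratio = 2.0f;
  float decr_ratio = 0.5f;

  void update(bool found_inf_or_nan) {
    if (found_inf_or_nan) {
      scale *= decr_ratio;  // overflow: shrink the scale, restart the streak
      good_steps = 0;
    } else if (++good_steps >= incr_every_n) {
      scale *= incr_ratio;  // long clean streak: try a larger scale
      good_steps = 0;
    }
  }
};
```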
  29. 05 Jan 2021: 1 commit
  30. 15 Dec 2020: 1 commit
  31. 09 Dec 2020: 1 commit
  32. 02 Dec 2020: 2 commits
    • Add pure fp16 training with master weights. (#27712) · be3777a5
      Authored by Zhen Wang
      * Add the weight-decay func for the momentum op.
      
      * Add the multi_precision function to the Momentum optimizer (a hedged
        sketch of the master-weight update follows this commit's notes).
      
      * Make sure the initial values of the master weights are the same as the
        fp16 weights.
      
      * Add static loss scaling.
      
      * Add the rescale_grad function for pure fp16 training.
      
      * Use the original momentum updating method.
      
      * Polish some code, such as variable names.
      
      * Add docstrings for APIs.
      
      * Update the var-creation details of _create_master_weight.
      
      * Do not modify the code for imperative momentum updating.
      
      * Fix the error in the test_dist_sparse_tensor_load_momentum UT.
      
      * Add a unit test for multi-precision fp16 training.
      
      * Add more unit tests for CI.
      
      * Use lower threshold values for the allclose comparison in the
        test_multi_precision_fp16_train UT.
      
      * For CI coverage checking.
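A hedged sketch of a multi-precision momentum step: the fp16 parameter keeps an fp32 "master" copy that accumulates the update, and the fp16 value is only ever a rounded cast of the master. The _Float16 extension (recent GCC/Clang on x86-64) stands in for Paddle's own float16 type; function and parameter names are illustrative.

```cpp
#include <cstddef>

void momentum_update_multi_precision(
    _Float16* param,        // fp16 weights used by forward/backward
    float* master_param,    // fp32 master copy, same shape
    const _Float16* grad,   // fp16 gradients
    float* velocity,        // fp32 momentum accumulator
    std::size_t n, float lr, float mu, float rescale_grad) {
  for (std::size_t i = 0; i < n; ++i) {
    // All arithmetic happens in fp32.
    float g = static_cast<float>(grad[i]) * rescale_grad;
    velocity[i] = mu * velocity[i] + g;
    master_param[i] -= lr * velocity[i];
    // The fp16 weight is just a rounded view of the master weight.
    param[i] = static_cast<_Float16>(master_param[i]);
  }
}
```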
    • Layer norm fp16 (#29169) · 7584bb50
      Authored by furnace
      * Add fp16 support for the layer_norm op (a hedged sketch of the
        mixed-precision statistics follows this commit's notes).
      
      * Revert the layernorm API.
      
      * Fix the forward pass.
      
      * Fix the backward pass for layernorm with fp16.
      
      * Fix the unit test for layernorm with fp16.
      
      * Fix the with_mkldnn compile error for layernorm with fp16.
      
      * Revert to PADDLE_ENFORCE_NOT_NULL, and change static_cast<float> to
        static_cast<U>.
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
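A hedged sketch of the static_cast<U> idea: the tensor type T may be fp16, but the statistics accumulate in a wider type U (float), so the mean and variance do not overflow or lose precision. Again, _Float16 stands in for Paddle's float16 type; this is an illustration, not the actual kernel.

```cpp
#include <cmath>
#include <cstddef>

// Layer norm over one row: stats in U, data in T.
template <typename T, typename U>
void layer_norm_row(const T* x, const T* gamma, const T* beta, T* y,
                    std::size_t n, U eps) {
  U mean = 0, var = 0;
  for (std::size_t i = 0; i < n; ++i) mean += static_cast<U>(x[i]);
  mean /= static_cast<U>(n);
  for (std::size_t i = 0; i < n; ++i) {
    U d = static_cast<U>(x[i]) - mean;
    var += d * d;
  }
  var /= static_cast<U>(n);
  U inv_std = U(1) / std::sqrt(var + eps);
  for (std::size_t i = 0; i < n; ++i) {
    U norm = (static_cast<U>(x[i]) - mean) * inv_std;
    y[i] = static_cast<T>(norm * static_cast<U>(gamma[i]) +
                          static_cast<U>(beta[i]));
  }
}

// Typical fp16 instantiation: data in half precision, math in float.
template void layer_norm_row<_Float16, float>(const _Float16*, const _Float16*,
                                              const _Float16*, _Float16*,
                                              std::size_t, float);
```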
  33. 30 Nov 2020: 1 commit
  34. 18 Nov 2020: 1 commit
  35. 04 Nov 2020: 1 commit
  36. 12 Oct 2020: 1 commit
  37. 23 Sep 2020: 1 commit