Commits · d8dfef54a5caba7bbe1fd383707ee69dac58a959 · PaddlePaddle / Paddle

11 Jan, 2021 2 commits

[Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54

Committed by Zhen Wang on Jan 11, 2021

* Support pure fp16 training for AMP API. (#29544)

* add cast ops before and after unsupported fp16 ops.

* Keep partial net in FP32 pattern.

* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.

* Add fp16 support for adam op.

* add multi precision attr for adam.

* Fix the bug of test_multi_precision_fp16_train UT.

* Code format for CI.

* Fix the redefine error about MPTypeTrait on windows.

* fix bugs of the _create_accumulators func in Momentum.

* fix bug when inserting post cast op.

* Add the update_loss_scaling op in allow_set of UnusedVarCheck.

* Update for ci coverage.

* Add some doc for OptimizerWithMixedPrecision.

* Fix the code style.

* Improve the doc of `amp_init`.

* Change for fp16 testing if users have the infer program defined in separate way.

* Remove tensor copy in the update_loss_scaling op. (#29426)

* remove tensor copy in the update_loss_scaling op

* not use thrust.

* fix some cuda memory access error.

d8dfef54
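
The "add cast ops before and after unsupported fp16 ops" item above can be pictured with a small sketch. This is not Paddle's implementation: the `Op` record, the `insert_casts` helper, and the cast op names are hypothetical, and the sketch assumes every tensor outside the wrapped ops is already fp16.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Op:
    """Hypothetical op record: an op type plus input/output tensor names."""
    type: str
    inputs: List[str]
    outputs: List[str]


def insert_casts(ops: List[Op], fp32_only: Set[str]) -> List[Op]:
    """Wrap every op whose type is in `fp32_only` with cast ops.

    Assumption: all other tensors in the program are fp16, so inputs are
    cast up to fp32 before the op and outputs are cast back down after it.
    """
    rewritten: List[Op] = []
    for op in ops:
        if op.type not in fp32_only:
            rewritten.append(op)
            continue
        # Cast each fp16 input to a fresh fp32 tensor.
        fp32_inputs = []
        for name in op.inputs:
            cast_name = name + ".cast_fp32"
            rewritten.append(Op("cast_fp16_to_fp32", [name], [cast_name]))
            fp32_inputs.append(cast_name)
        # Run the op itself entirely in fp32.
        fp32_outputs = [name + ".fp32" for name in op.outputs]
        rewritten.append(Op(op.type, fp32_inputs, fp32_outputs))
        # Cast the results back so downstream fp16 ops see the original names.
        for name, fp32_name in zip(op.outputs, fp32_outputs):
            rewritten.append(Op("cast_fp32_to_fp16", [fp32_name], [name]))
    return rewritten


program = [Op("matmul", ["x", "w"], ["h"]), Op("softmax", ["h"], ["p"])]
for op in insert_casts(program, fp32_only={"softmax"}):
    print(op.type, op.inputs, "->", op.outputs)
```

Keeping the original output names on the post-cast means ops further down the program need no rewiring, which is one plausible reason to cast on both sides of the op.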

[cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f

Committed by WangXi on Jan 11, 2021

* Optimization grad merge performance (#29784)

* [fleet] combine amp and gradient merge, test=develop (#30086)

* fix assign_op_xpu concat_op_xpu warning (#30120)
Co-authored-by: liuyuhui <liuyuhui@baidu.com>

e283dc6f

08 Jan, 2021 1 commit

[Cherry-pick] amp related PR cherry pick into Release/2.0 (#30212) · 9f7c66b4

Committed by huangxu96 on Jan 08, 2021

* Optimizer trans momentum (#29597)

* merge amp related function in Momentum from paddle.fluid.contrib.optimizer into paddle.optimizer.

* Add unittest for 2.0  Momentum API.

* fix some bugs in weight_decay.

* add alias for fluid.contrib.mixed_precision (#29562)

* add alias for fluid.contrib.mixed_precision

* add static.amp into setup.py.in (#29621)

* add static.amp into setup.py.in

* add unittest for api

* fix a bug in multi_precision_fp16 unittest. (#29756)

9f7c66b4

07 Jan, 2021 1 commit

[Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110) · 44b81e63

Committed by furnace on Jan 07, 2021

* Layer norm fp16 (#29169)

* add fp16 for layer_norm op

* revert layernorm api

* fix forward

* fix forward

* fix backward for layernorm with fp16

* fix unit test for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16

* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>

* fix with_mkldnn compile error for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* fix layer_norm accuracy (#29434)

* Layernorm opt (#29522)

* layernorm fw opt

* layernorm bw opt

* fix typo, test=develop

* remove const dim3 for windows CI compatibility

* merge develop
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

* Fix compile problem when cuda_arch < 6000 (#29576)

* fix compile problem when cuda_arch < 6000

* refine code

* refine code
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

44b81e63

09 Dec, 2020 1 commit

[cherry-pick] Fix amp support fleet (#29505) · d82d59e6

Committed by Aurelius84 on Dec 09, 2020

d82d59e6

03 Dec, 2020 1 commit

[Cherry-pick] Add pure fp16 training with master weights. (#29301) · d8ea8a06

Committed by Zhen Wang on Dec 03, 2020

* Add pure fp16 training with master weights. (#27712)

* add the weight decay func for the momentum op

* Add the multi_precision function in Momentum Optimizer.

* Make sure that the initial values of the master weights are the same as the fp16 weights.

* add static loss scaling.

* add the rescale_grad function in the pure fp16 training.

* use the original momentum updating method.

* Polish some codes, such as variable names.

* add docstring for apis.

* update the var creation details of _create_master_weight.

* not modify codes about imperative momentum updating.

* Fix the error of test_dist_sparse_tensor_load_momentum UT.

* add unit test for multi precision fp16 training.

* add more unit tests for CI.

* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.

d8ea8a06
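
As a rough illustration of why pure fp16 training keeps fp32 master weights, the sketch below emulates half precision with the standard-library `struct` module. It uses plain SGD rather than the Momentum optimizer this commit touches, and the helper names (`to_fp16`, `sgd_pure_fp16`, `sgd_with_master_weight`) are made up for the example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sgd_pure_fp16(weight: float, grad: float, lr: float, steps: int) -> float:
    """Plain SGD where the weight and the update both live in fp16."""
    w = to_fp16(weight)
    for _ in range(steps):
        w = to_fp16(w - to_fp16(lr * grad))
    return w


def sgd_with_master_weight(weight: float, grad: float, lr: float, steps: int):
    """Plain SGD that updates an fp32 master copy and re-casts it to fp16."""
    w16 = to_fp16(weight)
    master = to_fp32(w16)  # the master weight starts equal to the fp16 weight
    for _ in range(steps):
        master = to_fp32(master - lr * grad)
        w16 = to_fp16(master)  # fp16 copy used by forward/backward
    return w16, master


# Near 1.0 the fp16 grid is too coarse to register a 1e-4 step,
# so the pure-fp16 weight never moves away from 1.0.
print(sgd_pure_fp16(1.0, 1.0, 1e-4, 10))
print(sgd_with_master_weight(1.0, 1.0, 1e-4, 10))
```

The fp32 copy accumulates the small steps until their sum is large enough to show up in the fp16 weight, which is the behaviour the multi-precision option is meant to provide.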

30 Nov, 2020 1 commit

optimizer amp, all use fp16 communication, overlap last comm and compute (#28957) · 0c2a51d2

Committed by WangXi on Nov 30, 2020

0c2a51d2

18 Nov, 2020 1 commit

Add matmul_v2 to amp list (#28693) · 11e32baf

Committed by Leo Chen on Nov 18, 2020

* add matmul_v2 to amp list

* support dygraph

11e32baf

04 Nov, 2020 1 commit

Skip reader op in mixed_precision decorator (#28353) · 71d62207

Committed by Leo Chen on Nov 04, 2020

* skip reader op in mixed_precision decorator

* add ut

71d62207

12 Oct, 2020 1 commit

fleet combine amp dgc recompute meta optimizer (#27643) · 0a1862d1

Committed by WangXi on Oct 12, 2020

0a1862d1

23 Sep, 2020 1 commit

add fuse_bn_act op (#27230) · 906e7f92

Committed by Zhang Ting on Sep 23, 2020

* add fused_bn_add_relu op

906e7f92

14 Sep, 2020 1 commit

Update amp_check_finite_and_scale_op and add an updating_loss_scaling op for... · d708b210

Committed by Zhen Wang on Sep 14, 2020

Update amp_check_finite_and_scale_op and add an updating_loss_scaling op for static graph amp training. (#26240)

* update amp_check_finite_and_scale_op for static_amp.

* use amp_check_finite_and_scale in static graph amp.

* set grads to zero when they contain infinite values (as for the amp_check_finite_and_scale op).

* add update_loss_scaling op in cpp.

* add update_loss_scaling_op unit test.

* update the doc of the check_finite_and_unscale op

* Update the process of gradients updating skipping if the gradients have infinite values.

* update the way to zero grads.

* update test_update_loss_scaling_op.py

* add log info when infinite grads are found.

* add the unit test for UpdateLossScaling Layer.

d708b210
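
The check-finite/unscale step and the loss-scaling update described in this commit can be sketched in plain Python. This is a simplified model under stated assumptions, not the actual op kernels: the `LossScaler` class is hypothetical, the counter and ratio names are chosen for illustration, and the default values are arbitrary.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LossScaler:
    """Hypothetical dynamic loss scaler; all defaults are illustrative."""
    scale: float = 2.0 ** 15
    incr_every_n_steps: int = 1000
    decr_every_n_nan_or_inf: int = 2
    incr_ratio: float = 2.0
    decr_ratio: float = 0.5
    good_steps: int = 0
    bad_steps: int = 0

    def check_finite_and_unscale(self, grads: List[float]) -> Tuple[List[float], bool]:
        """Divide grads by the scale, or zero them if any value is inf/nan."""
        found_inf = any(not math.isfinite(g) for g in grads)
        if found_inf:
            # Zeroed grads make the optimizer step a no-op for this batch.
            return [0.0 for _ in grads], True
        return [g / self.scale for g in grads], False

    def update(self, found_inf: bool) -> None:
        """Shrink the scale after repeated overflows, grow it after a clean run."""
        if found_inf:
            self.good_steps = 0
            self.bad_steps += 1
            if self.bad_steps >= self.decr_every_n_nan_or_inf:
                self.scale = max(self.scale * self.decr_ratio, 1.0)
                self.bad_steps = 0
        else:
            self.bad_steps = 0
            self.good_steps += 1
            if self.good_steps >= self.incr_every_n_steps:
                grown = self.scale * self.incr_ratio
                if math.isfinite(grown):
                    self.scale = grown
                self.good_steps = 0


scaler = LossScaler(scale=8.0, incr_every_n_steps=2, decr_every_n_nan_or_inf=1)
for scaled_grads in ([16.0, 8.0], [4.0, 2.0], [float("inf"), 1.0]):
    grads, found_inf = scaler.check_finite_and_unscale(scaled_grads)
    scaler.update(found_inf)
    print(grads, found_inf, scaler.scale)
```

Zeroing the gradients on overflow, rather than branching around the optimizer, keeps the training graph free of control flow, which fits a static-graph setting.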

03 Sep, 2020 1 commit

fix some cast error. (#26884) · bcdbac17

Committed by Zhen Wang on Sep 03, 2020

bcdbac17

15 Apr, 2020 1 commit

fix AMP and recompute (#23551) · f0e743f1

Committed by mapingshuo on Apr 15, 2020

* allow amp and recompute working together

f0e743f1

08 Jan, 2020 1 commit

fix init scaling value test=develop (#22145) · 5e07db15

Committed by gongweibao on Jan 08, 2020

5e07db15

26 Nov, 2019 1 commit

Fix some typos in AMP. (#21354) · be2e3e67

Committed by Zhen Wang on Nov 26, 2019

* fix some typos in AMP. test=develop

* delete useless codes. test=develop

be2e3e67

30 Oct, 2019 1 commit

Add custom black variable name set in amp interface. (#20875) · 3255fe69

Committed by gongweibao on Oct 30, 2019

* add custom black varname test=develop

* fix dtype test=develop

* fix num test=develop

* fix ut test=develop

* fix coverage test=develop

* fix blackvar names test=develop

3255fe69

15 Oct, 2019 1 commit

Add interface so user can get scaled loss when they use customized loss. (#20571) · 1d82025e

Committed by gongweibao on Oct 15, 2019

1d82025e

10 Oct, 2019 1 commit

delete backward return list test=develop (#20294) · 7b9e3397

Committed by gongweibao on Oct 10, 2019

7b9e3397

19 Sep, 2019 1 commit

Optimize amp for multi-gpu to enable FP16 gradients transfer across gpus. (#19714) · d9db94d7

Committed by Jie Fang on Sep 19, 2019

Optimize amp for multi-gpu to enable FP16 gradients transfer across gpus

d9db94d7

10 Sep, 2019 1 commit

Fix float16 optimizer. (#19682) · 6c2bc29c

Committed by gongweibao on Sep 10, 2019

Fix float16 optimizer

6c2bc29c

06 Sep, 2019 1 commit

init new amp, optimize inserting cast op for batchnorm (#18596) · c6a598a2

Committed by Jie Fang on Sep 06, 2019

init new amp, optimize inserting cast op for batchnorm

c6a598a2

03 Sep, 2019 1 commit

Change backward_guard to optimize_guard to maximize the allreduce overlap. (#19506) · abaf87be

Committed by gongweibao on Sep 03, 2019

Change backward_guard to optimize_guard to maximize the allreduce overlap

abaf87be

31 Aug, 2019 1 commit

remove reset recordio usage (#19519) · 5dce1da6

Committed by Zeng Jinle on Aug 31, 2019

5dce1da6

28 Jun, 2019 1 commit

init custom black white list (#18377) · 2b4ef509

Committed by Jie Fang on Jun 28, 2019

test=develop

2b4ef509

25 Jun, 2019 1 commit

init black/white lists (#17847) · 172c2fac

Committed by Jie Fang on Jun 25, 2019

test=develop

172c2fac

16 May, 2019 1 commit

init auto loss scaling (#17194) · 30e178fa

Committed by Jie Fang on May 16, 2019

* init auto loss scaling

test=develop

* change API.spec

* change ifelse to switch and use reduce_sum to optimize checking isfinite

test=develop

* Remove redundant code

test=develop

30e178fa

25 Apr, 2019 1 commit

Init mixed precision training interface (#16856) · beda7825

Committed by Yibing Liu on Apr 25, 2019

* Init mixed precision training interface

* Add fp16 test script

test=develop

* All initializers support float16

test=develop

* Code cleanup & add more code annotations

test=develop

* Update API spec

test=develop

* Add usage example in doc

test=develop

beda7825
