Commits · d8dfef54a5caba7bbe1fd383707ee69dac58a959 · PaddlePaddle / Paddle

11 Jan, 2021 1 commit

[Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54

Committed by Zhen Wang on Jan 11, 2021

* Support pure fp16 training for AMP API. (#29544)

* add cast ops before and after unsupported fp16 ops.

* Keep partial net in FP32 pattern.

* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.

* Add fp16 support for adam op.

* add multi precision attr for adam.

* Fix the bug of test_multi_precision_fp16_train UT.

* Code format for CI.

* Fix the redefinition error about MPTypeTrait on Windows.

* fix bugs of the _create_accumulators func in Momentum.

* fix bug when inserting post cast op.

* Add the update_loss_scaling op in allow_set of UnusedVarCheck.

* Update for ci coverage.

* Add some doc for OptimizerWithMixedPrecision.

* Fix the code style.

* Improve the doc of `amp_init`.

* Change for fp16 testing if users define the inference program separately.

* Remove tensor copy in the update_loss_scaling op. (#29426)

* remove tensor copy in the update_loss_scaling op

* do not use thrust.

* fix some cuda memory access error.

d8dfef54
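The `check_finite_and_unscale` and `update_loss_scaling` ops named in this commit belong to dynamic loss scaling. As a rough illustration of that general technique, the pure-Python sketch below mirrors the two steps. It is not Paddle's kernel code: the `LossScaleState` class, the `train_step` helper, the parameter names, and the default values are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class LossScaleState:
    # Hypothetical container; the field names are illustrative only.
    loss_scale: float = 1024.0
    good_steps: int = 0
    bad_steps: int = 0


def check_finite_and_unscale(scaled_grads, loss_scale):
    """Divide scaled gradients by the loss scale and flag any inf/nan."""
    unscaled = [g / loss_scale for g in scaled_grads]
    found_inf = not all(math.isfinite(g) for g in unscaled)
    return unscaled, found_inf


def update_loss_scaling(state, found_inf, incr_every_n=2, decr_every_n=1,
                        incr_ratio=2.0, decr_ratio=0.5):
    """Shrink the scale after overflow steps, grow it after clean steps."""
    if found_inf:
        state.good_steps = 0
        state.bad_steps += 1
        if state.bad_steps >= decr_every_n:
            # Assumed floor of 1.0 so the scale never collapses to zero.
            state.loss_scale = max(state.loss_scale * decr_ratio, 1.0)
            state.bad_steps = 0
    else:
        state.bad_steps = 0
        state.good_steps += 1
        if state.good_steps >= incr_every_n:
            state.loss_scale *= incr_ratio
            state.good_steps = 0
    return state


def train_step(state, scaled_grads):
    """Return unscaled gradients, or None when the step must be skipped."""
    grads, found_inf = check_finite_and_unscale(scaled_grads, state.loss_scale)
    update_loss_scaling(state, found_inf)
    return None if found_inf else grads
```

In this sketch a step that overflows is skipped and halves the scale, while a run of clean steps doubles it again, so the scale settles near the largest value that does not overflow.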

07 Jan, 2021 1 commit

[Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110) · 44b81e63

Committed by furnace on Jan 07, 2021

* Layer norm fp16 (#29169)

* add fp16 for layer_norm op

* revert layernorm api

* fix forward

* fix forward

* fix backward for layernorm with fp16

* fix unit test for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16

* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>

* fix with_mkldnn compile error for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* fix layer_norm accuracy (#29434)

* Layernorm opt (#29522)

* layernorm fw opt

* layernorm bw opt

* fix typo, test=develop

* remove const dim3 for Windows CI compatibility

* merge develop
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

* Fix compile problem when cuda_arch < 6000 (#29576)

* fix compile problem when cuda_arch < 6000

* refine code

* refine code
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

44b81e63

18 Nov, 2020 1 commit
- Add matmtl_v2 to amp list (#28693) · 11e32baf
  Committed by Leo Chen on Nov 18, 2020
```
* add matmtl_v2 to amp list

* support dygraph
```
  11e32baf
23 Sep, 2020 1 commit
- add fuse_bn_act op (#27230) · 906e7f92
  Committed by Zhang Ting on Sep 23, 2020
```
* add fused_bn_add_relu op
```
  906e7f92
26 Nov, 2019 1 commit
- Fix some typos in AMP. (#21354) · be2e3e67
  Committed by Zhen Wang on Nov 26, 2019
```
* fix some typos in AMP. test=develop

* delete useless codes. test=develop
```
  be2e3e67
30 Oct, 2019 1 commit

Add custom black variable name set in amp interface. (#20875) · 3255fe69

Committed by gongweibao on Oct 30, 2019

* add custom black varname test=develop

* fix dtype test=develop

* fix num test=develop

* fix ut test=develop

* fix coverage test=develop

* fix blackvar names test=develop

3255fe69
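To make the idea of a custom black variable name set concrete, here is a minimal pure-Python sketch of how such a set could steer cast insertion. It does not reproduce Paddle's rewriting pass: the tuple-based op representation, the `rewrite_with_casts` function, the `.cast_fp16` naming scheme, and the rule that any op touching a black-listed variable stays in FP32 are assumptions chosen for illustration.

```python
def rewrite_with_casts(ops, white_ops, black_varnames, var_dtypes):
    """Insert cast ops so each op sees inputs in the precision it runs in.

    ops:            list of (op_type, input_names, output_names) tuples
    white_ops:      op types assumed safe to run in fp16
    black_varnames: variable names that force their ops to stay in fp32
    var_dtypes:     dtype ("fp32" or "fp16") of every program input
    """
    dtypes = dict(var_dtypes)
    rewritten = []
    for op_type, inputs, outputs in ops:
        touches_black_var = any(n in black_varnames for n in inputs + outputs)
        want = "fp16" if op_type in white_ops and not touches_black_var else "fp32"
        new_inputs = []
        for name in inputs:
            if dtypes[name] != want:
                cast_name = f"{name}.cast_{want}"
                if cast_name not in dtypes:
                    # Emit one cast per (variable, target dtype) pair.
                    rewritten.append(("cast", [name], [cast_name]))
                    dtypes[cast_name] = want
                name = cast_name
            new_inputs.append(name)
        rewritten.append((op_type, new_inputs, outputs))
        for name in outputs:
            dtypes[name] = want
    return rewritten


# Hypothetical two-op program: a matmul followed by a softmax.
example_ops = [("matmul", ["x", "w"], ["h"]), ("softmax", ["h"], ["p"])]
example_inputs = {"x": "fp32", "w": "fp32"}

for op in rewrite_with_casts(example_ops, {"matmul"}, set(), example_inputs):
    print(op)
```

Passing `{"w"}` as `black_varnames` in this sketch keeps the `matmul` in FP32, so no cast ops are inserted at all.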

19 Sep, 2019 1 commit
- Optimize amp for multi-gpu to enable FP16 gradients transfer across gpus. (#19714) · d9db94d7
  Committed by Jie Fang on Sep 19, 2019
```
Optimize amp for multi-gpu to enable FP16 gradients transfer across gpus
```
  d9db94d7
06 Sep, 2019 1 commit
- init new amp, optimize inserting cast op for batchnorm (#18596) · c6a598a2
  Committed by Jie Fang on Sep 06, 2019
```
init new amp, optimize inserting cast op for batchnorm
```
  c6a598a2
31 Aug, 2019 1 commit
- remove reset recordio usage (#19519) · 5dce1da6
  Committed by Zeng Jinle on Aug 31, 2019
  
  5dce1da6
28 Jun, 2019 1 commit
- init custom black white list (#18377) · 2b4ef509
  Committed by Jie Fang on Jun 28, 2019
```
test=develop
```
  2b4ef509
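A custom black/white list typically lets users override the default per-op precision choices. The sketch below shows one plausible way to merge user lists with defaults. The default sets, the `merge_op_lists` name, and the rule that an op listed in both custom lists is an error are assumptions for illustration, not a copy of Paddle's implementation.

```python
# Illustrative defaults only; the real lists are defined by the framework.
DEFAULT_WHITE = {"conv2d", "matmul", "mul"}
DEFAULT_BLACK = {"exp", "log", "softmax", "mean"}


def merge_op_lists(custom_white=None, custom_black=None):
    """Combine user-supplied op lists with the defaults.

    A custom entry wins over the default placement of the same op, and an
    op that appears in both custom lists is rejected as ambiguous.
    """
    custom_white = set(custom_white or ())
    custom_black = set(custom_black or ())
    overlap = custom_white & custom_black
    if overlap:
        raise ValueError(f"ops in both custom lists: {sorted(overlap)}")
    white = (DEFAULT_WHITE | custom_white) - custom_black
    black = (DEFAULT_BLACK | custom_black) - custom_white
    return white, black
```

For example, `merge_op_lists(custom_white={"softmax"})` moves `softmax` from the black set to the white set in this sketch.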
25 Jun, 2019 1 commit
- init black/white lists (#17847) · 172c2fac
  Committed by Jie Fang on Jun 25, 2019
```
test=develop
```
  172c2fac

PaddlePaddle / Paddle: synced successfully almost 1 year ago
