- 11 Jan, 2021 1 commit
Committed by Zhen Wang
* Support pure fp16 training for AMP API. (#29544)
  * add cast ops before and after unsupported fp16 ops.
  * Keep partial net in FP32 pattern.
  * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
  * Add fp16 support for adam op.
  * add multi precision attr for adam.
  * Fix the bug of test_multi_precision_fp16_train UT.
  * Code format for CI.
  * Fix the redefine error about MPTypeTrait on windows.
  * fix bugs of the _create_accumulators func in Momentum.
  * fix bug when inserting post cast op.
  * Add the update_loss_scaling op in allow_set of UnusedVarCheck.
  * Update for ci coverage.
  * Add some doc for OptimizerWithMixedPrecision.
  * Fix the code style.
  * Improve the doc of `amp_init`.
  * Change for fp16 testing if users have the infer program defined in a separate way.
* Remove tensor copy in the update_loss_scaling op. (#29426)
  * remove tensor copy in the update_loss_scaling op
  * not use thrust.
  * fix some cuda memory access error.
-
- 07 Jan, 2021 1 commit
Committed by furnace
* Layer norm fp16 (#29169)
  * add fp16 for layer_norm op
  * revert layernorm api
  * fix forward
  * fix forward
  * fix backward for layernorm with fp16
  * fix unit test for layernorm with fp16
  * fix with_mkldnn compile error for layernorm with fp16
  * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
  * fix with_mkldnn compile error for layernorm with fp16
  * fix with_mkldnn compile error for layernorm with fp16

  Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
* fix layer_norm accuracy (#29434)
* Layernorm opt (#29522)
  * layernorm fw opt
  * layernorm bw opt
  * fix typo, test=develop
  * remove const dim3 for windows CI compatibility
  * merge develop

  Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
* Fix compile problem when cuda_arch < 6000 (#29576)
  * fix compile problem when cuda_arch < 6000
  * refine code
  * refine code

Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
-
- 18 Nov, 2020 1 commit
Committed by Leo Chen
* add matmul_v2 to amp list
* support dygraph
-
- 23 Sep, 2020 1 commit
Committed by Zhang Ting
* add fused_bn_add_relu op
-
- 26 Nov, 2019 1 commit
Committed by Zhen Wang
* fix some typos in AMP. test=develop
* delete useless code. test=develop
-
- 30 Oct, 2019 1 commit
Committed by gongweibao
* add custom black varname test=develop
* fix dtype test=develop
* fix num test=develop
* fix ut test=develop
* fix coverage test=develop
* fix blackvar names test=develop
-
- 19 Sep, 2019 1 commit
Committed by Jie Fang
Optimize AMP for multi-GPU training to enable FP16 gradient transfer across GPUs
-
- 06 Sep, 2019 1 commit
Committed by Jie Fang
init new amp, optimize inserting cast op for batchnorm
-
- 31 Aug, 2019 1 commit
Committed by Zeng Jinle
-
- 28 Jun, 2019 1 commit
Committed by Jie Fang
test=develop
-
- 25 Jun, 2019 1 commit
Committed by Jie Fang
test=develop
-