Commits · 64ecdc03db38377e37a1c149086c1eb8700c3e04 · PaddlePaddle / Paddle

12 Jun, 2023 · 1 commit
- Change Static AMP List (#54135) · 64ecdc03
  Committed by niuliling123 on Jun 12, 2023
  
  64ecdc03
16 May, 2023 · 1 commit
- [AMP] support OD level for static (#53768) · c2c3bd43
  Committed by niuliling123 on May 16, 2023
  
  c2c3bd43
11 May, 2023 · 1 commit
- Retire Ascend and Cambricon related code: NPU-related code retirement, part 2 (#53568) · 0d45ac73
  Committed by 张春乔 on May 11, 2023
  
  0d45ac73
10 May, 2023 · 1 commit
- Add einsum to the default white_list. (#53586) · 2eea311a
  Committed by Yiqun Liu on May 10, 2023
  
  2eea311a
08 May, 2023 · 1 commit
- [AMP] fix static promote (#53439) · 2bf61284
  Committed by Zhang Ting on May 08, 2023
  
  2bf61284
24 Apr, 2023 · 1 commit

[AMP] support promote kernel for static graph (#52514) · 71a513c2

Committed by Zhang Ting on Apr 24, 2023

* support promote dtype for static amp training

* unify o1 and o2

* update for unittest

* fix op_role

* add use_promote arg

* fix doc

* add promote unittest

* polish unittests

* fix control flow and test

71a513c2

14 Apr, 2023 · 1 commit

[AMP] Unify the static amp codes of fp16 and bf16. (#52694) · dfcba7f4

Committed by Yiqun Liu on Apr 14, 2023

* Unify the static amp codes of fp16 and bf16.

* Polish apis and add unittest.

* Add operator stats collecting tools for program.

* Add the check of number of bfloat16 operators in unittest.

* Add warning for operator not supported for amp.

* Add testing of BF16 O1 and O2.

dfcba7f4

06 Apr, 2023 · 1 commit

rem is_compiled_with_npu (#52385) · 7976e2a3

Committed by Kim Yann on Apr 06, 2023

* rem is_compiled_with_npu

* rem npu related code

* make lint happy

* rem test

* remove some tests

* Update grad_scaler.py

* fix an error

7976e2a3

03 Apr, 2023 · 1 commit

rem is_compiled_with_mlu (#52378) · 4b28f4ff

Committed by Kim Yann on Apr 03, 2023

* rem is_compiled_with_mlu

* fix some mlu_place and mlu_device_coount

* make lint happy

4b28f4ff

17 Jan, 2023 · 1 commit
- Fix the paddle/staitc/amp/__init__.py (#49791) · fcc90531
  Committed by zhangkaihuo on Jan 17, 2023
  
  fcc90531
12 Jan, 2023 · 1 commit
- move fuild.contrib.mixed_precision to paddle.static.amp (#49412) · 69d01eb9
  Committed by zhangkaihuo on Jan 12, 2023
  
  69d01eb9
08 Nov, 2022 · 1 commit
- [CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition (#47642) · 888272b5
  Committed by Nyakku Shigure on Nov 08, 2022
```
* [CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition

* fix an increment
```
  888272b5
23 Oct, 2022 · 1 commit
- [CodeStyle][black] use black instead of yapf (#46014) · 7097630f
  Committed by Nyakku Shigure on Oct 23, 2022
```
* update config

* re-blacken python code

* temporarily disable date and diff_py_file

* skip a format
```
  7097630f
29 Aug, 2022 · 1 commit
- add interpolate op to default black lists (#45393) · 9a560f7c
  Committed by Zhang Ting on Aug 29, 2022
  
  9a560f7c
26 Apr, 2022 · 1 commit
- Add fused_multi_transformer op to optimize transformer generation performance (#41814) · 9dadf7df
  Committed by WangXi on Apr 26, 2022
  
  9dadf7df
16 Mar, 2022 · 1 commit
- [MLU] support amp O1 of mlu (#40461) · ad81f22c
  Committed by qipengh on Mar 16, 2022
  
  ad81f22c
28 Dec, 2021 · 1 commit

Fix scatter_op fp16 perf problem. (#38499) · 33ce249f

Committed by Li Min on Dec 28, 2021

* Fix scatter_op fp16 perf problem.

* Add scatter into black list.

* Add scatter into black list for dygraph.

33ce249f

20 Dec, 2021 · 1 commit

Support FP16 for more ops (#38123) · 1f445bf3

Committed by sneaxiy on Dec 20, 2021

* support FP16 for more ops

* add amp list tests

* refine reduce_mean_grad

* fix OP benchmark ci

* fix fp16 reduce_mean

* update ut, but still have some problems

* remove mean/reduce_mean fp16 kernel

1f445bf3

27 Oct, 2021 · 1 commit

Fused transformer encoder layer and fused feedforward layer (#36604) · 9f3613f3

Committed by zhangkaihuo on Oct 27, 2021

This PR adds the layer-level code for fused_transformer, including the layer code for FusedFeedForward and for FusedTransformerEncoderLayer.

9f3613f3

10 Sep, 2021 · 1 commit
- fix bug of recompute in hybridparallel (#35588) · d53e567a
  Committed by ShenLiang on Sep 10, 2021
  
  d53e567a
05 Aug, 2021 · 1 commit
- optimize pipeline performance with recompute and amp, test=allcase (#34519) · 911c8593
  Committed by WangXi on Aug 05, 2021
  
  911c8593
22 Jul, 2021 · 1 commit
- enable amp unsupported_fp16_list for npu (#34314) · b0a2f005
  Committed by Leo Chen on Jul 22, 2021
  
  b0a2f005
05 Jul, 2021 · 1 commit

add `reduce_sum` op into amp black list (#33960) · aa9fdd0d

Committed by jiangcheng on Jul 05, 2021

* reduce sum op default fp32, add into amp black list

* reduce_sum defaults to fp32 to avoid returning inf when the sum is larger than 65504 (the fp16 maximum)
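
The overflow that motivates this change can be reproduced without Paddle. The sketch below is illustrative only and is not taken from the commit: it emulates fp16 rounding with the standard-library `struct` module (format `e`), and the helper names `to_fp16`, `sum_in_fp16` and `sum_in_high_precision` are hypothetical. A Python float stands in for the fp32 accumulator.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest half-precision value; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values outside the fp16 range,
        # so map them to infinity the way fp16 hardware arithmetic would.
        return float("inf") if x > 0 else float("-inf")


def sum_in_fp16(values):
    """Accumulate with every partial sum rounded back to fp16."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


def sum_in_high_precision(values):
    """Accumulate in Python floats (a stand-in for an fp32 accumulator)."""
    return sum(values)


# Hypothetical input: 512 elements of 256.0, exact sum 131072 > FP16_MAX.
values = [256.0] * 512
print(sum_in_fp16(values))
print(sum_in_high_precision(values))
```

With this input the fp16 accumulation saturates to `inf` once the running total passes 65504, while the higher-precision accumulation returns 131072.0, which is why keeping the reduction in fp32 is the safer default.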

aa9fdd0d

01 Jul, 2021 · 1 commit
- fix bug DLTP-31078 (#33877) · 3e82a794
  Committed by taixiurong on Jul 01, 2021
  
  3e82a794
29 Jun, 2021 · 1 commit
- xpu support amp (#33809) · 4d4fb660
  Committed by taixiurong on Jun 29, 2021
  
  4d4fb660
21 Jun, 2021 · 1 commit
- update fp16 gray_list for tensor parallel (#33660) · 1681a2dd
  Committed by WangXi on Jun 21, 2021
  
  1681a2dd
26 May, 2021 · 1 commit
- [Tensor Parallelism] split fix bug (#33015) · 20b9be65
  Committed by JZ-LIANG on May 26, 2021
  
  20b9be65
08 Apr, 2021 · 1 commit

The unsupported_fp16_list using in AMP will be created automatically during the runtime. (#32102) · 6e65fe02

Committed by Zhen Wang on Apr 08, 2021

* Use the runtime to create the unsupported_fp16_list using in AMP.

* Add more infos about supported ops.

* Add some comments for the function of OpSupportedInfos.

* Fix the unit test of test_multi_precision_fp16_train.

6e65fe02

22 Mar, 2021 · 1 commit
- [oneDNN] Initial bf16 amp integration (#31093) · 7ccf6b60
  Committed by arlesniak on Mar 22, 2021
  
  7ccf6b60
20 Jan, 2021 · 1 commit
- Add fleet amp_init() (#30572) · 13862008
  Committed by huangxu96 on Jan 20, 2021
```
* add fleet amp.init()

* add unittest for fleet_amp_init
```
  13862008
13 Jan, 2021 · 1 commit
- add amp example document (#30314) · 342d62de
  Committed by huangxu96 on Jan 13, 2021
  
  342d62de
08 Jan, 2021 · 1 commit

Support pure fp16 training for AMP API. (#29544) · 7f7dfccf

Committed by Zhen Wang on Jan 08, 2021

* add cast ops before and after unsupported fp16 ops.

* Keep partial net in FP32 pattern.

* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.

* Add fp16 support for adam op.

* add multi precision attr for adam.

* Fix the bug of test_multi_precision_fp16_train UT.

* Code format for CI.

* Fix the redefine error about MPTypeTrait on windows.

* fix bugs of the _create_accumulators func in Momentum.

* fix bug when inserting post cast op.

* Add the update_loss_scaling op in allow_set of UnusedVarCheck.

* Update for ci coverage.

* Add some doc for OptimizerWithMixedPrecision.

* Fix the code style.

* Improve the doc of `amp_init`.

* Change for fp16 testing if users have the infer program defined in separate way.

7f7dfccf

02 Dec, 2020 · 1 commit

Layer norm fp16 (#29169) · 7584bb50

Committed by furnace on Dec 02, 2020

* add fp16 for layer_norm op

* revert layernorm api

* fix forward

* fix forward

* fix backward for layernorm with fp16

* fix unit test for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16

* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>

* fix with_mkldnn compile error for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

7584bb50

18 Nov, 2020 · 1 commit
- Add matmtl_v2 to amp list (#28693) · 11e32baf
  Committed by Leo Chen on Nov 18, 2020
```
* add matmtl_v2 to amp list

* support dygraph
```
  11e32baf
23 Sep, 2020 · 1 commit
- add fuse_bn_act op (#27230) · 906e7f92
  Committed by Zhang Ting on Sep 23, 2020
```
* add fused_bn_add_relu op
```
  906e7f92
26 Nov, 2019 · 1 commit
- Fix some typos in AMP. (#21354) · be2e3e67
  Committed by Zhen Wang on Nov 26, 2019
```
* fix some typos in AMP. test=develop

* delete useless codes. test=develop
```
  be2e3e67
30 Oct, 2019 · 1 commit

Add custom black variable name set in amp interface. (#20875) · 3255fe69

Committed by gongweibao on Oct 30, 2019

* add custom black varname test=develop

* fix dtype test=develop

* fix num test=develop

* fix ut test=develop

* fix coverage test=develop

* fix blackvar names test=develop

3255fe69

19 Sep, 2019 · 1 commit
- Optimize amp for multi-gpu to enable FP16 gradients transfer across gpus. (#19714) · d9db94d7
  Committed by Jie Fang on Sep 19, 2019
```
Optimize amp for multi-gpu to enable FP16 gradients transfer across gpus
```
  d9db94d7
06 Sep, 2019 · 1 commit
- init new amp, optimize inserting cast op for batchnorm (#18596) · c6a598a2
  Committed by Jie Fang on Sep 06, 2019
```
init new amp, optimize inserting cast op for batchnorm
```
  c6a598a2
31 Aug, 2019 · 1 commit
- remove reset recordio usage (#19519) · 5dce1da6
  Committed by Zeng Jinle on Aug 31, 2019
  
  5dce1da6

PaddlePaddle / Paddle · synced successfully about 1 year ago