Feature/Enable Auto-Mixed-Precision in dynamic graph !24903

!24903 Merged Jun 04, 2020, created by saxon_zh@saxon_zh

Created by: zhiqiu

PR types

New features

PR changes

APIs

Describe

Introduce Auto-Mixed Precision (AMP) in imperative mode.

Background

AMP automatically uses both single-precision and half-precision representations of floating-point numbers to achieve high-performance training and inference.

As Nvidia says,

Benefits of Mixed precision training

  • Speeds up math-intensive operations, such as linear and convolution layers, by using Tensor Cores.
  • Speeds up memory-limited operations by accessing half the bytes compared to single-precision.
  • Reduces memory requirements for training models, enabling larger models or larger minibatches.

Implementation

AMP mainly consists of two phases.

  • Auto-casting tensor data types. For each executed operator in the model, the AutoCast module automatically decides which data type to use: float16 (half precision) or float32 (single precision). The decision is based on a white_op_list (operators that are numerically safe in float16 and can be accelerated by it) and a black_op_list (operators that are numerically unsafe in float16 and must stay in float32), as the following figures show.

Figure: (a) an example of the original execution; (b) an example of execution with amp_guard(True).

fluid.dygraph.amp_guard() provides a context in which tensor operations are auto-cast.

Example:

with fluid.dygraph.amp_guard():
    loss = model(inputs)  # the operators in the model will be cast automatically
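
The per-operator decision can be pictured roughly like this. This is only an illustrative sketch of the white/black-list rule described above, not the PR's actual AutoCast internals; the function name and the example operators in the lists are hypothetical.

# Illustrative sketch only -- names and list contents are hypothetical,
# not the actual AutoCast implementation in this PR.
WHITE_OP_LIST = {'conv2d', 'matmul', 'mul'}   # fast and numerically safe in float16
BLACK_OP_LIST = {'exp', 'log', 'softmax'}     # numerically unsafe in float16, keep float32

def choose_cast_dtype(op_type):
    if op_type in WHITE_OP_LIST:
        return 'float16'   # cast inputs to half precision before running the op
    if op_type in BLACK_OP_LIST:
        return 'float32'   # cast inputs back to single precision
    return None            # other ops keep whatever dtype their inputs already have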
  • Scaling the loss. Float16 has a narrower representation range than float32, as the figure below (from the Nvidia docs) shows.

Small gradients may become zero (fall outside the representable range) when using float16, so we need to 'shift' the gradients into float16's representable range, which is what loss scaling does. The procedure is: (1) scale the loss by a factor, for example multiplying by 1024; (2) perform backward propagation on the scaled loss; (3) un-scale the gradients, i.e., multiply by 1/1024; (4) update the parameters with the un-scaled gradients.
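
To see concretely why scaling is needed, here is a small NumPy illustration (not part of the PR) of a gradient that underflows in float16 but survives after scaling by 1024:

import numpy as np

grad = np.float32(1e-8)                             # a small gradient value
print(np.float16(grad))                             # 0.0 -- underflows; below float16's smallest subnormal (~6e-8)
print(np.float16(grad * 1024))                      # ~1.0e-05 -- representable once scaled
print(np.float32(np.float16(grad * 1024)) / 1024)   # ~1e-08 -- recovered after un-scaling in float32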

fluid.dygraph.AmpScaler() is provided to manage loss scaling. Example:

scaler = fluid.dygraph.AmpScaler()  # initialize a scaler
sgd = fluid.optimizer.SGDOptimizer(learning_rate=0.01, parameter_list=model.parameters())
with fluid.dygraph.amp_guard():  # enable amp
    loss = model(inputs)
scaled = scaler.scale(loss)   # scale the loss
scaled.backward()   # run backward on the scaled loss
scaler.minimize(sgd, scaled)  # un-scale the gradients and update the parameters
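
Putting the two APIs together, a minimal end-to-end training step could look like the sketch below. The toy Conv2D model, the loss, and the hyperparameters (learning rate, init_loss_scaling value) are placeholders chosen for illustration.

import numpy as np
import paddle.fluid as fluid

data = np.random.uniform(-1, 1, [10, 3, 32, 32]).astype('float32')
with fluid.dygraph.guard():
    model = fluid.dygraph.Conv2D(3, 2, 3)                      # toy model: 3 -> 2 channels, 3x3 kernel
    sgd = fluid.optimizer.SGDOptimizer(
        learning_rate=0.01, parameter_list=model.parameters())
    scaler = fluid.dygraph.AmpScaler(init_loss_scaling=1024)   # manages loss scaling
    inputs = fluid.dygraph.to_variable(data)
    with fluid.dygraph.amp_guard():                            # enable auto-casting
        out = model(inputs)
        loss = fluid.layers.reduce_mean(out)
        scaled = scaler.scale(loss)                            # (1) scale the loss
        scaled.backward()                                      # (2) backward on the scaled loss
        scaler.minimize(sgd, scaled)                           # (3)+(4) un-scale gradients, then update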

Related #24875, #24823

Reference: paddlepaddle/Paddle!24903
Source branch: github/fork/zhiqiu/feature/imperative_amp