Opened December 12, 2017 by saxon_zh (Guest)

Error/Gradient clipping survey and plan

Created by: reyoung

Gradient Clipping

Exploding gradients can be handled by gradient clipping. Before optimizing a parameter, we can clip its gradient to stabilize the training process.

The simplest form of clipping is clip_by_value, which limits every element of a tensor to the range [clip_min, clip_max]: any value larger than clip_max becomes clip_max, and any value smaller than clip_min becomes clip_min.
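
For concreteness, here is a minimal NumPy sketch of the clip_by_value semantics described above; it only illustrates the math, not the eventual Paddle operator.

```python
import numpy as np

def clip_by_value(grad, clip_min, clip_max):
    # Element-wise clipping: values above clip_max become clip_max,
    # values below clip_min become clip_min.
    return np.clip(grad, clip_min, clip_max)

g = np.array([-3.0, 0.5, 7.0])
print(clip_by_value(g, -1.0, 1.0))  # [-1.   0.5  1. ]
```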

Clipping by value alone is not ideal because it changes the direction of the gradient. If we do not want to change the direction of a parameter's gradient, we can instead scale the gradient so that its L2-norm stays below a limit.

If we want the direction of the gradients as a whole to stay unchanged, we can scale all gradients together so that their combined L2-norm stays below a limit.

So, two methods will be implemented:

  • clip_by_value
  • clip_by_l2_norm, which takes a list of gradients. Two higher-level APIs could be built on top of it: clip_by_local_l2_norm, which passes only the current gradient to clip_by_l2_norm, and clip_by_global_l2_norm, which passes all gradients (sketched below).
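
A minimal NumPy sketch of the norm-based variants, assuming the interface proposed above (clip_by_l2_norm takes a list of gradients; the two higher-level wrappers decide which gradients to pass). The scaling rule, multiplying by max_norm / norm when the norm exceeds the limit, is the usual one and is an assumption here rather than a decision made in this issue.

```python
import numpy as np

def clip_by_l2_norm(grads, max_norm):
    # Scale the given gradients jointly so that their combined L2-norm
    # does not exceed max_norm; the direction of the gradients is preserved.
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]

def clip_by_local_l2_norm(grad, max_norm):
    # Higher-level API: clip a single gradient by its own L2-norm.
    return clip_by_l2_norm([grad], max_norm)[0]

def clip_by_global_l2_norm(all_grads, max_norm):
    # Higher-level API: clip all gradients by their global L2-norm.
    return clip_by_l2_norm(all_grads, max_norm)
```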

Error clipping

Clipping gradients only after the backward pass cannot handle explosion that happens during the backward pass itself: gradients may already have exploded while being computed.

The previous version of Paddle has a trick called error clipping, which clips the gradients of hidden layers during the backward pass. TensorFlow does not provide this feature by default, but a user can implement it by customizing the backward computation.

We should make our backward pass customizable in Python to support error clipping and other manipulations.

Maybe we can add a backward function in Python that takes a Python callback. If the user does not provide a callback, it generates the backward operators as usual. If the user provides a custom callback, they can implement error clipping themselves.
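
A rough Python sketch of that callback idea, using plain dicts instead of real Paddle blocks and operators; every name here (append_backward, error_clip_callback, the clip bounds, the "@GRAD" suffix) is hypothetical and only illustrates how a user-supplied callback could inject clipping ops while the backward pass is generated.

```python
def append_backward(forward_ops, callback=None):
    # Generate backward ops in reverse order; let an optional callback
    # post-process each generated op and append extra ops (e.g. clipping).
    backward_ops = []
    for op in reversed(forward_ops):
        grad_op = {"type": op["type"] + "_grad",
                   "outputs": [o + "@GRAD" for o in op["outputs"]]}
        backward_ops.append(grad_op)
        if callback is not None:
            backward_ops.extend(callback(grad_op) or [])
    return backward_ops

def error_clip_callback(grad_op):
    # Insert a value-clipping op right after each hidden-layer gradient op.
    return [{"type": "clip", "inputs": grad_op["outputs"], "min": -5.0, "max": 5.0}]

ops = append_backward([{"type": "fc", "outputs": ["hidden1"]}],
                      callback=error_clip_callback)
```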

Milestone: Release 0.11.1 (Past due)
Reference: paddlepaddle/Paddle#6510