Error/Gradient clipping survey and plan
Created by: reyoung
Gradient Clipping
Exploding gradients can be handled by gradient clipping. Before optimizing a parameter, we can clip its gradient to stabilize the training process.
The simplest clipping is `clip_by_value`: limit the values of a tensor to the range [clip_min, clip_max]. Every value larger than clip_max becomes clip_max, and every value smaller than clip_min becomes clip_min.
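A minimal sketch of what `clip_by_value` does, using numpy only for illustration (the real operator would run on framework tensors):

```python
import numpy as np

def clip_by_value(grad, clip_min, clip_max):
    """Element-wise clipping: values above clip_max become clip_max,
    values below clip_min become clip_min."""
    return np.clip(grad, clip_min, clip_max)

g = np.array([-3.0, 0.5, 4.0])
print(clip_by_value(g, clip_min=-1.0, clip_max=1.0))  # [-1.   0.5  1. ]
```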
Clipping each value independently is not ideal because it changes the direction of the gradient. If we do not want to change the direction of a parameter's gradient, we can instead scale the gradient so that its l2-norm stays below a limit.
If we want the direction of the gradients as a whole to be unchanged, we can scale all gradients together so that their joint l2-norm stays below a limit.
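A numpy sketch of the norm-based scaling described above; every gradient is multiplied by the same factor, so directions are preserved (again, only an illustration of the arithmetic, not the actual operator):

```python
import numpy as np

def clip_by_l2_norm(grads, max_norm):
    """Scale a list of gradients so that their joint l2-norm does not
    exceed max_norm."""
    norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]

g1, g2 = np.array([3.0, 4.0]), np.array([0.0, 12.0])
print(clip_by_l2_norm([g1, g2], max_norm=1.0))  # joint norm is 13, so both scale by 1/13
```

Passing a single gradient gives per-parameter clipping; passing all gradients gives global clipping, which is exactly the distinction drawn in the two paragraphs above.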
So, two methods will be implemented:
- `clip_by_value`
- `clip_by_l2_norm`, which takes a list of gradients. On top of it there could be two higher-level APIs, `clip_by_local_l2_norm` and `clip_by_global_l2_norm`, which pass the current gradient or all gradients to `clip_by_l2_norm`, respectively (see the sketch after this list).
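Assuming the `clip_by_l2_norm` sketch above, the two higher-level APIs could be thin wrappers whose only difference is which gradients they forward:

```python
def clip_by_local_l2_norm(grad, max_norm):
    # Pass only the current gradient: each parameter is clipped independently.
    return clip_by_l2_norm([grad], max_norm)[0]

def clip_by_global_l2_norm(grads, max_norm):
    # Pass all gradients: they share one scaling factor, so the overall
    # direction of the update is preserved.
    return clip_by_l2_norm(grads, max_norm)
```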
Error clipping
Clipping the gradients only after the backward pass cannot handle explosion during the backward pass itself: gradients may already have exploded while being computed in the backward stage.
There is a trick in the previous Paddle called error clipping: it clips the gradients of hidden layers while the backward pass is running. TensorFlow does not provide this feature by default, but a user can implement it by hacking the backward method.
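A toy illustration of error clipping: the gradient is clipped after every hidden layer while the backward pass is still running, so it cannot blow up on the way down. The `Linear` class and its `backward` method are made up for this sketch only:

```python
import numpy as np

class Linear:
    """Toy layer: y = x @ w, so the input gradient is grad_out @ w.T."""
    def __init__(self, w):
        self.w = np.asarray(w)

    def backward(self, grad_out):
        return grad_out @ self.w.T

def backward_with_error_clipping(layers, grad_out, clip_min=-10.0, clip_max=10.0):
    grad = np.asarray(grad_out)
    for layer in reversed(layers):
        grad = layer.backward(grad)
        grad = np.clip(grad, clip_min, clip_max)  # error clipping per hidden layer
    return grad

# Large weights would make the gradient explode without the clipping step.
layers = [Linear(np.full((2, 2), 8.0)), Linear(np.full((2, 2), 8.0))]
print(backward_with_error_clipping(layers, np.ones((1, 2))))  # stays within [-10, 10]
```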
We should make our `backward` customizable in Python to support error clipping or other manipulations.
Maybe we can add a `backward` in Python that takes a Python callback. If the user does not provide a callback, it simply generates the backward operators as usual. If the user supplies a callback, they can implement error clipping by themselves.
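A hypothetical sketch of how such a Python `backward` with a callback could look. Operators are represented as plain dicts, and all names (`append_backward`, `error_clip_callback`) are made up to illustrate the control flow, not an actual API:

```python
def append_backward(forward_ops, callback=None):
    """Walk the forward ops in reverse and create one backward op for each.
    If a callback is given, it is invoked after every backward op so the
    user can append extra ops, e.g. error clipping."""
    backward_ops = []
    for op in reversed(forward_ops):
        bwd = {"type": op["type"] + "_grad", "inputs": op["outputs"]}
        backward_ops.append(bwd)
        if callback is not None:
            callback(bwd, backward_ops)
    return backward_ops

def error_clip_callback(bwd_op, backward_ops):
    # User-defined callback: insert a clip op right after each backward op.
    backward_ops.append({"type": "clip", "inputs": bwd_op["inputs"],
                         "min": -10.0, "max": 10.0})

forward_ops = [{"type": "fc", "outputs": ["h1"]}, {"type": "fc", "outputs": ["h2"]}]
print(append_backward(forward_ops, callback=error_clip_callback))
```

With no callback, the function just emits the normal backward operators; with a callback, the user decides what to insert after each of them.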