Require sparse gradient clipping
Created by: gavin1332
Sparse gradient clipping is helpful in many sequential NLP tasks, such as machine translation, language modeling, and semantic role labeling. It is applied to the embedding matrix to shrink gradients accumulated along the sequence and avoid gradient explosion. We found a gradient clipping operator for dense matrices, but it is not suitable for sparsely updated embeddings, so we need a sparse version.
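A minimal sketch of what the sparse version could look like, assuming the sparse gradient is stored as (row indices, row values) covering only the embedding rows touched in the batch; the function name `clip_sparse_by_norm` is illustrative, not an existing operator:

```python
import numpy as np

def clip_sparse_by_norm(indices, values, max_norm):
    """Clip a sparse (row-indexed) gradient so its global L2 norm
    does not exceed max_norm. Only the touched rows are stored,
    so the norm is computed over those rows alone."""
    norm = np.sqrt((values ** 2).sum())
    if norm > max_norm:
        values = values * (max_norm / norm)
    return indices, values

# Gradient for rows 2 and 7 of the embedding table
idx = np.array([2, 7])
vals = np.array([[3.0, 4.0],
                 [0.0, 0.0]])  # global norm = 5.0
idx, vals = clip_sparse_by_norm(idx, vals, max_norm=1.0)
# vals is now scaled by 1/5, and untouched rows never enter the computation
```

The key difference from the dense operator is that the norm and the rescaling act only on the selected rows, so the cost is proportional to the number of touched rows, not the vocabulary size.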
Besides, L1/L2 regularization and the Ada-series optimization algorithms should also be adapted for sparse updates.
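As one way to sketch this adaptation, the snippet below applies an Adagrad step with L2 regularization lazily, only to the rows that received gradient in the current step; the helper `sparse_adagrad_step` and its parameters are assumptions for illustration, not an existing API:

```python
import numpy as np

def sparse_adagrad_step(param, accum, indices, grad_rows,
                        lr=0.1, l2=1e-4, eps=1e-8):
    """Adagrad update applied only to the embedding rows that
    received gradient. L2 regularization is added lazily to those
    same rows; all other rows (and their accumulators) are untouched."""
    g = grad_rows + l2 * param[indices]        # lazy L2 on touched rows only
    accum[indices] += g ** 2                   # per-row squared-gradient accumulator
    param[indices] -= lr * g / (np.sqrt(accum[indices]) + eps)
    return param, accum

emb = np.ones((5, 3))           # toy embedding table
acc = np.zeros_like(emb)        # Adagrad accumulator
idx = np.array([1, 4])          # rows touched in this batch
grads = np.full((2, 3), 0.5)
emb, acc = sparse_adagrad_step(emb, acc, idx, grads)
```

Updating only the touched rows keeps the step cost independent of vocabulary size, which is the same property the clipping operator above needs.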