Created by: Jie-Fang
In the original AMP, all gradients are kept in FP32. In multi-GPU training this means FP32 gradients are transferred across GPUs, which can cause noticeable communication overhead. With this AMP, FP16 gradients are obtained automatically for some ops, and those FP16 gradients can be transferred instead to reduce the overhead.
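
As a point of comparison, the sketch below shows one existing way to get a similar communication saving in PyTorch: DDP's `fp16_compress_hook` casts each gradient bucket to FP16 before the all-reduce and back to FP32 afterwards, so only FP16 tensors cross GPUs. This is not the same as producing FP16 gradients natively for some ops as described above; it only illustrates the communication-side benefit. The model, batch, and process-group setup are placeholders, not part of this proposal.

```python
# Minimal sketch, assuming PyTorch with NCCL. Only the comm-hook line is the
# point of the example; everything else is placeholder boilerplate.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

dist.init_process_group("nccl")
device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(1024, 1024).to(device)      # placeholder model
ddp_model = DDP(model, device_ids=[device.index])

# Cast gradient buckets to FP16 for the all-reduce, then back to FP32,
# so the bytes sent over the interconnect are halved.
ddp_model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)

scaler = torch.cuda.amp.GradScaler()
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

x = torch.randn(32, 1024, device=device)            # placeholder batch
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = ddp_model(x).sum()
scaler.scale(loss).backward()                       # gradients communicated in FP16
scaler.step(optimizer)
scaler.update()
```

By contrast, the approach described above would skip the cast-down/cast-up round trip for ops whose gradients are already FP16, communicating them directly.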