Update amp_check_finite_and_scale_op and add an updating_loss_scaling op for static graph amp training. !26240
Created by: wzzju
PR types
Performance optimization
PR changes
OPs
Describe
- Use the `check_finite_and_unscale` op to check whether the gradients contain infinite values, then unscale them.
- Use the `update_loss_scaling` op to update the loss scaling value.

Performance of ResNet50 on a single Tesla V100-16GB card:
Original | Optimized (after this PR) |
---|---|
952 images/sec | 985 images/sec |
After training the ResNet50 model for 120 epochs, the top-1 and top-5 accuracies are as follows:
Training mode | Epochs | Top-1 accuracy | Top-5 accuracy |
---|---|---|---|
FP32 | 120 | 75.674% | 92.674% |
AMP | 120 | 76.067% | 92.903% |
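
For reviewers unfamiliar with dynamic loss scaling, the behavior of the two ops named in the description can be sketched in NumPy as below. This is only an illustrative sketch of the general technique, not the kernels added in this PR. The function signatures, the counter names (`good_steps`, `bad_steps`), and the default thresholds and ratios are assumptions chosen for the example. The sketch also treats NaN as non-finite, which the description above does not state.

```python
import numpy as np


def check_finite_and_unscale(grads, loss_scaling):
    """Sketch: detect non-finite gradients, then divide every gradient by the scale.

    Returns (unscaled_grads, found_inf). NaN is treated like Inf here (assumption).
    """
    found_inf = any(not np.all(np.isfinite(g)) for g in grads)
    inv_scale = 1.0 / loss_scaling
    unscaled = [g * inv_scale for g in grads]
    return unscaled, found_inf


def update_loss_scaling(loss_scaling, good_steps, bad_steps, found_inf,
                        incr_every_n_steps=1000, decr_every_n_nan_or_inf=2,
                        incr_ratio=2.0, decr_ratio=0.5):
    """Sketch: grow the scale after a run of clean steps, shrink it after overflows.

    All parameter names and defaults are hypothetical values for illustration.
    Returns (new_loss_scaling, good_steps, bad_steps).
    """
    if found_inf:
        # An overflow breaks the run of clean steps.
        good_steps = 0
        bad_steps += 1
        if bad_steps == decr_every_n_nan_or_inf:
            # Shrink the scale, but never below 1.0.
            loss_scaling = max(loss_scaling * decr_ratio, 1.0)
            bad_steps = 0
    else:
        bad_steps = 0
        good_steps += 1
        if good_steps == incr_every_n_steps:
            new_scale = loss_scaling * incr_ratio
            # Keep the old scale if growing it would overflow.
            if np.isfinite(new_scale):
                loss_scaling = new_scale
            good_steps = 0
    return loss_scaling, good_steps, bad_steps


if __name__ == "__main__":
    grads = [np.array([2.0, 4.0], dtype=np.float32),
             np.array([np.inf], dtype=np.float32)]
    unscaled, found_inf = check_finite_and_unscale(grads, loss_scaling=2.0)
    scale, good, bad = update_loss_scaling(1024.0, 0, 1, found_inf)
    print(found_inf, scale, good, bad)
```

In a training loop built on this sketch, the optimizer update would be skipped whenever `found_inf` is true, and the returned scale and counters would be carried into the next step.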