Created by: JZ-LIANG
### PR types
Performance optimization

### PR changes
OPs

### Describe
Modified the implementation of the LARS optimizer to improve its convergence accuracy and convergence speed. Modifications:
- Allow filtering the parameters of specific layers out of LARS weight decay.
- Exclude all layers without LARS weight decay from the LARS local learning-rate scaling.
With these modifications, ResNet50 reaches >= 75.9% accuracy when the training batch size is > 8k.
| version / batch size | 2k | 8k | 16k |
|---|---|---|---|
| origin | 75.34 | 74.28 | 75.4 |
| new | 76.51 | 76.14 | 76.14 |
```python
# origin
optimizer = fluid.optimizer.LarsMomentumOptimizer(
    lars_coeff=0.01, lars_weight_decay=0.0001, learning_rate=learning_rate,
    momentum=0.9, regularization=None)

# new
optimizer = fluid.optimizer.LarsMomentumOptimizer(
    lars_coeff=0.01, lars_weight_decay=0.0001, learning_rate=learning_rate,
    momentum=0.9, regularization=None,
    exclude_from_weight_decay=['batch_norm', 'b_0'], epsilon=0)
```
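As a rough illustration of what `exclude_from_weight_decay` does, the sketch below shows name-substring matching deciding which parameters keep weight decay; the matching rule and the sample parameter names are assumptions for illustration, not the exact Paddle internals:

```python
# Hedged sketch of exclude_from_weight_decay filtering.
# The substring-matching rule and parameter names here are illustrative
# assumptions, not the actual PaddlePaddle implementation.
def is_excluded(param_name, exclude_list):
    """Return True if any exclusion pattern appears in the parameter name."""
    return any(pattern in param_name for pattern in exclude_list)

exclude = ['batch_norm', 'b_0']
params = ['conv1_weights', 'batch_norm_0.w_0', 'fc_0.b_0']

# Parameters matching an exclusion pattern skip LARS weight decay
# (and, per this PR, also skip the LARS local learning-rate scaling).
decayed = [p for p in params if not is_excluded(p, exclude)]
print(decayed)  # → ['conv1_weights']
```

With this setup, batch-norm scales/shifts and bias parameters (`b_0`) fall back to plain momentum updates, which is the standard practice for LARS at large batch sizes.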