Learning rate decay cannot be set in distributed training
Created by: chenkhan
Script mode on AI Studio; the version should be 1.8.4, and the environment is whatever AI Studio provides.
Error description: In distributed training, a TypeError is raised when the "learning_rate" argument of the Adam optimizer is set to fluid.layers.exponential_decay. Setting the learning rate to a constant does not produce the error.
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 277, in __impl__
return func(*args, **kwargs)
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/optimizer.py", line 838, in minimize
loss, startup_program=startup_program, params_grads=params_grads)
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/optimizer.py", line 752, in apply_optimize
optimize_ops = self.apply_gradients(params_grads)
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/optimizer.py", line 722, in apply_gradients
optimize_ops = self._create_optimization_pass(params_grads)
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/optimizer.py", line 545, in _create_optimization_pass
self._create_global_learning_rate()
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/optimizer.py", line 284, in _create_global_learning_rate
"learning rate variable is create outside optimizer,"
TypeError: learning rate variable is create outside optimizer,can not create new learning rate variable for new program
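
To make the two cases in the description concrete, here is a minimal sketch (not the original script): the toy network, the feed names, and the collective-fleet import path are assumptions, and fleet.init(...) is assumed to have been set up by the usual multi-process launcher.

import paddle.fluid as fluid
from paddle.fluid.incubate.fleet.collective import fleet, DistributedStrategy

def try_learning_rate(lr):
    main_prog, startup_prog = fluid.Program(), fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        x = fluid.data(name='x', shape=[None, 13], dtype='float32')
        y = fluid.data(name='y', shape=[None, 1], dtype='float32')
        pred = fluid.layers.fc(input=x, size=1)
        loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))
        opt = fluid.optimizer.Adam(learning_rate=lr)
        opt = fleet.distributed_optimizer(opt, strategy=DistributedStrategy())
        opt.minimize(loss)

try_learning_rate(0.001)  # constant learning rate: the case reported to work
try_learning_rate(        # decayed learning rate: the case reported to fail
    fluid.layers.exponential_decay(learning_rate=0.001, decay_steps=1,
                                   decay_rate=0.9999, staircase=False))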
Relevant code:
def create_optimizer(is_distributed):
    # args.learning_rate comes from the script's argument parser (not shown)
    l2 = fluid.regularizer.L2Decay(regularization_coeff=0.1)
    optimizer = fluid.optimizer.Adam(
        learning_rate=fluid.layers.exponential_decay(
            learning_rate=args.learning_rate,
            decay_steps=1,
            decay_rate=0.9999,
            staircase=False),
        regularization=l2)
    if is_distributed:
        optimizer = distributed_optimize(optimizer)
    return optimizer
def distributed_optimize(optimizer):
    # Wrap the base optimizer for collective training; fleet and DistributedStrategy
    # are assumed to come from paddle.fluid.incubate.fleet.collective.
    strategy = DistributedStrategy()
    strategy.fuse_all_reduce_ops = True
    strategy.nccl_comm_num = 2
    strategy.fuse_elewise_add_act_ops = True
    strategy.fuse_bn_act_ops = True
    strategy.fuse_all_optimizer_ops = True
    return fleet.distributed_optimizer(optimizer, strategy=strategy)
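
The error text suggests that the decayed learning-rate Variable lives in a different Program from the one the distributed optimizer ends up working on. As a sketch of something worth checking rather than a confirmed fix, calling create_optimizer (and therefore fluid.layers.exponential_decay) under the same program_guard as the model and the minimize call keeps that Variable in the program being optimized. train_program, startup_program and build_model below are placeholder names, not taken from the report.

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    loss = build_model(args)  # hypothetical model-building helper
    optimizer = create_optimizer(is_distributed=True)
    optimizer.minimize(loss)  # the traceback above originates from this minimize call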