LookAheadOptimizer causes the model to fail to converge
Created by: flishwang
Based on release/0.3, I added the following to tools/train.py:
lr = lr_builder()
optimizer = optim_builder(lr)
if cfg.get('lookahead', -1.0) > 1e-3:
    # wrap the base optimizer with lookahead; the config value doubles as alpha
    optimizer = LookAheadOptimizer(optimizer, alpha=cfg.get('lookahead', -1.0), k=5)
    optimizer.minimize(loss, startup_program=startup_prog)
else:
    optimizer.minimize(loss)
The model is the large-scale object-detection cascade_rcnn r101vd dcn config, but after training its mAP is only about 0.01.
Modifying the Optimizer according to https://github.com/PaddlePaddle/Paddle/pull/25688 did not fix the problem either.
After changing LookAheadOptimizer as shown below, the accuracy recovers to roughly the same level as training without lookahead, so I suspect the initialization of slow_params inside the optimizer is broken.
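For intuition on why a bad slow_params initialization is this destructive: at every sync point lookahead computes slow = alpha * fast + (1 - alpha) * slow and then copies slow back into fast. If the slow copy starts at zero rather than at the parameters' initial (here: pretrained) values, the very first sync scales every weight by alpha. A minimal NumPy sketch of that failure mode (the zero initialization is my assumption about the bug, not taken from the Paddle source):

import numpy as np

alpha, k = 0.5, 5
fast = np.array([1.0, -2.0, 3.0])   # stands in for pretrained weights
slow = np.zeros_like(fast)          # assumed buggy init: zeros instead of a copy of fast

for step in range(1, k + 1):        # inner-optimizer steps (gradient updates omitted)
    if step % k == 0:               # lookahead sync point
        slow = alpha * fast + (1 - alpha) * slow
        fast = slow.copy()

print(fast)  # [ 0.5 -1.   1.5] -- the pretrained weights are halved at the first sync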
My modification to LookAheadOptimizer:
with fluid.framework.program_guard(main_block.program, startup_program):
    # Add Var k to main prog and startup prog
    k = fluid.layers.create_global_var(
        name="lookahead_k",
        shape=[1],
        value=int(self.k),
        dtype='int32',
        persistable=True)
    # Add Var alpha to main prog and startup prog
    alpha = fluid.layers.create_global_var(
        name="lookahead_alpha",
        shape=[1],
        value=1.0,
        dtype='float32',
        persistable=True)
    # added alpha2 here and set alpha's initial value to 1.0;
    # alpha2 holds the real alpha configured by the user
    alpha2 = fluid.layers.create_global_var(
        name="lookahead_alpha2",
        shape=[1],
        value=float(self.alpha),
        dtype='float32',
        persistable=True)
    # Add Var step
    step = fluid.layers.create_global_var(
        name="lookahead_step",
        shape=[1],
        value=int(0),
        dtype='int32',
        persistable=True)
    fluid.layers.increment(x=step, value=1.0, in_place=True)

    # lookahead
    zero_var = fluid.layers.fill_constant(shape=[1], dtype='float32', value=0.0)
    one_var = fluid.layers.fill_constant(shape=[1], dtype='float32', value=1.0)
    mod = fluid.layers.elementwise_mod(step, k)
    with fluid.layers.control_flow.Switch() as switch:
        # added this case: at step == k * 8 (step 40 here) change alpha from
        # 1.0 to self.alpha; training also seemed fine with k = 2
        with switch.case(step == k * 8):
            fluid.layers.assign(input=alpha2, output=alpha)
        with switch.case(mod == zero_var):
            for param_name in params:
                fast_var = main_block.var(param_name)
                slow_var = param_to_slow[param_name]
                tmp_var = fluid.layers.elementwise_add(
                    fluid.layers.elementwise_mul(fast_var, alpha),
                    fluid.layers.elementwise_mul(slow_var, one_var - alpha))
                fluid.layers.assign(input=tmp_var, output=slow_var)
                fluid.layers.assign(input=tmp_var, output=fast_var)
        with switch.default():
            pass
# mini_out is produced earlier in minimize() by self.inner_optimizer.minimize()
return mini_out
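To see why holding alpha at 1.0 for the first few syncs works around the bad initialization: with alpha = 1.0 the update tmp = 1.0 * fast + 0.0 * slow simply copies the fast weights into the slow buffer, so by the time alpha drops to its real value at step k * 8, slow already holds a recent snapshot of fast. A small NumPy simulation of the patched schedule (values are illustrative; Paddle's Switch runs only the first matching case, hence the elif):

import numpy as np

k, real_alpha = 5, 0.5
fast = np.array([1.0, -2.0, 3.0])   # stands in for pretrained weights
slow = np.zeros_like(fast)          # same assumed zero init as above
alpha = 1.0                         # workaround: start with alpha = 1.0

for step in range(1, 10 * k + 1):
    if step == k * 8:               # first Switch case: flip alpha, no sync this step
        alpha = real_alpha
    elif step % k == 0:             # second case: the usual lookahead sync
        tmp = alpha * fast + (1 - alpha) * slow
        slow, fast = tmp, tmp.copy()

print(fast)  # [ 1. -2.  3.] -- weights survive; the early syncs were pure copies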