Printing the real-time learning rate during training causes training to fail
Created by: minhozhou
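- Version and environment: 1) PaddlePaddle version: 1.6.1, static graph mode 2) CPU: in-house CPU 3) System environment: python3.7
- Training info: 1) Single machine
- Reproduction info: if reporting an error, please provide the reproduction environment and steps
- Problem description: please describe your problem in detail and include the error message, logs, and a reproducible code snippet
Optimizer code: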
import paddle.fluid as fluid

def optimization(cloud_train, base_lr, loss, train_steps, optimizer='adam'):
    # Decay the learning rate polynomially from base_lr to 0.0001 over train_steps.
    decayed_lr = fluid.layers.polynomial_decay(base_lr, train_steps, 0.0001)
    if optimizer == 'sgd':
        optimizer = fluid.optimizer.SGD(
            decayed_lr,
            regularization=fluid.regularizer.L2DecayRegularizer(
                regularization_coeff=0.0025))
    elif optimizer == 'adam':
        # don't use gpu's lazy mode
        optimizer = fluid.optimizer.Adam(decayed_lr)
    else:
        raise ValueError
    # log is the reporter's logger; it is defined outside this snippet.
    log.info('learning rate:%f' % (base_lr))
    optimizer.minimize(loss)
    # Return the learning-rate variable so the training loop can fetch it.
    return decayed_lr
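For reference, the value that `decayed_lr` should hold at a given step can be checked by hand. The sketch below is a plain-Python version of the documented `polynomial_decay` formula with `cycle=False` (the step is clamped to `decay_steps`). The helper name `polynomial_decay_value` and the sample numbers are hypothetical and only serve as an illustration; this is not the library implementation.

```python
def polynomial_decay_value(base_lr, decay_steps, end_lr=0.0001, power=1.0, step=0):
    """Return the polynomially decayed learning rate at a given step.

    Hypothetical helper mirroring the documented formula (cycle=False):
        step = min(step, decay_steps)
        lr = (base_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr
    """
    # Once decay_steps is reached, the rate stays at end_lr.
    step = min(step, decay_steps)
    return (base_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


if __name__ == "__main__":
    # Assumed sample values: base_lr=0.01 decayed over 1000 steps.
    for s in (0, 500, 1000, 2000):
        print(s, polynomial_decay_value(0.01, 1000, step=s))
```

Comparing these numbers with the fetched learning rate is a quick way to tell whether the schedule advances once per training step as expected.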
Training code:
import os
import time

import numpy as np

def train_loop(train_exe, exe, decayed_lr, program, loss, data_reader, args, src_embed, input_data_list):
    """ train
    """
    model_save_dir = args.output_path
    if not os.path.exists(model_save_dir):
        os.makedirs(model_save_dir)
    trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
    step = 0
    epoch = 0
    for epoch in range(args.epoch):
        for data in data_reader():
            print(decayed_lr)
            # This extra run() call, issued without feed data, is the line that breaks training.
            print(train_exe.run(fetch_list=[decayed_lr.name]))
            begin_time = time.time()
            loss_val = train_exe.run(fetch_list=[loss], feed=data)
            step += 1
            if step % 10 == 0:
                # Note: lr is not defined anywhere in this snippet.
                log.info("epoch %s: step %s: loss %.5f lr: %.5f speed: %.5f s/step reader qsize: %s" %
                         (epoch, step, np.mean(loss_val), lr, time.time() - begin_time, data_reader.queue.size()))
If I remove the line print(train_exe.run(fetch_list=[decayed_lr.name])) from the training loop, the model trains normally.
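One possible explanation, not confirmed by the report since no error message is included, is that the extra `run()` call executes the whole training program a second time without any `feed` data. A common workaround in static graph mode is to fetch the learning rate in the same `run()` call that fetches the loss. The sketch below assumes the executor accepts variable names in `fetch_list` and returns one value per fetched name; `run_step_with_lr` is a hypothetical helper, not part of PaddlePaddle.

```python
def run_step_with_lr(train_exe, loss, decayed_lr, data):
    """Run one training step and fetch loss and learning rate together.

    Hypothetical helper. Assumes train_exe.run(fetch_list=..., feed=...)
    returns a list with one entry per name in fetch_list, in order.
    """
    # A single run() call: the batch is fed once and the program runs once,
    # so no extra unfed execution is triggered just to read the learning rate.
    loss_val, lr_val = train_exe.run(
        fetch_list=[loss.name, decayed_lr.name], feed=data)
    return loss_val, lr_val
```

Inside the training loop this would replace both `run()` calls, for example `loss_val, lr_val = run_step_with_lr(train_exe, loss, decayed_lr, data)`, after which `np.mean(lr_val)` can be passed to the log line in place of the undefined `lr`.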