[Paper Reproduction] Error when the optimizer loads its saved state
Open
Created by: hbwslms
Version and environment info: 1) PaddlePaddle version: 1.8; 2) system environment: AI Studio notebook (advanced tier).

Problem description: The model is ResNet-50 and the optimizer is MomentumOptimizer. In the earlier training run I saved the model weights and the optimizer state with

```python
fluid.save_dygraph(resnet.state_dict(), 'resnet_params')
fluid.save_dygraph(optimizer.state_dict(), 'resnet_params')
```

After the training was interrupted, I resume it by loading the saved data, printing some information along the way as a check:

```python
if load_ckpt:
    model_dict, opt_dict = fluid.load_dygraph(load_ckpt)
    for key in opt_dict:
        print(key)
    resnet.set_dict(model_dict)
    optimizer.set_dict(opt_dict)
    for key in optimizer.state_dict():
        print(key)
    print('Checkpoint Loaded')
```

I found that only the keys of opt_dict were printed; the corresponding parameter entries of optimizer.state_dict() were not printed at all. My guess is that at this point the optimizer has only just been initialized and has not yet built up its parameter dictionary. The output log is as follows:
```
conv2d_0.w_0_MomentumOptimizer_0_velocity_0 batch_norm_0.w_0_MomentumOptimizer_0_velocity_0 batch_norm_0.b_0_MomentumOptimizer_0_velocity_0 conv2d_1.w_0_MomentumOptimizer_0_velocity_0 batch_norm_1.w_0_MomentumOptimizer_0_velocity_0 batch_norm_1.b_0_MomentumOptimizer_0_velocity_0 conv2d_2.w_0_MomentumOptimizer_0_velocity_0 batch_norm_2.w_0_MomentumOptimizer_0_velocity_0 batch_norm_2.b_0_MomentumOptimizer_0_velocity_0 conv2d_3.w_0_MomentumOptimizer_0_velocity_0 batch_norm_3.w_0_MomentumOptimizer_0_velocity_0 batch_norm_3.b_0_MomentumOptimizer_0_velocity_0 conv2d_4.w_0_MomentumOptimizer_0_velocity_0 batch_norm_4.w_0_MomentumOptimizer_0_velocity_0 batch_norm_4.b_0_MomentumOptimizer_0_velocity_0 conv2d_5.w_0_MomentumOptimizer_0_velocity_0 batch_norm_5.w_0_MomentumOptimizer_0_velocity_0 batch_norm_5.b_0_MomentumOptimizer_0_velocity_0 conv2d_6.w_0_MomentumOptimizer_0_velocity_0 batch_norm_6.w_0_MomentumOptimizer_0_velocity_0 batch_norm_6.b_0_MomentumOptimizer_0_velocity_0 conv2d_7.w_0_MomentumOptimizer_0_velocity_0 batch_norm_7.w_0_MomentumOptimizer_0_velocity_0 batch_norm_7.b_0_MomentumOptimizer_0_velocity_0 conv2d_8.w_0_MomentumOptimizer_0_velocity_0 batch_norm_8.w_0_MomentumOptimizer_0_velocity_0 batch_norm_8.b_0_MomentumOptimizer_0_velocity_0 conv2d_9.w_0_MomentumOptimizer_0_velocity_0 batch_norm_9.w_0_MomentumOptimizer_0_velocity_0 batch_norm_9.b_0_MomentumOptimizer_0_velocity_0 conv2d_10.w_0_MomentumOptimizer_0_velocity_0 batch_norm_10.w_0_MomentumOptimizer_0_velocity_0 batch_norm_10.b_0_MomentumOptimizer_0_velocity_0 conv2d_11.w_0_MomentumOptimizer_0_velocity_0 batch_norm_11.w_0_MomentumOptimizer_0_velocity_0 batch_norm_11.b_0_MomentumOptimizer_0_velocity_0 conv2d_12.w_0_MomentumOptimizer_0_velocity_0 batch_norm_12.w_0_MomentumOptimizer_0_velocity_0 batch_norm_12.b_0_MomentumOptimizer_0_velocity_0 conv2d_13.w_0_MomentumOptimizer_0_velocity_0 batch_norm_13.w_0_MomentumOptimizer_0_velocity_0 batch_norm_13.b_0_MomentumOptimizer_0_velocity_0 conv2d_14.w_0_MomentumOptimizer_0_velocity_0 batch_norm_14.w_0_MomentumOptimizer_0_velocity_0 batch_norm_14.b_0_MomentumOptimizer_0_velocity_0 conv2d_15.w_0_MomentumOptimizer_0_velocity_0 batch_norm_15.w_0_MomentumOptimizer_0_velocity_0 batch_norm_15.b_0_MomentumOptimizer_0_velocity_0 conv2d_16.w_0_MomentumOptimizer_0_velocity_0 batch_norm_16.w_0_MomentumOptimizer_0_velocity_0 batch_norm_16.b_0_MomentumOptimizer_0_velocity_0 conv2d_17.w_0_MomentumOptimizer_0_velocity_0 batch_norm_17.w_0_MomentumOptimizer_0_velocity_0 batch_norm_17.b_0_MomentumOptimizer_0_velocity_0 conv2d_18.w_0_MomentumOptimizer_0_velocity_0 batch_norm_18.w_0_MomentumOptimizer_0_velocity_0 batch_norm_18.b_0_MomentumOptimizer_0_velocity_0 conv2d_19.w_0_MomentumOptimizer_0_velocity_0 batch_norm_19.w_0_MomentumOptimizer_0_velocity_0 batch_norm_19.b_0_MomentumOptimizer_0_velocity_0 conv2d_20.w_0_MomentumOptimizer_0_velocity_0 batch_norm_20.w_0_MomentumOptimizer_0_velocity_0 batch_norm_20.b_0_MomentumOptimizer_0_velocity_0 conv2d_21.w_0_MomentumOptimizer_0_velocity_0 batch_norm_21.w_0_MomentumOptimizer_0_velocity_0 batch_norm_21.b_0_MomentumOptimizer_0_velocity_0 conv2d_22.w_0_MomentumOptimizer_0_velocity_0 batch_norm_22.w_0_MomentumOptimizer_0_velocity_0 batch_norm_22.b_0_MomentumOptimizer_0_velocity_0 conv2d_23.w_0_MomentumOptimizer_0_velocity_0 batch_norm_23.w_0_MomentumOptimizer_0_velocity_0 batch_norm_23.b_0_MomentumOptimizer_0_velocity_0 conv2d_24.w_0_MomentumOptimizer_0_velocity_0 batch_norm_24.w_0_MomentumOptimizer_0_velocity_0 batch_norm_24.b_0_MomentumOptimizer_0_velocity_0
conv2d_25.w_0_MomentumOptimizer_0_velocity_0 batch_norm_25.w_0_MomentumOptimizer_0_velocity_0 batch_norm_25.b_0_MomentumOptimizer_0_velocity_0 conv2d_26.w_0_MomentumOptimizer_0_velocity_0 batch_norm_26.w_0_MomentumOptimizer_0_velocity_0 batch_norm_26.b_0_MomentumOptimizer_0_velocity_0 conv2d_27.w_0_MomentumOptimizer_0_velocity_0 batch_norm_27.w_0_MomentumOptimizer_0_velocity_0 batch_norm_27.b_0_MomentumOptimizer_0_velocity_0 conv2d_28.w_0_MomentumOptimizer_0_velocity_0 batch_norm_28.w_0_MomentumOptimizer_0_velocity_0 batch_norm_28.b_0_MomentumOptimizer_0_velocity_0 conv2d_29.w_0_MomentumOptimizer_0_velocity_0 batch_norm_29.w_0_MomentumOptimizer_0_velocity_0 batch_norm_29.b_0_MomentumOptimizer_0_velocity_0 conv2d_30.w_0_MomentumOptimizer_0_velocity_0 batch_norm_30.w_0_MomentumOptimizer_0_velocity_0 batch_norm_30.b_0_MomentumOptimizer_0_velocity_0 conv2d_31.w_0_MomentumOptimizer_0_velocity_0 batch_norm_31.w_0_MomentumOptimizer_0_velocity_0 batch_norm_31.b_0_MomentumOptimizer_0_velocity_0 conv2d_32.w_0_MomentumOptimizer_0_velocity_0 batch_norm_32.w_0_MomentumOptimizer_0_velocity_0 batch_norm_32.b_0_MomentumOptimizer_0_velocity_0 conv2d_33.w_0_MomentumOptimizer_0_velocity_0 batch_norm_33.w_0_MomentumOptimizer_0_velocity_0 batch_norm_33.b_0_MomentumOptimizer_0_velocity_0 conv2d_34.w_0_MomentumOptimizer_0_velocity_0 batch_norm_34.w_0_MomentumOptimizer_0_velocity_0 batch_norm_34.b_0_MomentumOptimizer_0_velocity_0 conv2d_35.w_0_MomentumOptimizer_0_velocity_0 batch_norm_35.w_0_MomentumOptimizer_0_velocity_0 batch_norm_35.b_0_MomentumOptimizer_0_velocity_0 conv2d_36.w_0_MomentumOptimizer_0_velocity_0 batch_norm_36.w_0_MomentumOptimizer_0_velocity_0 batch_norm_36.b_0_MomentumOptimizer_0_velocity_0 conv2d_37.w_0_MomentumOptimizer_0_velocity_0 batch_norm_37.w_0_MomentumOptimizer_0_velocity_0 batch_norm_37.b_0_MomentumOptimizer_0_velocity_0 conv2d_38.w_0_MomentumOptimizer_0_velocity_0 batch_norm_38.w_0_MomentumOptimizer_0_velocity_0 batch_norm_38.b_0_MomentumOptimizer_0_velocity_0 conv2d_39.w_0_MomentumOptimizer_0_velocity_0 batch_norm_39.w_0_MomentumOptimizer_0_velocity_0 batch_norm_39.b_0_MomentumOptimizer_0_velocity_0 conv2d_40.w_0_MomentumOptimizer_0_velocity_0 batch_norm_40.w_0_MomentumOptimizer_0_velocity_0 batch_norm_40.b_0_MomentumOptimizer_0_velocity_0 conv2d_41.w_0_MomentumOptimizer_0_velocity_0 batch_norm_41.w_0_MomentumOptimizer_0_velocity_0 batch_norm_41.b_0_MomentumOptimizer_0_velocity_0 conv2d_42.w_0_MomentumOptimizer_0_velocity_0 batch_norm_42.w_0_MomentumOptimizer_0_velocity_0 batch_norm_42.b_0_MomentumOptimizer_0_velocity_0 conv2d_43.w_0_MomentumOptimizer_0_velocity_0 batch_norm_43.w_0_MomentumOptimizer_0_velocity_0 batch_norm_43.b_0_MomentumOptimizer_0_velocity_0 conv2d_44.w_0_MomentumOptimizer_0_velocity_0 batch_norm_44.w_0_MomentumOptimizer_0_velocity_0 batch_norm_44.b_0_MomentumOptimizer_0_velocity_0 conv2d_45.w_0_MomentumOptimizer_0_velocity_0 batch_norm_45.w_0_MomentumOptimizer_0_velocity_0 batch_norm_45.b_0_MomentumOptimizer_0_velocity_0 conv2d_46.w_0_MomentumOptimizer_0_velocity_0 batch_norm_46.w_0_MomentumOptimizer_0_velocity_0 batch_norm_46.b_0_MomentumOptimizer_0_velocity_0 conv2d_47.w_0_MomentumOptimizer_0_velocity_0 batch_norm_47.w_0_MomentumOptimizer_0_velocity_0 batch_norm_47.b_0_MomentumOptimizer_0_velocity_0 conv2d_48.w_0_MomentumOptimizer_0_velocity_0 batch_norm_48.w_0_MomentumOptimizer_0_velocity_0 batch_norm_48.b_0_MomentumOptimizer_0_velocity_0 conv2d_49.w_0_MomentumOptimizer_0_velocity_0 batch_norm_49.w_0_MomentumOptimizer_0_velocity_0 
batch_norm_49.b_0_MomentumOptimizer_0_velocity_0 conv2d_50.w_0_MomentumOptimizer_0_velocity_0 batch_norm_50.w_0_MomentumOptimizer_0_velocity_0 batch_norm_50.b_0_MomentumOptimizer_0_velocity_0 conv2d_51.w_0_MomentumOptimizer_0_velocity_0 batch_norm_51.w_0_MomentumOptimizer_0_velocity_0 batch_norm_51.b_0_MomentumOptimizer_0_velocity_0 conv2d_52.w_0_MomentumOptimizer_0_velocity_0 batch_norm_52.w_0_MomentumOptimizer_0_velocity_0 batch_norm_52.b_0_MomentumOptimizer_0_velocity_0 linear_0.w_0_MomentumOptimizer_0_velocity_0 linear_0.b_0_MomentumOptimizer_0_velocity_0 global_step StructuredToParameterName@@
global_step
Checkpoint Loaded
Training Start
```
```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>
      1 if __name__ == '__main__':
      2
----> 3     train_resnet()

<ipython-input> in train_resnet()
    112
    113         avg_loss.backward()
--> 114         optimizer.minimize(avg_loss)
    115         resnet.clear_gradients()
    116

</opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/decorator.py:decorator-gen-185> in minimize(self, loss, startup_program, parameter_list, no_grad_set)

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py in __impl__(func, *args, **kwargs)
    201     def __impl__(func, *args, **kwargs):
    202         with _switch_tracer_mode_guard_(is_train=False):
--> 203             return func(*args, **kwargs)
    204
    205     return __impl__(func)

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/optimizer.py in minimize(self, loss, startup_program, parameter_list, no_grad_set)
    835
    836         optimize_ops = self.apply_optimize(
--> 837             loss, startup_program=startup_program, params_grads=params_grads)
    838
    839         return optimize_ops, params_grads

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/optimizer.py in apply_optimize(self, loss, startup_program, params_grads)
    745             params_grads = append_regularization_ops(params_grads,
    746                                                      self.regularization)
--> 747             optimize_ops = self._create_optimization_pass(params_grads)
    748         else:
    749             program = loss.block.program

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/optimizer.py in _create_optimization_pass(self, parameters_and_grads)
    541             self._create_accumulators(
    542                 target_block,
--> 543                 [p[0] for p in parameters_and_grads if p[0].trainable])
    544             self._create_global_learning_rate()
    545

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/optimizer.py in _create_accumulators(self, block, parameters)
   1030
   1031         for p in parameters:
-> 1032             self._add_accumulator(self._velocity_acc_str, p)
   1033
   1034     def _append_optimize_op(self, block, param_and_grad):

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/optimizer.py in _add_accumulator(self, name, param, dtype, fill_value, shape, type, device)
    459         if len(self._accumulators_holder) > 0:
    460             assert var_name in self._accumulators_holder, \
--> 461                 "Optimizer set error, {} should in state dict".format( var_name )
    462             var.set_value(self._accumulators_holder[var_name])
    463

AssertionError: Optimizer set error, conv2d_0.w_0_velocity_0 should in state dict
```
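Comparing the assertion message with the keys printed above: the saved optimizer dict embeds the optimizer's name in every key (e.g. conv2d_0.w_0_MomentumOptimizer_0_velocity_0), while the assertion looks for the same key without it (conv2d_0.w_0_velocity_0). A minimal diagnostic sketch that makes the mismatch visible, assuming the opt_dict variable from the loading code above:

```python
# Minimal diagnostic sketch, assuming `opt_dict` from the loading code above.
expected_key = 'conv2d_0.w_0_velocity_0'  # the key named in the AssertionError
print(expected_key in opt_dict)           # -> False
print([k for k in opt_dict if k.startswith('conv2d_0.w_0')])
# -> ['conv2d_0.w_0_MomentumOptimizer_0_velocity_0'], per the log above
```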
Created by: hbwslms
I don't think that's it: 'resnet_params' is only the filename prefix, and the API documentation itself says "based on the contents of state_dict, this interface automatically appends the .pdparams or .pdopt suffix to model_path, producing a model_path + '.pdparams' or model_path + '.pdopt' file." And indeed two files were generated here:
And when I resumed training, the previously saved optimizer data was in fact read back, and its dictionary was printed out. So I believe the problem lies in how the optimizer loads the data. Thank you!
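A minimal sketch to confirm this, assuming the checkpoint was saved to the notebook's working directory:

```python
import os

# The same 'resnet_params' prefix should yield two separate files:
print(os.path.exists('resnet_params.pdparams'))  # model weights
print(os.path.exists('resnet_params.pdopt'))     # optimizer state
```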
In reply to Oliver Tang's comment (August 11, 2020, on issue #26113):

> When saving, you used the same model_path='resnet_params' for both the model and the optimizer, so the optimizer's saved file overwrote the model's saved file.
Created by: hbwslms
> Do you have code that can reproduce this?

https://aistudio.baidu.com/aistudio/projectdetail/708019?shared=1 Thanks for your help.
Created by: sandyhouse
Is the code you ran any different from https://aistudio.baidu.com/aistudio/projectdetail/708019?shared=1 ? I cannot reproduce the error locally. One difference is that I am running the develop branch locally; if the code is otherwise identical, you could try the develop branch as well. Also, please confirm that the Python version used when saving the model is the same as the one used when loading it; a mismatch there can also cause problems.
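A quick sketch for comparing the two environments (generic, not from the original thread): run this once before saving and once before loading, and compare the output.

```python
import sys

import paddle

# Record the interpreter and framework versions so the save-side and
# load-side environments can be compared directly.
print(sys.version)
print(paddle.__version__)
```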