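After fleet.save_persistables on the first trainer, calling fluid.io.load_persistables directly raises an error
Created by: MrChengmo
- Title: After trainer 0 runs fleet.save_persistables, calling fluid.io.load_persistables directly in the same process (without exiting it) raises an error.
- Version and environment: 1) PaddlePaddle version: v1.5.0 2) CPU: development machine 3) GPU: none 4) OS: CentOS, Python 2.7
- Reproduction: trainer 0 first runs: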
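```python
# Trainer 0 only: save all persistable variables, then stop the worker.
# exe, model_path, logger and is_first_trainer are defined earlier in the script.
if is_first_trainer:
    fleet.save_persistables(executor=exe, dirname=model_path, main_program=fluid.default_main_program())
    logger.info("Train Success!")
    fleet.stop_worker()
```
It then runs:
```python
# Same process, same executor and directory: reload the variables that were just saved.
fluid.io.load_persistables(executor=exe, dirname=model_path, main_program=fluid.default_main_program())
```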
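This raises an error and the parameters cannot be loaded. On closer inspection, the load op cannot open the parameter file, as if it did not exist, even though the file was in fact saved. I suspect that a file used for reading/writing was never closed. By contrast, when the inference code is tested on its own with the same data, it works fine.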
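One way to probe the unclosed-file hypothesis: on Linux, a file is created on disk as soon as it is opened for writing, so a writer that is still open leaves a file that exists but may be empty or short (its data still sitting in a buffer), rather than no file at all. The sketch below demonstrates this with plain Python file I/O. `unclosed_write_visibility` is a hypothetical helper, not a PaddlePaddle API, and it models Python's buffered I/O, not the C++ code path that Paddle's save op actually uses.

```python
import os
import shutil
import tempfile


def unclosed_write_visibility(payload=b"x" * 16):
    """Write `payload` to a fresh file without closing it, then inspect the file.

    Hypothetical diagnostic helper (not part of PaddlePaddle). Returns a tuple
    (exists_while_open, size_before_flush, size_after_flush).
    """
    tmpdir = tempfile.mkdtemp()
    # File name borrowed from the error log; any name would do.
    path = os.path.join(tmpdir, "fc_4.w_0")
    try:
        writer = open(path, "wb")  # the file is created on disk right here
        try:
            writer.write(payload)  # small write: stays in the user-space buffer
            exists_while_open = os.path.exists(path)
            size_before_flush = os.path.getsize(path)
            writer.flush()  # hand the buffered bytes to the OS
            size_after_flush = os.path.getsize(path)
        finally:
            writer.close()
    finally:
        shutil.rmtree(tmpdir)  # remove the temporary directory
    return exists_while_open, size_before_flush, size_after_flush


if __name__ == "__main__":
    print(unclosed_write_visibility())
```

If the handle were merely left open, this suggests the symptom would more likely be a truncated or empty fc_4.w_0 than the "Cannot open file" error seen in the log.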
- Problem description:
```
2019-07-29 16:13:04,599 - INFO - Train Success!
Traceback (most recent call last):
  File "model.py", line 151, in <module>
    runtime_main(params, CTR)
  File "/home/chengmo/workroot/ctr_cloud/dist_continuous_evaluation.py", line 286, in runtime_main
    model.run_infer(params)
  File "model.py", line 118, in run_infer
    main_program=fluid.default_main_program()
  File "/home/chengmo/.jumbo/lib/python2.7/site-packages/paddle/fluid/io.py", line 747, in load_persistables
    filename=filename)
  File "/home/chengmo/.jumbo/lib/python2.7/site-packages/paddle/fluid/io.py", line 611, in load_vars
    filename=filename)
  File "/home/chengmo/.jumbo/lib/python2.7/site-packages/paddle/fluid/io.py", line 648, in load_vars
    executor.run(load_prog)
  File "/home/chengmo/.jumbo/lib/python2.7/site-packages/paddle/fluid/executor.py", line 651, in run
    use_program_cache=use_program_cache)
  File "/home/chengmo/.jumbo/lib/python2.7/site-packages/paddle/fluid/executor.py", line 749, in _run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator load error.
Python Callstacks:
  File "/home/chengmo/.jumbo/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1771, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/chengmo/.jumbo/lib/python2.7/site-packages/paddle/fluid/io.py", line 633, in load_vars
    'file_path': os.path.join(load_dirname, new_var.name)
  File "/home/chengmo/.jumbo/lib/python2.7/site-packages/paddle/fluid/io.py", line 611, in load_vars
    filename=filename)
  File "/home/chengmo/.jumbo/lib/python2.7/site-packages/paddle/fluid/io.py", line 747, in load_persistables
    filename=filename)
  File "model.py", line 118, in run_infer
    main_program=fluid.default_main_program()
  File "/home/chengmo/workroot/ctr_cloud/dist_continuous_evaluation.py", line 286, in runtime_main
    model.run_infer(params)
  File "model.py", line 151, in <module>
    runtime_main(params, CTR)
C++ Callstacks:
Cannot open file output/final_pyReader_train/fc_4.w_0 for load op at [/paddle/paddle/fluid/operators/load_op.h:37]
PaddlePaddle Call Stacks:
0 0x7f08c60ba6d0p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 352
1 0x7f08c60baa49p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2 0x7f08c66faed6p paddle::operators::LoadOpKernel<paddle::platform::CPUDeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 774
3 0x7f08c66fb143p _ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform8CPUPlaceELb0ELm0EJNS0_9operators12LoadOpKernelINS7_16CPUDeviceContextEfEENSA_ISB_dEENSA_ISB_iEENSA_ISB_aEENSA_ISB_lEEEEclEPKcSJ_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4_ + 35
4 0x7f08c7403627p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 375
5 0x7f08c7403d91p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 529
6 0x7f08c7401c3bp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 267
7 0x7f08c623c00ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 206
8 0x7f08c623f08fp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool) + 143
9 0x7f08c60acefdp
10 0x7f08c60e9ceep
11 0x7f097f70f3e4p PyEval_EvalFrameEx + 25956
12 0x7f097f710130p PyEval_EvalCodeEx + 2240
13 0x7f097f70e4a1p PyEval_EvalFrameEx + 22049
14 0x7f097f710130p PyEval_EvalCodeEx + 2240
15 0x7f097f70e4a1p PyEval_EvalFrameEx + 22049
16 0x7f097f710130p PyEval_EvalCodeEx + 2240
17 0x7f097f70e4a1p PyEval_EvalFrameEx + 22049
18 0x7f097f710130p PyEval_EvalCodeEx + 2240
19 0x7f097f70e4a1p PyEval_EvalFrameEx + 22049
20 0x7f097f710130p PyEval_EvalCodeEx + 2240
21 0x7f097f70e4a1p PyEval_EvalFrameEx + 22049
22 0x7f097f710130p PyEval_EvalCodeEx + 2240
23 0x7f097f70e4a1p PyEval_EvalFrameEx + 22049
24 0x7f097f710130p PyEval_EvalCodeEx + 2240
25 0x7f097f70e4a1p PyEval_EvalFrameEx + 22049
26 0x7f097f710130p PyEval_EvalCodeEx + 2240
27 0x7f097f710242p PyEval_EvalCode + 50
28 0x7f097f72a62cp
29 0x7f097f72a700p PyRun_FileExFlags + 144
30 0x7f097f72bc0cp PyRun_SimpleFileExFlags + 220
31 0x7f097f73d4ccp Py_Main + 3164
32 0x318ae1ecddp __libc_start_main + 253
33 0x400669p
```
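The path in the C++ error, output/final_pyReader_train/fc_4.w_0, is relative, so it is resolved against the process's current working directory at the moment the load op runs. A quick check right before calling fluid.io.load_persistables is to list, for each expected parameter file, its absolute path and whether a file is actually there. `check_param_files` and `missing_files` below are hypothetical helpers written for this purpose and use only the standard library.

```python
import os


def check_param_files(dirname, var_names):
    """Map each variable name to (absolute_path, size_in_bytes or None).

    Hypothetical helper: size is None when no regular file exists at the path,
    which is the situation the "Cannot open file" error points to.
    A relative `dirname` is resolved against the current working directory.
    """
    report = {}
    for name in var_names:
        path = os.path.abspath(os.path.join(dirname, name))
        size = os.path.getsize(path) if os.path.isfile(path) else None
        report[name] = (path, size)
    return report


def missing_files(report):
    """Return the sorted names whose files were not found."""
    return sorted(name for name, (_, size) in report.items() if size is None)
```

In the failing script this could be called as check_param_files(model_path, names) just before the load, assuming names is built from the program, for example [v.name for v in fluid.default_main_program().list_vars() if v.persistable]. That list may also contain variables (such as feed/fetch holders) that load_persistables itself skips, so extra "missing" entries should be read with care.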