Errors occurred when running training scripts in NeurIPS2019-Learn-to-Move-Challenge (#187) · Issue · PaddlePaddle / PARL

Created by: luoruiming

When I run sh scripts/train_difficulty1.sh ./low_speed_model in /PARL/examples/NeurIPS2019-Learn-to-Move-Challenge, training aborts while restoring the model, with the error "tensor version 3393762800 is not supported" (full log shown below). Can anyone help me? Thanks in advance!

(opensim-rl) luo@idserver:~/PARL/examples/NeurIPS2019-Learn-to-Move-Challenge$ sh scripts/train_difficulty1.sh ./low_speed_model
/home/luo/anaconda3/envs/opensim-rl/bin/python
[12-16 23:08:12 MainThread @logger.py:224] Argv: train.py --actor_num 300 --difficulty 1 --penalty_coeff 3.0 --logdir ./output/difficulty1 --restore_model_path ./low_speed_model
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/opensim/simbody.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[12-16 23:08:12 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
[12-16 23:08:13 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
W1216 23:08:14.078102 6084 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 8.0
W1216 23:08:14.081565 6084 device_context.cc:267] device: 0, cuDNN Version: 7.5.
[12-16 23:08:16 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/compiler.py:239: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  """)
WARNING:root:
 You can try our memory optimize feature to save your memory usage:
     # create a build_strategy variable to set memory optimize option
     build_strategy = compiler.BuildStrategy()
     build_strategy.enable_inplace = True
     build_strategy.memory_optimize = True

     # pass the build_strategy to with_data_parallel API
     compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
         loss_name=loss.name, build_strategy=build_strategy)
  
 !!! Memory optimize is our experimental feature !!!
     some variables may be removed/reused internal to save memory usage, 
     in order to fetch the right value of the fetch_list, please set the 
     persistable property to true for each variable in fetch_list

     # Sample
     conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None) 
     # if you need to fetch conv1, then:
     conv1.persistable = True

I1216 23:08:16.079864 6084 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 4. And the Program will be copied 4 copies
I1216 23:08:17.081748 6084 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[12-16 23:08:17 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/compiler.py:239: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  """)
WARNING:root:
 You can try our memory optimize feature to save your memory usage:
     # create a build_strategy variable to set memory optimize option
     build_strategy = compiler.BuildStrategy()
     build_strategy.enable_inplace = True
     build_strategy.memory_optimize = True

     # pass the build_strategy to with_data_parallel API
     compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
         loss_name=loss.name, build_strategy=build_strategy)
  
 !!! Memory optimize is our experimental feature !!!
     some variables may be removed/reused internal to save memory usage, 
     in order to fetch the right value of the fetch_list, please set the 
     persistable property to true for each variable in fetch_list

     # Sample
     conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None) 
     # if you need to fetch conv1, then:
     conv1.persistable = True

I1216 23:08:17.209542 6084 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 4. And the Program will be copied 4 copies
I1216 23:08:17.324332 6084 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[12-16 23:08:17 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/compiler.py:239: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  """)
WARNING:root:
 You can try our memory optimize feature to save your memory usage:
     # create a build_strategy variable to set memory optimize option
     build_strategy = compiler.BuildStrategy()
     build_strategy.enable_inplace = True
     build_strategy.memory_optimize = True

     # pass the build_strategy to with_data_parallel API
     compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
         loss_name=loss.name, build_strategy=build_strategy)
  
 !!! Memory optimize is our experimental feature !!!
     some variables may be removed/reused internal to save memory usage, 
     in order to fetch the right value of the fetch_list, please set the 
     persistable property to true for each variable in fetch_list

     # Sample
     conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None) 
     # if you need to fetch conv1, then:
     conv1.persistable = True

share_vars_from is set, scope is ignored.
I1216 23:08:17.525264 6084 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 4. And the Program will be copied 4 copies
I1216 23:08:17.640771 6084 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[12-16 23:08:17 MainThread @train.py:303] restore model from ./low_speed_model
Traceback (most recent call last):
  File "train.py", line 327, in <module>
    learner = Learner(args)
  File "train.py", line 85, in __init__
    self.restore(args.restore_model_path)
  File "train.py", line 304, in restore
    self.agent.restore(model_path)
  File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/parl/core/fluid/agent.py", line 221, in restore
    filename=filename)
  File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 699, in load_params
    filename=filename)
  File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 611, in load_vars
    filename=filename)
  File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 648, in load_vars
    executor.run(load_prog)
  File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/executor.py", line 651, in run
    use_program_cache=use_program_cache)
  File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/executor.py", line 749, in run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator load_combine error.
Python Callstacks:
  File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1771, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 647, in load_vars
    attrs={'file_path': os.path.join(load_dirname, filename)})
  File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 611, in load_vars
    filename=filename)
  File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 699, in load_params
    filename=filename)
  File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/parl/core/fluid/agent.py", line 221, in restore
    filename=filename)
  File "train.py", line 304, in restore
    self.agent.restore(model_path)
  File "train.py", line 85, in __init__
    self.restore(args.restore_model_path)
  File "train.py", line 327, in <module>
    learner = Learner(args)
C++ Callstacks:
tensor version 3393762800 is not supported.
at [/paddle/paddle/fluid/framework/lod_tensor.cc:256] PaddlePaddle Call Stacks: 0 0x7efdba6c1f10p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 352 1 0x7efdba6c2289p paddle::platform::EnforceNotMet::EnforceNotMet(std::exception_ptr::exception_ptr, char const*, int) + 137 2 0x7efdbc38c7d4p paddle::framework::DeserializeFromStream(std::istream&, paddle::framework::LoDTensor*, paddle::platform::DeviceContext const&) + 724 3 0x7efdbb35e480p paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, float>::LoadParamsFromBuffer(paddle::framework::ExecutionContext const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, std::istream*, bool, std::vector<std::string, std::allocatorstd::string > const&) const + 352 4 0x7efdbb35edfep paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 798 5 0x7efdbb35f273p std::Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, int>, 
paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, signed char>, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, long> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::M_invoke(std::Any_data const&, paddle::framework::ExecutionContext const&) + 35 6 0x7efdbc7411e7p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 375 7 0x7efdbc7415c1p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 529 8 0x7efdbc73ebbcp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, 
boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332 9 0x7efdba84cd0ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382 10 0x7efdba84fdafp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocatorstd::string > const&, bool) + 143 11 0x7efdba6b359dp 12 0x7efdba6f4826p 13 0x7efe81ea2df2p _PyCFunction_FastCallDict + 258 14 0x7efe81f282bbp 15 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981 16 0x7efe81f26a60p 17 0x7efe81f2848ap 18 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805 19 0x7efe81f26a60p 20 0x7efe81f2848ap 21 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981 22 0x7efe81f26a60p 23 0x7efe81f2848ap 24 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805 25 0x7efe81f26a60p 26 0x7efe81f2848ap 27 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805 28 0x7efe81f26a60p 29 0x7efe81f2848ap 30 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805 31 0x7efe81f26a60p 32 0x7efe81f2848ap 33 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981 34 0x7efe81f25e74p 35 0x7efe81f285e8p 36 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981 37 0x7efe81f25e74p 38 0x7efe81f26e75p _PyFunction_FastCallDict + 645 39 0x7efe81e4bba6p _PyObject_FastCallDict + 358 40 0x7efe81e4bdfcp _PyObject_Call_Prepend + 204 41 0x7efe81e4be96p PyObject_Call + 
86 42 0x7efe81ec4233p 43 0x7efe81eb9d4cp 44 0x7efe81e4badep _PyObject_FastCallDict + 158 45 0x7efe81f282bbp 46 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981 47 0x7efe81f26a60p 48 0x7efe81f26ee3p PyEval_EvalCodeEx + 99 49 0x7efe81f26f2bp PyEval_EvalCode + 59 50 0x7efe81f596c0p PyRun_FileExFlags + 304 51 0x7efe81f5ac83p PyRun_SimpleFileExFlags + 371 52 0x7efe81f760b5p Py_Main + 3621 53 0x400c1dp main + 365 54 0x7efe80f01830p __libc_start_main + 240 55 0x4009e9p
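
The failure above is raised in DeserializeFromStream while load_combine reads ./low_speed_model, and the rejected value (3393762800) is a version number read from the file. One way to narrow this down might be to look at the file's first bytes before handing it to Paddle. The following is only a minimal diagnostic sketch, not part of PARL or Paddle. It assumes that a serialized tensor begins with a 4-byte little-endian version field and that 0 is the value this Paddle build accepts; both points are assumptions about the on-disk format, not something the log confirms. The names leading_uint32, describe_checkpoint and EXPECTED_VERSION are hypothetical helpers introduced for illustration.

```python
import struct
from pathlib import Path

# Assumption: the Paddle build in the log only accepts tensor version 0.
EXPECTED_VERSION = 0


def leading_uint32(path):
    """Return the first 4 bytes of `path` as a little-endian uint32.

    Returns None when the file holds fewer than 4 bytes.
    """
    with open(str(path), "rb") as f:
        head = f.read(4)
    if len(head) < 4:
        return None
    return struct.unpack("<I", head)[0]


def describe_checkpoint(path):
    """Give a one-line description of what sits at `path` (hypothetical helper)."""
    p = Path(path)
    if not p.exists():
        return "missing: {}".format(p)
    if p.is_dir():
        return "is a directory, not a file: {}".format(p)
    version = leading_uint32(p)
    if version is None:
        return "too short to contain a version field: {}".format(p)
    if version != EXPECTED_VERSION:
        return "unexpected leading version {}: {}".format(version, p)
    return "leading version field looks plausible: {}".format(p)

# Example call, using the path from the log above:
# print(describe_checkpoint("./low_speed_model"))
```

If the reported leading value is not 0, the file is probably not a parameter file in the format this Paddle version expects (for example an incomplete download or an archive that was never unpacked). A leading 0 does not rule out a problem further into the file, since the load_combine frames in the stack suggest several tensors are read from the same stream in sequence, so a mismatch between the saved parameters and the model being built could still trigger the same error.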