parallel_for failed: invalid resource handle
Created by: wulipc
- Version and environment: 1) PaddlePaddle version: 1.6.0 2) CPU: 3) GPU: K40 4) OS: CentOS, Python 2.7
- Training info: 1) Single machine, multiple GPUs 2) GPU memory: 12 GB
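Problem description:
Because of a special requirement, I need to manually assign the data sent to each GPU. I implemented this by following this document: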
https://www.paddlepaddle.org.cn/documentation/docs/zh/user_guides/howto/prepare_data/feeding_data.html
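For readers unfamiliar with this setup: as I understand the linked guide, manual assignment means passing the executor a list of feed dicts, one per device, instead of a single dict. The sketch below only illustrates how one batch might be split into such a list. It is not the code from this report; the helper name `split_feed_for_devices` and the variable names `image` and `label` are hypothetical, and plain Python lists stand in for the numpy arrays that would be used in practice.

```python
def split_feed_for_devices(batch, num_devices):
    """Split one batch (dict of name -> sliceable sequence) into a list of
    per-device feed dicts. Hypothetical helper, for illustration only."""
    sizes = {len(value) for value in batch.values()}
    if len(sizes) != 1:
        raise ValueError("all feed variables must share the same batch size")
    batch_size = sizes.pop()
    if batch_size < num_devices:
        raise ValueError("batch is too small to give every device a sample")

    # Spread the remainder over the first devices so shard sizes differ by at most 1.
    base, extra = divmod(batch_size, num_devices)
    feeds = []
    start = 0
    for device_index in range(num_devices):
        end = start + base + (1 if device_index < extra else 0)
        feeds.append({name: value[start:end] for name, value in batch.items()})
        start = end
    return feeds


# Example: 5 samples split across 2 devices (assumed names "image" and "label").
batch = {
    "image": [[0.1], [0.2], [0.3], [0.4], [0.5]],
    "label": [0, 1, 0, 1, 1],
}
feeds = split_feed_for_devices(batch, num_devices=2)
```

Each element of `feeds` would then be the feed dict for one device. The slicing works the same way on numpy arrays, since it relies only on `len` and slice indexing.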
Error: parallel_for failed: invalid resource handle
Detailed error output:
I0220 13:54:39.286427 13237 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 2. And the Program will be copied 2 copies
W0220 13:54:41.836264 13237 fuse_all_reduce_op_pass.cc:72] Find all_reduce operators: 291. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 258.
I0220 13:54:41.897410 13237 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0220 13:54:42.755913 13237 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0220 13:54:43.089164 13237 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
F0220 13:54:43.193617 15024 exception_holder.h:37] std::exception caught, parallel_for failed: invalid resource handle
*** Check failure stack trace: ***
@ 0x7f3c4330535d google::LogMessage::Fail()
@ 0x7f3c43308e0c google::LogMessage::SendToLog()
@ 0x7f3c43304e83 google::LogMessage::Flush()
@ 0x7f3c4330a31e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f3c4529086c paddle::framework::details::ExceptionHolder::Catch()
@ 0x7f3c45322aab paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
@ 0x7f3c4532168f paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
@ 0x7f3c45321954 _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
@ 0x7f3c44a04103 std::_Function_handler<>::_M_invoke()
@ 0x7f3c43279437 std::__future_base::_State_base::_M_do_set()
@ 0x7f3c910c4be0 __GI___pthread_once
@ 0x7f3c4531d1b2 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
@ 0x7f3c4327ac94 _ZZN10ThreadPoolC1EmENKUlvE_clEv
@ 0x7f3c69028421 execute_native_thread_routine_compat
@ 0x7f3c910bfdf3 start_thread
@ 0x7f3c906e42cd __clone
@ (nil) (unknown)
Additional notes: