Running tools/eval.py on multiple GPUs fails when the reader uses multithreading
Created by: flishwang
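When use_process is set to false in EvalReader, batch_size is set to 1, and more than one GPU is used, the program reports an error on exit. If use_process is set to true, or batch_size is set to 2, or the data_parallel part is given only one CUDAPlace, the program runs normally.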
The error output is as follows:
2020-06-03 11:11:54,109-INFO: start loading proposals
2020-06-03 11:11:54,229-INFO: loading roidb 2012_test
2020-06-03 11:11:58,729-INFO: load roidb from scope 2012_test
2020-06-03 11:11:58,729-INFO: finish loading roidbs, total num = 293
2020-06-03 11:11:58,730-INFO: set max batches to 0
2020-06-03 11:11:58,731-INFO: places would be ommited when DataLoader is not iterable
W0603 11:11:58.798434 13498 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 10.0
W0603 11:11:59.370779 13498 device_context.cc:260] device: 0, cuDNN Version: 7.6.
2020-06-03 11:12:35,821-INFO: Test iter 0
2020-06-03 11:12:51,766-INFO: Test iter 100
2020-06-03 11:13:06,355-INFO: Test iter 200
2020-06-03 11:13:19,731-INFO: Test finish iter 293
2020-06-03 11:13:19,731-INFO: Total iteration: 293, inference time: 5.194278110324853 batch/s.
2020-06-03 11:13:19,791-INFO: Start evaluate...
2020-06-03 11:13:19,872-INFO: Accumulating evaluatation results...
2020-06-03 11:13:19,875-INFO: mAP@0.20(integral) = 93.30
2020-06-03 11:13:19,875-INFO: Start evaluate...
2020-06-03 11:13:19,956-INFO: Accumulating evaluatation results...
2020-06-03 11:13:19,963-INFO: mAP@0.50(integral) = 90.65
2020-06-03 11:13:19,963-INFO: Start evaluate...
2020-06-03 11:13:20,064-INFO: Accumulating evaluatation results...
2020-06-03 11:13:20,076-INFO: mAP@0.70(integral) = 71.76
terminate called without an active exception
W0603 11:13:20.411875 13642 init.cc:216] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0603 11:13:20.411936 13642 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0603 11:13:20.411949 13642 init.cc:221] The detail failure signal is:
W0603 11:13:20.411965 13642 init.cc:224] *** Aborted at 1591154000 (unix time) try "date -d @1591154000" if you are using GNU date ***
W0603 11:13:20.416925 13642 init.cc:224] PC: @ 0x0 (unknown)
W0603 11:13:20.417045 13642 init.cc:224] *** SIGABRT (@0x2716000034ba) received by PID 13498 (TID 0x7f7e69feb700) from PID 13498; stack trace: ***
W0603 11:13:20.421667 13642 init.cc:224] @ 0x7f8124ef5390 (unknown)
W0603 11:13:20.426170 13642 init.cc:224] @ 0x7f8124b4f428 gsignal
W0603 11:13:20.429155 13642 init.cc:224] @ 0x7f8124b5102a abort
W0603 11:13:20.455327 13642 init.cc:224] @ 0x7f807560184a __gnu_cxx::__verbose_terminate_handler()
W0603 11:13:20.457710 13642 init.cc:224] @ 0x7f80755fff47 __cxxabiv1::__terminate()
W0603 11:13:20.460386 13642 init.cc:224] @ 0x7f80755fff7d std::terminate()
W0603 11:13:20.462821 13642 init.cc:224] @ 0x7f80755ffc5a __gxx_personality_v0
W0603 11:13:20.466262 13642 init.cc:224] @ 0x7f81220c1b97 _Unwind_ForcedUnwind_Phase2
W0603 11:13:20.468724 13642 init.cc:224] @ 0x7f81220c1e7d _Unwind_ForcedUnwind
W0603 11:13:20.471596 13642 init.cc:224] @ 0x7f8124ef4070 __GI___pthread_unwind
W0603 11:13:20.474467 13642 init.cc:224] @ 0x7f8124eec845 __pthread_exit
W0603 11:13:20.533114 13642 init.cc:224] @ 0x55ad710df059 PyThread_exit_thread
W0603 11:13:20.533632 13642 init.cc:224] @ 0x55ad70f64c10 PyEval_RestoreThread.cold.799
W0603 11:13:20.537024 13642 init.cc:224] @ 0x7f8111691cde (unknown)
W0603 11:13:20.538024 13642 init.cc:224] @ 0x55ad71065ab4 _PyMethodDef_RawFastCallKeywords
W0603 11:13:20.538980 13642 init.cc:224] @ 0x55ad71065bd1 _PyCFunction_FastCallKeywords
W0603 11:13:20.539932 13642 init.cc:224] @ 0x55ad710cc57b _PyEval_EvalFrameDefault
W0603 11:13:20.540813 13642 init.cc:224] @ 0x55ad71011389 _PyEval_EvalCodeWithName
W0603 11:13:20.541718 13642 init.cc:224] @ 0x55ad710124c5 _PyFunction_FastCallDict
W0603 11:13:20.542596 13642 init.cc:224] @ 0x55ad71031a73 _PyObject_Call_Prepend
W0603 11:13:20.543094 13642 init.cc:224] @ 0x55ad7107927a slot_tp_call
W0603 11:13:20.543982 13642 init.cc:224] @ 0x55ad7107a2db _PyObject_FastCallKeywords
W0603 11:13:20.544926 13642 init.cc:224] @ 0x55ad710cc146 _PyEval_EvalFrameDefault
W0603 11:13:20.545817 13642 init.cc:224] @ 0x55ad710123fb _PyFunction_FastCallDict
W0603 11:13:20.546566 13642 init.cc:224] @ 0x55ad71031a73 _PyObject_Call_Prepend
W0603 11:13:20.546892 13642 init.cc:224] @ 0x55ad7107927a slot_tp_call
W0603 11:13:20.547472 13642 init.cc:224] @ 0x55ad7107a2db _PyObject_FastCallKeywords
W0603 11:13:20.548089 13642 init.cc:224] @ 0x55ad710cca39 _PyEval_EvalFrameDefault
W0603 11:13:20.548661 13642 init.cc:224] @ 0x55ad71011389 _PyEval_EvalCodeWithName
W0603 11:13:20.549237 13642 init.cc:224] @ 0x55ad710124c5 _PyFunction_FastCallDict
W0603 11:13:20.549818 13642 init.cc:224] @ 0x55ad71031a73 _PyObject_Call_Prepend
W0603 11:13:20.550443 13642 init.cc:224] @ 0x55ad71023fde PyObject_Call
Aborted (core dumped)
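For anyone trying to reproduce this, the combinations described above can be written down as a small predicate. This is only a summary of what I observed, not an explanation of the cause: the function name `crashes_on_exit` and its parameters are hypothetical, and it assumes that batch sizes larger than 2 behave like 2, which I have not tested.

```python
def crashes_on_exit(use_process: bool, batch_size: int, num_gpus: int) -> bool:
    """Return True for the one combination observed to abort at exit.

    Hypothetical helper that only encodes the observations in this report:
    thread-based reader (use_process=False), batch_size of 1, and more
    than one GPU. Batch sizes other than 1 and 2 were not tested; they are
    assumed here to behave like 2.
    """
    return (not use_process) and batch_size == 1 and num_gpus > 1


# The failing case, followed by the three variations that ran normally.
print(crashes_on_exit(use_process=False, batch_size=1, num_gpus=2))
print(crashes_on_exit(use_process=True, batch_size=1, num_gpus=2))
print(crashes_on_exit(use_process=False, batch_size=2, num_gpus=2))
print(crashes_on_exit(use_process=False, batch_size=1, num_gpus=1))
```

Changing any one of the three settings away from the failing combination was enough to make the run exit cleanly.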