Core dump partway through training
Created by: nihuizhidao
CentOS 7.6, CUDA 10.2, paddlepaddle-gpu 1.8.3.post107, PaddleDetection release 0.3
Partway through training, the following error occurred:
2020-08-28 22:29:35,474-INFO: Save model to output/jinghuajian_release06_small/50500.
2020-08-28 22:29:40,062-INFO: Test iter 0
2020-08-28 22:29:41,411-INFO: Test finish iter 13
2020-08-28 22:29:41,411-INFO: Total number of images: 74, inference time: 48.890845358284935 fps.
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2020-08-28 22:29:41,528-INFO: Start evaluate...
Loading and preparing results...
Traceback (most recent call last):
  File "tools/train.py", line 368, in <module>
    main()
  File "tools/train.py", line 290, in main
    cfg['EvalReader']['dataset'])
  File "/home/aas/Users/Schwarz/ShenNanDianLu/PaddleDetection-release-0.3/ppdet/utils/eval_utils.py", line 218, in eval_results
    save_only=save_only)
  File "/home/aas/Users/Schwarz/ShenNanDianLu/PaddleDetection-release-0.3/ppdet/utils/coco_eval.py", line 102, in bbox_eval
    map_stats = cocoapi_eval(outfile, 'bbox', coco_gt=coco_gt)
  File "/home/aas/Users/Schwarz/ShenNanDianLu/PaddleDetection-release-0.3/ppdet/utils/coco_eval.py", line 188, in cocoapi_eval
    coco_dt = coco_gt.loadRes(jsonfile)
  File "/home/aas/anaconda3/lib/python3.7/site-packages/pycocotools-2.0-py3.7-linux-x86_64.egg/pycocotools/coco.py", line 317, in loadRes
    anns = json.load(open(resFile))
  File "/home/aas/anaconda3/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/home/aas/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/aas/anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/aas/anaconda3/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 245912 (char 245911)
terminate called without an active exception
W0828 22:29:42.231966 7718 init.cc:216] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0828 22:29:42.232002 7718 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0828 22:29:42.232010 7718 init.cc:221] The detail failure signal is:
W0828 22:29:42.232020 7718 init.cc:224] *** Aborted at 1598624982 (unix time) try "date -d @1598624982" if you are using GNU date ***
W0828 22:29:42.236173 7718 init.cc:224] PC: @ 0x0 (unknown)
W0828 22:29:42.236294 7718 init.cc:224] *** SIGABRT (@0x3e800001d21) received by PID 7457 (TID 0x7f53a5fff700) from PID 7457; stack trace: ***
W0828 22:29:42.239423 7718 init.cc:224] @ 0x7f546f7f1630 (unknown)
W0828 22:29:42.242748 7718 init.cc:224] @ 0x7f546f44a387 __GI_raise
W0828 22:29:42.245952 7718 init.cc:224] @ 0x7f546f44ba78 __GI_abort
W0828 22:29:42.248221 7718 init.cc:224] @ 0x7f543a23784a __gnu_cxx::__verbose_terminate_handler()
W0828 22:29:42.250105 7718 init.cc:224] @ 0x7f543a235f47 __cxxabiv1::__terminate()
W0828 22:29:42.252313 7718 init.cc:224] @ 0x7f543a235f7d std::terminate()
W0828 22:29:42.254318 7718 init.cc:224] @ 0x7f543a235c5a __gxx_personality_v0
W0828 22:29:42.257339 7718 init.cc:224] @ 0x7f5468539b97 _Unwind_ForcedUnwind_Phase2
W0828 22:29:42.260264 7718 init.cc:224] @ 0x7f5468539e7d _Unwind_ForcedUnwind
W0828 22:29:42.263258 7718 init.cc:224] @ 0x7f546f7f0362 __GI___pthread_unwind
W0828 22:29:42.266294 7718 init.cc:224] @ 0x7f546f7eaef7 __pthread_exit
W0828 22:29:42.266974 7718 init.cc:224] @ 0x561ab9a451c9 PyThread_exit_thread
W0828 22:29:42.267164 7718 init.cc:224] @ 0x561ab98d7cb1 PyEval_RestoreThread.cold.787
W0828 22:29:42.270735 7718 init.cc:224] @ 0x7f53f21b9b19 pybind11::gil_scoped_release::~gil_scoped_release()
W0828 22:29:42.271633 7718 init.cc:224] @ 0x7f53f22a2055 ZZN8pybind1112cpp_function10initializeIZN6paddle6pybind10BindReaderEPNS_6moduleEEUlRNS2_9operators6reader40OrderedMultiDeviceLoDTensorBlockingQueueERKSt6vectorINS2_9framework9LoDTensorESaISC_EEE2_bIS9_SG_EINS_4nameENS_9is_methodENS_7siblingENS_10call_guardIINS_18gil_scoped_releaseEEEEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNES11
W0828 22:29:42.275063 7718 init.cc:224] @ 0x7f53f21d7329 pybind11::cpp_function::dispatcher()
W0828 22:29:42.275817 7718 init.cc:224] @ 0x561ab99cf114 _PyMethodDef_RawFastCallKeywords
W0828 22:29:42.276522 7718 init.cc:224] @ 0x561ab99cf231 _PyCFunction_FastCallKeywords
W0828 22:29:42.277287 7718 init.cc:224] @ 0x561ab9a33e8f _PyEval_EvalFrameDefault
W0828 22:29:42.278061 7718 init.cc:224] @ 0x561ab99889da _PyEval_EvalCodeWithName
W0828 22:29:42.278834 7718 init.cc:224] @ 0x561ab9989805 _PyFunction_FastCallDict
W0828 22:29:42.279557 7718 init.cc:224] @ 0x561ab9a309f5 _PyEval_EvalFrameDefault
W0828 22:29:42.280256 7718 init.cc:224] @ 0x561ab99ce68b _PyFunction_FastCallKeywords
W0828 22:29:42.280979 7718 init.cc:224] @ 0x561ab9a2f260 _PyEval_EvalFrameDefault
W0828 22:29:42.281596 7718 init.cc:224] @ 0x561ab99ce68b _PyFunction_FastCallKeywords
W0828 22:29:42.282305 7718 init.cc:224] @ 0x561ab9a2f260 _PyEval_EvalFrameDefault
W0828 22:29:42.282969 7718 init.cc:224] @ 0x561ab998973b _PyFunction_FastCallDict
W0828 22:29:42.283612 7718 init.cc:224] @ 0x561ab99a4943 _PyObject_Call_Prepend
W0828 22:29:42.284323 7718 init.cc:224] @ 0x561ab9997b9e PyObject_Call
W0828 22:29:42.284615 7718 init.cc:224] @ 0x561ab9a83af7 t_bootstrap
W0828 22:29:42.284747 7718 init.cc:224] @ 0x561ab9a40e18 pythread_wrapper
W0828 22:29:42.288245 7718 init.cc:224] @ 0x7f546f7e9ea5 start_thread
Aborted (core dumped)
What might be causing this? Thanks!
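In case it helps narrow things down: the traceback shows `json.load` failing inside `loadRes` at a specific character offset, so one thing that could be checked is what the results file actually contains at that position. Below is a minimal sketch using only the standard library. The helper name `show_json_error_context` and the `radius` parameter are made up for illustration, and the file name in the usage note is an assumption; the real path is whatever was passed to `loadRes`.

```python
import json


def show_json_error_context(path, radius=80):
    """Try to parse a JSON file and report where parsing fails.

    Returns None if the file parses cleanly. Otherwise returns a dict with
    the decoder message, the failing character offset, the total file
    length, and the text surrounding the failing offset.
    """
    with open(path, "r") as f:
        text = f.read()
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the 0-based character offset reported by the decoder
        # (the "char N" value in the error message).
        start = max(0, err.pos - radius)
        end = min(len(text), err.pos + radius)
        return {
            "message": err.msg,
            "pos": err.pos,
            "file_length": len(text),
            "context": text[start:end],
        }
    return None
```

Calling, for example, `show_json_error_context("bbox.json")` (file name assumed) and comparing `pos` with `file_length` would show whether the failure is in the middle of the file or near its end, and `context` shows the raw text around it. This only inspects the file; it does not by itself identify the cause.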