Process exits with a core dump after training finishes
Created by: nihuizhidao
Environment: paddlepaddle-gpu==1.8.2.post107, PaddleDetection release 0.3. I trained with tools/train.py, and once training finished the following output appeared:
2020-09-11 18:05:07,660-INFO: Save model to output/huaxue_1/model_final.
2020-09-11 18:05:12,998-INFO: Test iter 0
2020-09-11 18:05:21,855-INFO: Test finish iter 74
2020-09-11 18:05:21,855-INFO: Total number of images: 443, inference time: 49.067110368556435 fps.
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
2020-09-11 18:05:22,009-INFO: Start evaluate...
Loading and preparing results...
DONE (t=0.02s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=1.78s).
Accumulating evaluation results...
DONE (t=0.26s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.712
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.934
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.827
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.230
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.714
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.660
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.778
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.778
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.277
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.779
2020-09-11 18:05:24,121-INFO: Best test box ap: 0.7137067344079995, in iter: 90000
terminate called without an active exception
W0911 18:05:27.916838 28227 init.cc:216] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0911 18:05:27.916893 28227 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0911 18:05:27.916905 28227 init.cc:221] The detail failure signal is:
W0911 18:05:27.916923 28227 init.cc:224] *** Aborted at 1599818727 (unix time) try "date -d @1599818727" if you are using GNU date ***
W0911 18:05:30.216557 28227 init.cc:224] PC: @ 0x0 (unknown)
W0911 18:05:31.515676 28227 init.cc:224] *** SIGABRT (@0x3e800005c51) received by PID 23633 (TID 0x7fa999fff700) from PID 23633; stack trace: ***
W0911 18:05:31.553113 28227 init.cc:224] @ 0x7faca21af630 (unknown)
W0911 18:05:31.600666 28227 init.cc:224] @ 0x7faca1e08387 __GI_raise
W0911 18:05:31.603421 28227 init.cc:224] @ 0x7faca1e09a78 __GI_abort
W0911 18:05:31.680852 28227 init.cc:224] @ 0x7fac4b96e84a __gnu_cxx::__verbose_terminate_handler()
W0911 18:05:31.682199 28227 init.cc:224] @ 0x7fac4b96cf47 __cxxabiv1::__terminate()
W0911 18:05:31.683813 28227 init.cc:224] @ 0x7fac4b96cf7d std::terminate()
W0911 18:05:31.685241 28227 init.cc:224] @ 0x7fac4b96cc5a __gxx_personality_v0
W0911 18:05:31.744850 28227 init.cc:224] @ 0x7faca2410b97 _Unwind_ForcedUnwind_Phase2
W0911 18:05:31.747715 28227 init.cc:224] @ 0x7faca2410e7d _Unwind_ForcedUnwind
W0911 18:05:31.750131 28227 init.cc:224] @ 0x7faca21ae362 __GI___pthread_unwind
W0911 18:05:31.752398 28227 init.cc:224] @ 0x7faca21a8ef7 __pthread_exit
W0911 18:05:31.806854 28227 init.cc:224] @ 0x56380d99db69 PyThread_exit_thread
W0911 18:05:31.807189 28227 init.cc:224] @ 0x56380d837cb2 PyEval_RestoreThread.cold.740
W0911 18:05:31.807703 28227 init.cc:224] @ 0x7fac3a404048 (unknown)
W0911 18:05:31.808542 28227 init.cc:224] @ 0x56380d8e7304 _PyCFunction_FastCallDict
W0911 18:05:31.809388 28227 init.cc:224] @ 0x56380d913cd0 _PyCFunction_FastCallKeywords
W0911 18:05:31.809839 28227 init.cc:224] @ 0x56380d96eb0c call_function
W0911 18:05:31.810655 28227 init.cc:224] @ 0x56380d9925d9 _PyEval_EvalFrameDefault
W0911 18:05:31.811090 28227 init.cc:224] @ 0x56380d967f26 _PyEval_EvalCodeWithName
W0911 18:05:31.811724 28227 init.cc:224] @ 0x56380d96940e _PyFunction_FastCallDict
W0911 18:05:31.812485 28227 init.cc:224] @ 0x56380d8e76cf _PyObject_FastCallDict
W0911 18:05:31.813113 28227 init.cc:224] @ 0x56380d8ec143 _PyObject_Call_Prepend
W0911 18:05:31.813810 28227 init.cc:224] @ 0x56380d8e710e PyObject_Call
W0911 18:05:31.814169 28227 init.cc:224] @ 0x56380d9404d1 slot_tp_call
W0911 18:05:31.814781 28227 init.cc:224] @ 0x56380d8e74eb _PyObject_FastCallDict
W0911 18:05:31.815223 28227 init.cc:224] @ 0x56380d96ec5e call_function
W0911 18:05:31.815901 28227 init.cc:224] @ 0x56380d99181a _PyEval_EvalFrameDefault
W0911 18:05:31.816532 28227 init.cc:224] @ 0x56380d96936b _PyFunction_FastCallDict
W0911 18:05:31.817149 28227 init.cc:224] @ 0x56380d8e76cf _PyObject_FastCallDict
W0911 18:05:31.817764 28227 init.cc:224] @ 0x56380d8ec143 _PyObject_Call_Prepend
W0911 18:05:31.818450 28227 init.cc:224] @ 0x56380d8e710e PyObject_Call
W0911 18:05:31.818823 28227 init.cc:224] @ 0x56380d9404d1 slot_tp_call
Aborted (core dumped)
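My understanding is that training had already finished and model_final was saved normally, but the program did not exit cleanly. Is something wrong here?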