PaddlePaddle catches a failure signal, it may not work properly
Created by: ellinyang
1) PaddlePaddle version: paddle1.7.1
2) GPU: V100, CUDA 9.0 and cuDNN 7.5
3) System environment: Linux
- Training info: V100, single machine, single GPU
W0903 22:22:27.505039 98722 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W0903 22:22:27.510004 98722 device_context.cc:245] device: 0, cuDNN Version: 7.5.
- Reproduction: occurs intermittently
- Problem description: an error is reported when training finishes. What are the possible causes?
terminate called without an active exception
W0903 22:54:09.612478 99295 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0903 22:54:09.612509 99295 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0903 22:54:09.612514 99295 init.cc:214] The detail failure signal is:
W0903 22:54:09.612520 99295 init.cc:217] *** Aborted at 1599144849 (unix time) try "date -d @1599144849" if you are using GNU date ***
W0903 22:54:09.615001 99295 init.cc:217] PC: @ 0x0 (unknown)
W0903 22:54:09.615162 99295 init.cc:217] *** SIGABRT (@0x181a2) received by PID 98722 (TID 0x7f64a3fff700) from PID 98722; stack trace: ***
W0903 22:54:09.617220 99295 init.cc:217] @ 0x7f66f9600390 (unknown)
W0903 22:54:09.619148 99295 init.cc:217] @ 0x7f66f925a428 gsignal
W0903 22:54:09.621076 99295 init.cc:217] @ 0x7f66f925c02a abort
W0903 22:54:09.622014 99295 init.cc:217] @ 0x7f6630bb684a __gnu_cxx::__verbose_terminate_handler()
W0903 22:54:09.622663 99295 init.cc:217] @ 0x7f6630bb4f47 __cxxabiv1::__terminate()
W0903 22:54:09.623565 99295 init.cc:217] @ 0x7f6630bb4f7d std::terminate()
W0903 22:54:09.624346 99295 init.cc:217] @ 0x7f6630bb4c5a __gxx_personality_v0
W0903 22:54:09.625113 99295 init.cc:217] @ 0x7f668e0ccb97 _Unwind_ForcedUnwind_Phase2
W0903 22:54:09.625880 99295 init.cc:217] @ 0x7f668e0cce7d _Unwind_ForcedUnwind
W0903 22:54:09.627804 99295 init.cc:217] @ 0x7f66f95ff070 __GI___pthread_unwind
W0903 22:54:09.629688 99295 init.cc:217] @ 0x7f66f95f7845 __pthread_exit
W0903 22:54:09.632023 99295 init.cc:217] @ 0x7f66f9c0b1c9 PyThread_exit_thread
W0903 22:54:09.634066 99295 init.cc:217] @ 0x7f66f9a9dcb1 PyEval_RestoreThread.cold.787
W0903 22:54:09.634805 99295 init.cc:217] @ 0x7f665f21acde (unknown)
W0903 22:54:09.637125 99295 init.cc:217] @ 0x7f66f9b95114 _PyMethodDef_RawFastCallKeywords
W0903 22:54:09.639478 99295 init.cc:217] @ 0x7f66f9b95231 _PyCFunction_FastCallKeywords
W0903 22:54:09.641809 99295 init.cc:217] @ 0x7f66f9bf9a5d _PyEval_EvalFrameDefault
W0903 22:54:09.644102 99295 init.cc:217] @ 0x7f66f9b4e6f9 _PyEval_EvalCodeWithName
W0903 22:54:09.646406 99295 init.cc:217] @ 0x7f66f9b4f805 _PyFunction_FastCallDict
W0903 22:54:09.648681 99295 init.cc:217] @ 0x7f66f9b6a943 _PyObject_Call_Prepend
W0903 22:54:09.650774 99295 init.cc:217] @ 0x7f66f9ba912a slot_tp_call
W0903 22:54:09.653072 99295 init.cc:217] @ 0x7f66f9baa18b _PyObject_FastCallKeywords
W0903 22:54:09.655431 99295 init.cc:217] @ 0x7f66f9bf9626 _PyEval_EvalFrameDefault
W0903 22:54:09.657712 99295 init.cc:217] @ 0x7f66f9b4f73b _PyFunction_FastCallDict
W0903 22:54:09.659974 99295 init.cc:217] @ 0x7f66f9b6a943 _PyObject_Call_Prepend
W0903 22:54:09.662071 99295 init.cc:217] @ 0x7f66f9ba912a slot_tp_call
W0903 22:54:09.664360 99295 init.cc:217] @ 0x7f66f9baa18b _PyObject_FastCallKeywords
W0903 22:54:09.666739 99295 init.cc:217] @ 0x7f66f9bf9e8f _PyEval_EvalFrameDefault
W0903 22:54:09.669080 99295 init.cc:217] @ 0x7f66f9b4e6f9 _PyEval_EvalCodeWithName
W0903 22:54:09.671393 99295 init.cc:217] @ 0x7f66f9b4f805 _PyFunction_FastCallDict
W0903 22:54:09.673691 99295 init.cc:217] @ 0x7f66f9b6a943 _PyObject_Call_Prepend
W0903 22:54:09.676033 99295 init.cc:217] @ 0x7f66f9b5db9e PyObject_Call
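
As a reading aid for the log above, here is a minimal Python sketch that decodes two values copied from it: the unix time on the `*** Aborted at ...` line and the hexadecimal value on the `*** SIGABRT (@0x181a2)` line. The variable names are hypothetical. The UTC+8 offset is an assumption, chosen because it makes the decoded time line up with the `22:54:09` log prefix.

```python
from datetime import datetime, timedelta, timezone

# Values copied verbatim from the log above.
abort_unix_time = 1599144849   # "*** Aborted at 1599144849 (unix time)"
sigabrt_value = "0x181a2"      # "*** SIGABRT (@0x181a2) received by PID 98722"

# Equivalent of the suggested `date -d @1599144849`, expressed in UTC.
aborted_at_utc = datetime.fromtimestamp(abort_unix_time, tz=timezone.utc)

# Assumption: the machine's local time zone is UTC+8.
aborted_at_local = aborted_at_utc.astimezone(timezone(timedelta(hours=8)))

# The hexadecimal value after "@" converted to decimal.
decoded_value = int(sigabrt_value, 16)

print(aborted_at_utc.isoformat())    # 2020-09-03T14:54:09+00:00
print(aborted_at_local.isoformat())  # 2020-09-03T22:54:09+08:00
print(decoded_value)                 # 98722
```

The decoded value equals the PID on the same line (`received by PID 98722 ... from PID 98722`). That is consistent with the process aborting itself through `abort` rather than being killed from outside, although it does not by itself identify the root cause.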