Paddle cluster job core dumps at runtime after submission
Created by: zhouksh
The stack trace is as follows:
Wed Sep 13 18:19:44 2017[1,16]:*** Check failure stack trace: ***
Wed Sep 13 18:19:44 2017[1,16]: @ 0x7f18d985227d google::LogMessage::Fail()
Wed Sep 13 18:19:44 2017[1,16]: @ 0x7f18d9855d2c google::LogMessage::SendToLog()
Wed Sep 13 18:19:44 2017[1,16]: @ 0x7f18d9851da3 google::LogMessage::Flush()
Wed Sep 13 18:19:44 2017[1,16]: @ 0x7f18d985723e google::LogMessageFatal::~LogMessageFatal()
Wed Sep 13 18:19:44 2017[1,16]: @ 0x7f18d985add3 google::glog_internal_namespace_::InitGoogleLoggingUtilities()
Wed Sep 13 18:19:44 2017[1,16]: @ 0x7f18d97b0aee paddle::initMain()
Wed Sep 13 18:19:44 2017[1,16]: @ 0x7f18d9839e01 initPaddle()
Wed Sep 13 18:19:44 2017[1,16]: @ 0x7f18d94c26a9 _wrap_initPaddle
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4b4cb9 PyEval_EvalFrameEx
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4b6b28 PyEval_EvalCodeEx
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4b5d10 PyEval_EvalFrameEx
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4b6b28 PyEval_EvalCodeEx
Wed Sep 13 18:19:44 2017[1,16]: @ 0x529340 function_call
Wed Sep 13 18:19:44 2017[1,16]: @ 0x422cba PyObject_Call
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4b1bd0 PyEval_EvalFrameEx
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4b6b28 PyEval_EvalCodeEx
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4b5d10 PyEval_EvalFrameEx
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4b6b28 PyEval_EvalCodeEx
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4b5d10 PyEval_EvalFrameEx
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4b6b28 PyEval_EvalCodeEx
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4b6c52 PyEval_EvalCode
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4e1c7d PyRun_FileExFlags
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4e3501 PyRun_SimpleFileExFlags
Wed Sep 13 18:19:44 2017[1,16]: @ 0x4159dd Py_Main
Wed Sep 13 18:19:44 2017[1,16]: @ 0x7f18dfa44bd5 __libc_start_main
Wed Sep 13 18:19:44 2017[1,16]: @ 0x414b71 (unknown)
Wed Sep 13 18:19:44 2017[1,16]: @ (nil) (unknown)
Wed Sep 13 18:19:44 2017[1,16]:./train.sh: line 239: 17563 Aborted python27-gcc482/bin/python conf/trainer_config.conf
Wed Sep 13 18:19:44 2017[1,16]:+ '[' 134 -ne 0 ']'
Wed Sep 13 18:19:44 2017[1,16]:+ kill_pserver2_exit
Wed Sep 13 18:19:44 2017[1,16]:+ ps aux
Wed Sep 13 18:19:44 2017[1,16]:+ grep paddle_pserver2
Wed Sep 13 18:19:44 2017[1,16]:+ grep paddle_cluster_job
Wed Sep 13 18:19:44 2017[1,16]:+ grep -v grep
Wed Sep 13 18:19:44 2017[1,16]:+ cut -c10-14
Wed Sep 13 18:19:44 2017[1,16]:+ xargs kill -9
Wed Sep 13 18:19:44 2017[1,16]:+ log_fatal 'paddle_trainer failed kill paddle_pserver2 and exit'
Wed Sep 13 18:19:44 2017[1,16]:+ echo '[./common.sh : 399] [kill_pserver2_exit]'
Wed Sep 13 18:19:44 2017[1,16]:[./common.sh : 399] [kill_pserver2_exit]
Wed Sep 13 18:19:44 2017[1,16]:+ echo '[FATAL]: paddle_trainer failed kill paddle_pserver2 and exit'
Wed Sep 13 18:19:44 2017[1,16]:[FATAL]: paddle_trainer failed kill paddle_pserver2 and exit
Wed Sep 13 18:19:44 2017[1,16]:+ get_stack
Wed Sep 13 18:19:44 2017[1,16]:+ set +x
Wed Sep 13 18:19:44 2017[1,16]:
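For anyone reading the log: the `134` that train.sh tests against 0 matches the "Aborted" message, assuming the script runs under a bash-like shell that reports a signal-terminated child as 128 plus the signal number. A minimal sketch of that decoding follows. The helper name `signal_from_status` is hypothetical and is not part of Paddle or train.sh.

```python
import signal
from typing import Optional


def signal_from_status(status: int) -> Optional[signal.Signals]:
    """Map a shell exit status to the signal that ended the process.

    Assumes the bash convention: status = 128 + signal number.
    Returns None for a normal exit or an unrecognized signal number.
    """
    if status <= 128:
        return None  # ordinary exit code, not a signal
    try:
        return signal.Signals(status - 128)
    except ValueError:
        return None  # no signal with that number on this platform


if __name__ == "__main__":
    sig = signal_from_status(134)  # the status seen in the log above
    print(sig.name if sig is not None else "not a signal")
```

Under that assumption, 134 - 128 = 6, which is SIGABRT on Linux. That is consistent with the glog fatal check (`LogMessageFatal`) at the top of the trace aborting the process.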