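Training crashes after running for a while. Training itself appears to run normally; the problem seems to occur during prediction. The error is related to PyEval_EvalFrameEx.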
Created by: liushanshan07
Saved model to: ./output/models/model_186000.
Time: 1541876304.23; Iter[187000]; Avg Warp-CTC loss: 1.017; Avg seq err: 0.138
Time: 1541876684.59; Iter[188000]; Avg Warp-CTC loss: 0.914; Avg seq err: 0.131
*** Aborted at 1541876972 (unix time) try "date -d @1541876972" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x22) received by PID 802 (TID 0x7fd87e73b700) from PID 34; stack trace: ***
@ 0x7fd87dd3c500 (unknown)
@ 0x7fd87e03cd5b PyEval_EvalFrameEx
@ 0x7fd87dfb616c gen_send_ex
@ 0x7fd87e03d241 PyEval_EvalFrameEx
@ 0x7fd87dfb616c gen_send_ex
@ 0x7fd87e03d241 PyEval_EvalFrameEx
@ 0x7fd87dfb616c gen_send_ex
@ 0x7fd87e03d241 PyEval_EvalFrameEx
@ 0x7fd87dfb616c gen_send_ex
@ 0x7fd87e03d241 PyEval_EvalFrameEx
@ 0x7fd87dfb616c gen_send_ex
@ 0x7fd87e03d241 PyEval_EvalFrameEx
@ 0x7fd87dfb616c gen_send_ex
@ 0x7fd87e03d241 PyEval_EvalFrameEx
@ 0x7fd87dfb616c gen_send_ex
@ 0x7fd87e03d241 PyEval_EvalFrameEx
@ 0x7fd87e043bce PyEval_EvalCodeEx
@ 0x7fd87e04220a PyEval_EvalFrameEx
@ 0x7fd87e042560 PyEval_EvalFrameEx
@ 0x7fd87e043bce PyEval_EvalCodeEx
@ 0x7fd87e043ce2 PyEval_EvalCode
@ 0x7fd87e0639e0 PyRun_FileExFlags
@ 0x7fd87e063bbf PyRun_SimpleFileExFlags
@ 0x7fd87e079454 Py_Main
@ 0x7fd87d32dcdd __libc_start_main
/root/paddlejob/run.sh: line 307: 802 Aborted (core dumped) python train.py
*error messages
[/root/paddlejob/run.sh : 310] [start_paddle_trainer]
[ERROR]: execute user cmd failed, Sun Nov 11 03:09:36 CST 2018
~/paddlejob/workspace
end_hook start ...
~/paddlejob/workspace/env_run ~/paddlejob/workspace
~/paddlejob/workspace
data_clear start ...
rm /root/paddlejob/workspace/env_run/train_data
upload start ...
current path:/root/paddlejob/workspace
~/paddlejob/workspace ~/paddlejob/workspace
start upload model
upload model to hdfs
18/11/11 03:09:37 INFO fs.LibdfsLoader: Trying to load the libdfs library...
18/11/11 03:09:37 INFO fs.LibdfsLoader: Loaded the libdfs library
18/11/11 03:09:37 INFO fs.DFileSystem: Loaded the libdfs library
PADDLE_TRAINER_ID=0
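
The native stack trace above contains only interpreter frames (PyEval_EvalFrameEx, gen_send_ex), so it does not show which Python line was executing when the SIGSEGV arrived. The repeated gen_send_ex frames suggest that nested generators were running at that moment, possibly a chained data reader, but that is an inference from the frame names and not something the log confirms. One way to narrow it down is to have the interpreter dump a Python-level traceback on fatal signals. The sketch below uses the `faulthandler` module, which is in the standard library from Python 3.3; on Python 2 a backport with the same module name is available on PyPI. The helper name `enable_crash_traceback` and the log path are hypothetical, and the snippet is assumed to be placed at the very top of train.py.

```python
import faulthandler
import os
import tempfile


def enable_crash_traceback(log_path):
    """Ask faulthandler to write Python tracebacks to log_path on a fatal signal."""
    # The file object must stay open for the lifetime of the process,
    # because faulthandler writes to its file descriptor at crash time.
    log_file = open(log_path, "w")
    # all_threads=True dumps every thread, not only the one that crashed.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical location; any writable path that survives the job would do.
CRASH_LOG = os.path.join(tempfile.gettempdir(), "train_crash_traceback.log")
crash_log_file = enable_crash_traceback(CRASH_LOG)

# ... the rest of train.py (model construction, training loop) would follow here.
```

If the crash reproduces, the log file should then list the Python file names and line numbers of the active frames, which would show whether the fault is reached from the training step, the prediction step, or the data reader.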