Segmentation fault when using v2 FC in huge data/cluster mode.
Created by: youan1
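As the title says. The log is below: after running pass 0, a segmentation fault error occurs. Could someone on the Paddle team please help figure out what is causing it? Thanks!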
Thu Jul 20 13:04:14 2017[1,3]<stdout>:Pass 0, Batch 0, Cost 35.821964, {'__auc_evaluator_0__': 0.0, 'classification_error_evaluator': 0.984375}
Thu Jul 20 13:04:14 2017[1,19]<stdout>:
Thu Jul 20 13:04:14 2017[1,19]<stdout>:Pass 0, Batch 0, Cost 34.752586, {'__auc_evaluator_0__': 0.0, 'classification_error_evaluator': 1.0}
Thu Jul 20 13:04:14 2017[1,8]<stdout>:
Thu Jul 20 13:04:14 2017[1,8]<stdout>:Pass 0, Batch 0, Cost 40.112804, {'__auc_evaluator_0__': 0.0, 'classification_error_evaluator': 1.0}
Thu Jul 20 13:04:14 2017[1,17]<stdout>:
Thu Jul 20 13:04:14 2017[1,17]<stdout>:Pass 0, Batch 0, Cost 30.483822, {'__auc_evaluator_0__': 0.0, 'classification_error_evaluator': 1.0}
Thu Jul 20 13:04:14 2017[1,17]<stderr>:Thread [140637305579264] Forwarding __fc_layer_5__,
Thu Jul 20 13:04:14 2017[1,17]<stderr>:*** Aborted at 1500527054 (unix time) try "date -d @1500527054" if you are using GNU date ***
Thu Jul 20 13:04:14 2017[1,17]<stderr>:PC: @ 0x0 (unknown)
Thu Jul 20 13:04:14 2017[1,17]<stderr>:*** SIGSEGV (@0x0) received by PID 18317 (TID 0x7fe8aca42700) from PID 0; stack trace: ***
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x7fe8ac619160 (unknown)
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x7fe8a5fccb5c paddle::FullyConnectedLayer::backward()
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x7fe8a603bbaf paddle::NeuralNetwork::backward()
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x7fe8a62571c4 GradientMachine::forwardBackward()
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x7fe8a5f041d9 _wrap_GradientMachine_forwardBackward
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4b4cb9 PyEval_EvalFrameEx
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4b6b28 PyEval_EvalCodeEx
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4b5d10 PyEval_EvalFrameEx
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4b6b28 PyEval_EvalCodeEx
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4b5d10 PyEval_EvalFrameEx
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4b6b28 PyEval_EvalCodeEx
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4b5d10 PyEval_EvalFrameEx
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4b6b28 PyEval_EvalCodeEx
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4b5d10 PyEval_EvalFrameEx
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4b6b28 PyEval_EvalCodeEx
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4b6c52 PyEval_EvalCode
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4e1c7d PyRun_FileExFlags
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4e3501 PyRun_SimpleFileExFlags
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x4159dd Py_Main
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x7fe8abb73bd5 __libc_start_main
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x414b71 (unknown)
Thu Jul 20 13:04:14 2017[1,17]<stderr>: @ 0x0 (unknown)
Thu Jul 20 13:04:16 2017[1,17]<stderr>:./train.sh: line 239: 18317 Segmentation fault python27-gcc482/bin/python conf/trainer_config.conf