python pdseg/train.py --use_gpu --use_mpio --cfg ./configs/deeplabv3p_xception65_cityscapes.yaml
Created by: githupbill
[SKIP] Shape of pretrained weight pretrained_model/deeplabv3p_xception65_bn_coco/logit/weights doesn't match.(Pretrained: (21, 256, 1, 1), Actual: (19, 256, 1, 1))
[SKIP] Shape of pretrained weight pretrained_model/deeplabv3p_xception65_bn_coco/logit/biases doesn't match.(Pretrained: (21,), Actual: (19,))
2020-08-21 14:10:32,832-WARNING: pretrained_model/deeplabv3p_xception65_bn_coco.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
There are 730/732 varaibles in pretrained_model/deeplabv3p_xception65_bn_coco are loaded.
Use multiprocess reader
epoch=1 step=10 lr=0.00997 loss=2.4465 step/sec=1.913 | ETA 00:25:49
epoch=1 step=20 lr=0.00994 loss=2.5981 step/sec=2.255 | ETA 00:21:50
epoch=1 step=30 lr=0.00991 loss=2.2055 step/sec=2.273 | ETA 00:21:35
epoch=1 step=40 lr=0.00988 loss=1.6398 step/sec=2.232 | ETA 00:21:54
2020-08-21 14:10:54,902-WARNING: Your reader has raised an exception!
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/ai-master/PycharmProjects/PaddleSeg/pdseg/reader.py", line 121, in multiprocess_generator
    generator_out = enqueuer.queue.get(timeout=5)
  File "<string>", line 2, in get
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/multiprocessing/managers.py", line 757, in _callmethod
    kind, result = conn.recv()
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/reader.py", line 1145, in thread_main
    six.reraise(*sys.exc_info())
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/reader.py", line 1125, in thread_main
    for tensors in self._tensor_reader():
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/data_feeder.py", line 196, in __call__
    for each_sample in self.generator():
  File "pdseg/train.py", line 207, in data_generator
    for b in data_gen:
  File "/home/ai-master/PycharmProjects/PaddleSeg/pdseg/reader.py", line 130, in multiprocess_generator
    enqueuer.stop()
  File "/home/ai-master/PycharmProjects/PaddleSeg/pdseg/data_utils.py", line 117, in stop
    if self.is_running():
  File "/home/ai-master/PycharmProjects/PaddleSeg/pdseg/data_utils.py", line 100, in is_running
    if not self.queue.empty():
  File "<string>", line 2, in empty
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/multiprocessing/managers.py", line 756, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
/home/ai-master/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
Python Call Stacks (More useful to users):
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/reader.py", line 1080, in _init_non_iterable
    attrs={'drop_last': self._drop_last})
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/reader.py", line 978, in __init__
    self._init_non_iterable()
  File "/home/ai-master/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/reader.py", line 620, in from_generator
    iterable, return_list, drop_last)
  File "/home/ai-master/PycharmProjects/PaddleSeg/pdseg/models/model_builder.py", line 144, in build_model
    use_double_buffer=True)
  File "pdseg/train.py", line 237, in train
    train_prog, startup_prog, phase=ModelPhase.TRAIN)
  File "pdseg/train.py", line 453, in main
    train(cfg)
  File "pdseg/train.py", line 466, in <module>
    main(args)
Error Message Summary:
Error: Blocking queue is killed because the data reader raises an exception
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141)
  [operator < read > error]
W0821 14:10:55.495247 11763 operator.cc:187] read raises an exception std::future_error, St12future_error
F0821 14:10:55.495333 11763 exception_holder.h:37] std::exception caught, No associated state
*** Check failure stack trace: ***
    @ 0x7fe201c8480d google::LogMessage::Fail()
    @ 0x7fe201c882bc google::LogMessage::SendToLog()
    @ 0x7fe201c84333 google::LogMessage::Flush()
    @ 0x7fe201c897ce google::LogMessageFatal::~LogMessageFatal()
    @ 0x7fe204e50598 paddle::framework::details::ExceptionHolder::Catch()
    @ 0x7fe204eedbbe paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
    @ 0x7fe204ee83a4 paddle::framework::details::FastThreadedSSAGraphExecutor::RunTracedOps()
    @ 0x7fe204eec7e3 paddle::framework::details::FastThreadedSSAGraphExecutor::Run()
    @ 0x7fe204e41acc _ZZN6paddle9framework7details29ScopeBufferedSSAGraphExecutor3RunERKSt6vectorISsSaISsEEbENKUlvE_clEv
    @ 0x7fe204e46034 paddle::framework::details::ScopeBufferedMonitor::Apply()
    @ 0x7fe204e42704 paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run()
    @ 0x7fe201d4a13d paddle::framework::ParallelExecutor::Run()
    @ 0x7fe2019c0747 ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9framework16ParallelExecutorERKSt6vectorISsSaISsEEbE210_NS_6objectEIS8_SD_bEINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNESW
    @ 0x7fe201a0d4d9 pybind11::cpp_function::dispatcher()
    @ 0x55f3fc9f8334 _PyCFunction_FastCallDict
    @ 0x55f3fca7fade call_function
    @ 0x55f3fcaa255a _PyEval_EvalFrameDefault
    @ 0x55f3fca78b76 _PyEval_EvalCodeWithName
    @ 0x55f3fca79be6 fast_function
    @ 0x55f3fca7fa65 call_function
    @ 0x55f3fcaa331b _PyEval_EvalFrameDefault
    @ 0x55f3fca78b76 _PyEval_EvalCodeWithName
    @ 0x55f3fca79be6 fast_function
    @ 0x55f3fca7fa65 call_function
    @ 0x55f3fcaa331b _PyEval_EvalFrameDefault
    @ 0x55f3fca78b76 _PyEval_EvalCodeWithName
    @ 0x55f3fca79be6 fast_function
    @ 0x55f3fca7fa65 call_function
    @ 0x55f3fcaa331b _PyEval_EvalFrameDefault
    @ 0x55f3fca78fae _PyEval_EvalCodeWithName
    @ 0x55f3fca79be6 fast_function
    @ 0x55f3fca7fa65 call_function
Aborted (core dumped)