[Baidu Star Contest] Bug during fine-grained classification training
Created by: zky001
Training crashes at unpredictable points: sometimes after a little over 2,000 iterations, sometimes after 6,000 or 7,000, and sometimes right at the start. I have already reduced the batch size to a very small value and the problem still occurs. I have changed the code several times with no effect. A run that fails after seven or eight hours makes this error very costly. Is this a framework problem?

ValueError: could not broadcast input array from shape (3,64,64) into shape (3)

The full error stack looks roughly like this:

2020-08-09 23:14:23,889-WARNING: Your reader has raised an exception!
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1156, in thread_main
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1136, in thread_main
    for tensors in self._tensor_reader():
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 203, in __call__
    yield self._done()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 191, in _done
    return [c.done() for c in self.converters]
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 191, in <listcomp>
    return [c.done() for c in self.converters]
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 156, in done
    arr = np.array(self.data, dtype=self.dtype)
ValueError: could not broadcast input array from shape (3,64,64) into shape (3)
Traceback (most recent call last):
  File "train_elem.py", line 243, in <module>
    main()
  File "train_elem.py", line 239, in main
    train_async(args)
  File "train_elem.py", line 188, in train_async
    for train_batch in train_loader():
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1102, in __next__
    return self._reader.read_next()
paddle.fluid.core_avx.EnforceNotMet
C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
Error Message Summary:
Error: Blocking queue is killed because the data reader raises an exception [Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141)
terminate called without an active exception
W0809 23:14:24.456936 1279 init.cc:216] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0809 23:14:24.456981 1279 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0809 23:14:24.456990 1279 init.cc:221] The detail failure signal is:
W0809 23:14:24.456998 1279 init.cc:224] *** Aborted at 1596986064 (unix time) try "date -d @1596986064" if you are using GNU date ***
W0809 23:14:24.459416 1279 init.cc:224] PC: @ 0x0 (unknown)
W0809 23:14:24.459564 1279 init.cc:224] *** SIGABRT (@0x3e8000004cc) received by PID 1228 (TID 0x7f1aa3fff700) from PID 1228; stack trace: ***
W0809 23:14:24.462023 1279 init.cc:224] @ 0x7f1c7d1e9390 (unknown)
W0809 23:14:24.464035 1279 init.cc:224] @ 0x7f1c7ce43428 gsignal
W0809 23:14:24.465437 1279 init.cc:224] @ 0x7f1c7ce4502a abort
W0809 23:14:24.469358 1279 init.cc:224] @ 0x7f1c51bc784a __gnu_cxx::__verbose_terminate_handler()
W0809 23:14:24.470468 1279 init.cc:224] @ 0x7f1c51bc5f47 __cxxabiv1::__terminate()
W0809 23:14:24.471663 1279 init.cc:224] @ 0x7f1c51bc5f7d std::terminate()
W0809 23:14:24.472723 1279 init.cc:224] @ 0x7f1c51bc5c5a __gxx_personality_v0
W0809 23:14:24.474215 1279 init.cc:224] @ 0x7f1c51eb8b97 _Unwind_ForcedUnwind_Phase2
W0809 23:14:24.475225 1279 init.cc:224] @ 0x7f1c51eb8e7d _Unwind_ForcedUnwind
W0809 23:14:24.477159 1279 init.cc:224] @ 0x7f1c7d1e8070 __GI___pthread_unwind
W0809 23:14:24.479091 1279 init.cc:224] @ 0x7f1c7d1e0845 __pthread_exit
W0809 23:14:24.479468 1279 init.cc:224] @ 0x55f638a8de59 PyThread_exit_thread
W0809 23:14:24.479591 1279 init.cc:224] @ 0x55f638913c17 PyEval_RestoreThread.cold.798
W0809 23:14:24.480059 1279 init.cc:224] @ 0x7f1c469487ba (unknown)
W0809 23:14:24.480401 1279 init.cc:224] @ 0x55f638a0f744 _PyMethodDef_RawFastCallKeywords
W0809 23:14:24.480726 1279 init.cc:224] @ 0x55f638a0f861 _PyCFunction_FastCallKeywords
W0809 23:14:24.481055 1279 init.cc:224] @ 0x55f638a7b2bd _PyEval_EvalFrameDefault
W0809 23:14:24.481350 1279 init.cc:224] @ 0x55f6389bf539 _PyEval_EvalCodeWithName
W0809 23:14:24.481643 1279 init.cc:224] @ 0x55f6389c0860 _PyFunction_FastCallDict
W0809 23:14:24.481848 1279 init.cc:224] @ 0x55f638aceb5b partial_call
W0809 23:14:24.482152 1279 init.cc:224] @ 0x55f638a178fb _PyObject_FastCallKeywords
W0809 23:14:24.482472 1279 init.cc:224] @ 0x55f638a7ae86 _PyEval_EvalFrameDefault
W0809 23:14:24.482764 1279 init.cc:224] @ 0x55f6389bf81a _PyEval_EvalCodeWithName
W0809 23:14:24.483067 1279 init.cc:224] @ 0x55f6389c0635 _PyFunction_FastCallDict
W0809 23:14:24.483397 1279 init.cc:224] @ 0x55f638a78232 _PyEval_EvalFrameDefault
W0809 23:14:24.483678 1279 init.cc:224] @ 0x55f638a0eccb _PyFunction_FastCallKeywords
W0809 23:14:24.484002 1279 init.cc:224] @ 0x55f638a76a93 _PyEval_EvalFrameDefault
W0809 23:14:24.484282 1279 init.cc:224] @ 0x55f638a0eccb _PyFunction_FastCallKeywords
W0809 23:14:24.484601 1279 init.cc:224] @ 0x55f638a76a93 _PyEval_EvalFrameDefault
W0809 23:14:24.484897 1279 init.cc:224] @ 0x55f6389c056b _PyFunction_FastCallDict
W0809 23:14:24.485204 1279 init.cc:224] @ 0x55f6389dee53 _PyObject_Call_Prepend
W0809 23:14:24.485523 1279 init.cc:224] @ 0x55f6389d1dbe PyObject_Call
Aborted (core dumped)
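
A possible way to narrow this down, offered as a guess rather than a confirmed diagnosis: the Python traceback ends in `np.array(self.data, dtype=self.dtype)` inside the data feeder, which is the kind of place where NumPy raises a `ValueError` if the samples collected for one batch do not all share the same shape. If the data is shuffled, a single odd sample would be reached at a different iteration on each run. The sketch below wraps a sample reader so that the first mismatching sample is reported by index. Everything in it is an assumption for illustration: the names `checked_reader`, `first_bad_index` and `demo_reader` are hypothetical, the reader is assumed to yield `(image, label)` pairs, and the expected shape `(3, 64, 64)` is taken from the error message, not from train_elem.py.

```python
import numpy as np

# Assumed CHW input shape, taken from the error message (not from train_elem.py).
EXPECTED_SHAPE = (3, 64, 64)


def checked_reader(reader, expected_shape=EXPECTED_SHAPE):
    """Wrap a sample reader and fail early, naming the offending sample index."""
    def _reader():
        for index, (image, label) in enumerate(reader()):
            shape = np.asarray(image).shape
            if shape != expected_shape:
                raise ValueError(
                    "sample %d has shape %s, expected %s"
                    % (index, shape, expected_shape))
            yield image, label
    return _reader


def first_bad_index(reader, expected_shape=EXPECTED_SHAPE):
    """Scan a reader once and return the index of the first mismatching sample."""
    for index, (image, _label) in enumerate(reader()):
        if np.asarray(image).shape != expected_shape:
            return index
    return None


def demo_reader():
    # Hypothetical data: the third sample has a different spatial size.
    yield np.zeros((3, 64, 64), dtype="float32"), 0
    yield np.zeros((3, 64, 64), dtype="float32"), 1
    yield np.zeros((3, 32, 32), dtype="float32"), 2


if __name__ == "__main__":
    print("first mismatching sample:", first_bad_index(demo_reader))
```

Running a scan like `first_bad_index` once over the whole training set, before starting a multi-hour run, would show whether any sample leaves preprocessing with an unexpected shape. If none does, the cause lies elsewhere and this sketch does not apply.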