Created by: chenwhql
PR types
Bug fixes
PR changes
Others
Describe
Fix issue: https://github.com/PaddlePaddle/Paddle/issues/25302
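Users easily mistake `paddle.batch` and `DataLoader.set_batch_generator` for a matched pair of APIs, but they are not: a generator decorated by `paddle.batch` yields a list of samples and must be passed to `set_sample_list_generator`. Passing it to `set_batch_generator` currently causes a segmentation fault. This PR adds a data check that intercepts the mismatch and raises an error instead.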
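The mismatch can be illustrated without Paddle itself. The following is only a sketch in plain Python: `batch` is a simplified stand-in for `paddle.batch`, and `check_slot_count` is a hypothetical helper that mimics the size comparison reported in the new error message (it is not the code added in `reader_py.cc`). The feed names and sample shapes are made up for illustration.

```python
def batch(reader, batch_size):
    """Simplified stand-in for paddle.batch: group samples into lists."""
    def batch_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) == batch_size:
                yield buf
                buf = []
        if buf:  # emit the final, possibly smaller, batch
            yield buf
    return batch_reader


def check_slot_count(feed_names, batch_data):
    """Hypothetical check: one entry per feed variable is expected."""
    if len(feed_names) != len(batch_data):
        raise ValueError(
            "feed list has %d entries but the batch has %d"
            % (len(feed_names), len(batch_data))
        )


def sample_reader():
    # Each sample is an (image, label) pair; the values are illustrative.
    for i in range(8):
        yield [float(i)], i


feed_names = ["image", "label"]  # assumed feed list with two variables
first_batch = next(batch(sample_reader, batch_size=4)())

# Treating the sample list as an already-batched input fails the check,
# because its length is the batch size, not the number of feed variables.
mismatch_caught = False
try:
    check_slot_count(feed_names, first_batch)
except ValueError:
    mismatch_caught = True

# Regrouping the samples per feed variable gives one entry per slot,
# which is the shape a batch-level generator is expected to produce.
slots = list(zip(*first_batch))
check_slot_count(feed_names, slots)
```

In this sketch the sample list has as many entries as the batch size, while the feed list has one entry per input variable, which is the same kind of inequality the new `InvalidArgumentError` reports.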
Original behavior (segmentation fault):
λ yq01-gpu-255-137-12-00 /work/scripts/travel {master} python train.py
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
W0702 13:15:35.228315 32642 init.cc:232] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0702 13:15:35.228341 32642 init.cc:234] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0702 13:15:35.228356 32642 init.cc:237] The detail failure signal is:
W0702 13:15:35.228369 32642 init.cc:240] *** Aborted at 1593695735 (unix time) try "date -d @1593695735" if you are using GNU date ***
W0702 13:15:35.229774 32642 init.cc:240] PC: @ 0x0 (unknown)
W0702 13:15:35.230036 32642 init.cc:240] *** SIGSEGV (@0x7ff5f8027000) received by PID 32642 (TID 0x7ff6a782a700) from PID 18446744073575493632; stack trace: ***
W0702 13:15:35.230265 32642 init.cc:240] @ 0x7ff6041552ae google::(anonymous namespace)::FailureSignalHandler()
W0702 13:15:35.231428 32642 init.cc:240] @ 0x7ff6a740a390 (unknown)
W0702 13:15:35.231631 32642 init.cc:240] @ 0x7ff603f9e8dc _ZNSt10_HashtableINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_N6paddle9framework9LoDTensorEESaISB_ENSt8__detail10_Select1stESt8equal_toIS5_ESt4hashIS5_ENSD_18_Mod_range_hashingENSD_20_Default_ranged_hashENSD_20_Prime_rehash_policyENSD_17_Hashtable_traitsILb1ELb0ELb1EEEE10_M_emplaceIJRS5_SA_EEES6_INSD_14_Node_iteratorISB_Lb0ELb1EEEbESt17integral_constantIbLb1EEDpOT_.constprop.1600
W0702 13:15:35.237118 32642 init.cc:240] @ 0x7ff603fbd6a2 _ZN6paddle6pybind21MultiDeviceFeedReaderINS_9operators6reader40OrderedMultiDeviceLoDTensorBlockingQueueEE8ReadNextB5cxx11Ev
W0702 13:15:35.239085 32642 init.cc:240] @ 0x7ff603fb624a _ZZN8pybind1112cpp_function10initializeIZNS0_C4ISt6vectorISt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN6paddle9framework9LoDTensorESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SD_EEESaISM_EENSB_6pybind21MultiDeviceFeedReaderINSB_9operators6reader40OrderedMultiDeviceLoDTensorBlockingQueueEEEJEJNS_4nameENS_9is_methodENS_7siblingENS_10call_guardIJNS_18gil_scoped_releaseEEEEEEEMT0_FT_DpT1_EDpRKT2_EUlPSU_E_SO_JS1B_EJSV_SW_SX_S10_EEEvOS12_PFS11_S14_ES1A_ENUlRNS_6detail13function_callEE1_4_FUNES1I_
W0702 13:15:35.240906 32642 init.cc:240] @ 0x7ff603e109de pybind11::cpp_function::dispatcher()
W0702 13:15:35.241127 32642 init.cc:240] @ 0x4e1307 PyCFunction_Call
W0702 13:15:35.241308 32642 init.cc:240] @ 0x535fcb PyEval_EvalFrameEx
W0702 13:15:35.241477 32642 init.cc:240] @ 0x53a81b PyEval_EvalCodeEx
W0702 13:15:35.241664 32642 init.cc:240] @ 0x4e3423 (unknown)
W0702 13:15:35.241819 32642 init.cc:240] @ 0x5c3bd7 PyObject_Call
W0702 13:15:35.242022 32642 init.cc:240] @ 0x4f08be (unknown)
W0702 13:15:35.242178 32642 init.cc:240] @ 0x5c3bd7 PyObject_Call
W0702 13:15:35.242357 32642 init.cc:240] @ 0x57f216 (unknown)
W0702 13:15:35.242533 32642 init.cc:240] @ 0x4e554f (unknown)
W0702 13:15:35.242712 32642 init.cc:240] @ 0x53102e PyEval_EvalFrameEx
W0702 13:15:35.242897 32642 init.cc:240] @ 0x539f5f (unknown)
W0702 13:15:35.243041 32642 init.cc:240] @ 0x535af2 PyEval_EvalFrameEx
W0702 13:15:35.243189 32642 init.cc:240] @ 0x539a13 (unknown)
W0702 13:15:35.243276 32642 init.cc:240] @ 0x53a6cf PyEval_EvalCode
W0702 13:15:35.243436 32642 init.cc:240] @ 0x6292c2 (unknown)
W0702 13:15:35.243569 32642 init.cc:240] @ 0x62b76a PyRun_FileExFlags
W0702 13:15:35.243655 32642 init.cc:240] @ 0x62bf5c PyRun_SimpleFileExFlags
W0702 13:15:35.243798 32642 init.cc:240] @ 0x63d506 Py_Main
W0702 13:15:35.243935 32642 init.cc:240] @ 0x4cfd11 main
W0702 13:15:35.245245 32642 init.cc:240] @ 0x7ff6a704f830 __libc_start_main
W0702 13:15:35.245362 32642 init.cc:240] @ 0x5d36e9 _start
W0702 13:15:35.246428 32642 init.cc:240] @ 0x0 (unknown)
Segmentation fault
New behavior (the added check raises an error):
λ yq01-gpu-255-137-12-00 /work/scripts/travel {master} python train.py
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
Traceback (most recent call last):
File "train.py", line 151, in <module>
train()
File "train.py", line 103, in train
for step_id, train_data in enumerate(data_loader()):
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 1104, in __next__
return self._reader.read_next()
paddle.fluid.core_avx.EnforceNotMet:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::pybind::MultiDeviceFeedReader<paddle::operators::reader::OrderedMultiDeviceLoDTensorBlockingQueue>::ReadNext[abi:cxx11]()
----------------------
Error Message Summary:
----------------------
InvalidArgumentError: The sample number of reader's input data and the input number of feed list are not equal.
Possible reasons are:
The generator is decorated by `paddle.batch` and configured by `set_batch_generator`, but here need to used `set_sample_list_generator`.
[Hint: Expected names_.size() == ret_[i].size(), but received names_.size():1314 != ret_[i].size():1024.] at (/work/paddle/paddle/fluid/pybind/reader_py.cc:195)