DataLoader fails while reading data: Error in `python': free(): invalid pointer
Created by: liangzhenduo0608
- Version and environment information:
  1) PaddlePaddle version: 1.7.1
  2) CPU:
  3) GPU: single V100, NVIDIA-SMI 440.33.01, Driver Version: 440.33.01, CUDA Version: 10.2, CUDNN_VERSION=7.6.3.30
  4) System environment: company intranet PaddleCloud k8s
- Training information:
  1) Single machine, single GPU
  2) Device 0: "Tesla V100-SXM2-16GB", CUDA Driver Version 10.2, CUDA Capability Major/Minor version number: 7.0, Total amount of global memory: 16160 MBytes (16945512448 bytes)
  3) Operator information:
- Reproduction information: data_loader is used in place of train_reader.
```py
data_loader = fluid.io.DataLoader.from_generator(
    feed_list=feed_list, capacity=4, iterable=True)
data_loader.set_batch_generator(train_reader, places=place)
```
The error is raised while training batch by batch:
```py
for step_id, train_data in enumerate(data_loader()):
    loss_value, acc_value = exe.run(
        program=compiled_train_prog,
        feed=feeder.feed(train_data),
        fetch_list=[avg_cost.name, batch_acc.name]
    )
```
For the error log, see log/err.log at http://10.255.125.11:8388/v1/containers/abe565090f7b71a0997ef0fead5b8217063102a19a5c35817414e9382beb6336/backuplog; the training code is env_run/train.py.
- Problem description:

W0701 15:14:27.777395 3211 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0701 15:14:27.777421 3211 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0701 15:14:27.777426 3211 init.cc:214] The detail failure signal is:
W0701 15:14:27.777429 3211 init.cc:217] *** Aborted at 1593587667 (unix time) try "date -d @1593587667" if you are using GNU date ***
W0701 15:14:27.778998 3211 init.cc:217] PC: @ 0x0 (unknown)
W0701 15:14:27.779072 3211 init.cc:217] *** SIGSEGV (@0x7fceffa2c108) received by PID 3211 (TID 0x7fd03ba48700) from PID 18446744073703440648; stack trace: ***
W0701 15:14:27.781042 3211 init.cc:217] @ 0x7fd03b01bbb0 (unknown)
W0701 15:14:27.785344 3211 init.cc:217] @ 0x7fcf393e50bf std::_Sp_counted_base<>::_M_release()
W0701 15:14:27.787106 3211 init.cc:217] @ 0x7fcf394ed2c5 std::_Hashtable<>::_M_allocate_node<>()
W0701 15:14:27.790634 3211 init.cc:217] @ 0x7fcf394efdf6 paddle::pybind::MultiDeviceFeedReader::ReadNext()
W0701 15:14:27.792016 3211 init.cc:217] @ 0x7fcf394f1601 ZZN8pybind1112cpp_function10initializeIZNS0_C1ISt6vectorISt13unordered_mapISsN6paddle9framework9LoDTensorESt4hashISsESt8equal_toISsESaISt4pairIKSsS7_EEESaISG_EENS5_6pybind21MultiDeviceFeedReaderEIEINS_4nameENS_9is_methodENS_7siblingENS_10call_guardIINS_18gil_scoped_releaseEEEEEEEMT0_FT_DpT1_EDpRKT2_EUlPSK_E_SI_IS11_EISL_SM_SN_SQ_EEEvOSS_PFSR_SU_ES10_ENUlRNS_6detail13function_callEE1_4_FUNES18
W0701 15:14:27.793130 3211 init.cc:217] @ 0x7fcf39422bb1 pybind11::cpp_function::dispatcher()
W0701 15:14:27.794306 3211 init.cc:217] @ 0x7fd03b333ce8 PyEval_EvalFrameEx
W0701 15:14:27.795284 3211 init.cc:217] @ 0x7fd03b333e9e PyEval_EvalFrameEx
W0701 15:14:27.796257 3211 init.cc:217] @ 0x7fd03b33637d PyEval_EvalCodeEx
W0701 15:14:27.797233 3211 init.cc:217] @ 0x7fd03b2ad830 (unknown)
W0701 15:14:27.798187 3211 init.cc:217] @ 0x7fd03b27bd33 PyObject_Call
W0701 15:14:27.799155 3211 init.cc:217] @ 0x7fd03b28a74d (unknown)
W0701 15:14:27.800107 3211 init.cc:217] @ 0x7fd03b27bd33 PyObject_Call
W0701 15:14:27.801081 3211 init.cc:217] @ 0x7fd03b2e7ee6 (unknown)
W0701 15:14:27.802049 3211 init.cc:217] @ 0x7fd03b29c8ef (unknown)
W0701 15:14:27.803011 3211 init.cc:217] @ 0x7fd03b32f8b2 PyEval_EvalFrameEx
*** Error in `python': free(): invalid pointer: 0x00007fcef804c940 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x6db01)[0x7fd03a5d0b01]
/lib64/libc.so.6(+0x73296)[0x7fd03a5d6296]
/lib64/libc.so.6(+0x73fe3)[0x7fd03a5d6fe3]
W0701 15:14:27.804011 3211 init.cc:217] @ 0x7fd03b33637d PyEval_EvalCodeEx
/opt/_internal/cpython-2.7.11-ucs4/lib/python2.7/site-packages/paddle/fluid/core_avx.so(+0x42a67c8)[0x7fcf3bf267c8]
W0701 15:14:27.804991 3211 init.cc:217] @ 0x7fd03b333d70 PyEval_EvalFrameEx
/opt/_internal/cpython-2.7.11-ucs4/lib/python2.7/site-packages/paddle/fluid/core_avx.so(_ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultImEES3_EmEEE9_M_invokeERKSt9_Any_data+0x23)[0x7fcf3bf297c3]
W0701 15:14:27.805953 3211 init.cc:217] @ 0x7fd03b33637d PyEval_EvalCodeEx
/opt/_internal/cpython-2.7.11-ucs4/lib/python2.7/site-packages/paddle/fluid/core_avx.so(_ZNSt13__future_base11_State_base9_M_do_setERSt8functionIFSt10unique_ptrINS_12_Result_baseENS3_8_DeleterEEvEERb+0x27)[0x7fcf394e8b37]
/lib64/libpthread.so.0(+0x6b4f)[0x7fd03b011b4f]
W0701 15:14:27.806897 3211 init.cc:217] @ 0x7fd03b3364b2 PyEval_EvalCode
/opt/_internal/cpython-2.7.11-ucs4/lib/python2.7/site-packages/paddle/fluid/core_avx.so(+0x42a55f2)[0x7fcf3bf255f2]
W0701 15:14:27.807849 3211 init.cc:217] @ 0x7fd03b3601c2 PyRun_FileExFlags
/opt/_internal/cpython-2.7.11-ucs4/lib/python2.7/site-packages/paddle/fluid/core_avx.so(_ZZN10ThreadPoolC1EmENKUlvE_clEv+0x194)[0x7fcf394eadb4]
/usr/lib64/libstdc++.so.6(+0xb6470)[0x7fcf5f748470]
/lib64/libpthread.so.0(+0x86a2)[0x7fd03b0136a2]
/lib64/libc.so.6(clone+0x6d)[0x7fd03a64908d]
W0701 15:14:27.809892 3211 init.cc:217] @ 0x7fd03b3771dd Py_Main
W0701 15:14:27.812273 3211 init.cc:217] @ 0x7fd03a5837f5 __libc_start_main
W0701 15:14:27.812559 3211 init.cc:217] @ 0x4006b1 (unknown)
/root/paddlejob/run.sh: line 325: 3211 Segmentation fault python train.py
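
The log above interleaves two stack traces written at the same time: the `init.cc:217] @ <address> <symbol>` frames from the SIGSEGV handler (Python main thread, ending in `MultiDeviceFeedReader::ReadNext()`) and the glibc `free(): invalid pointer` backtrace (a `ThreadPool` worker thread). A minimal sketch for separating the two when reading such a log, assuming the entries have been split one per line. The helper name `split_traces` and both regular expressions are illustrative assumptions and are not part of PaddlePaddle.

```python
import re

# Frame printed by the signal handler, e.g.
# "W0701 15:14:27.781042 3211 init.cc:217] @ 0x7fd03b01bbb0 (unknown)"
GLOG_FRAME = re.compile(
    r"^W\d{4} [\d:.]+\s+\d+ init\.cc:\d+\]\s+@\s+(0x[0-9a-f]+)\s+(.*)$")

# Frame printed by glibc's backtrace, e.g.
# "/lib64/libc.so.6(+0x6db01)[0x7fd03a5d0b01]"
GLIBC_FRAME = re.compile(r"^(\S+)\((.*)\)\[(0x[0-9a-f]+)\]$")


def split_traces(lines):
    """Split interleaved log lines into (signal_frames, glibc_frames).

    signal_frames: list of (address, symbol) tuples.
    glibc_frames:  list of (module, symbol_and_offset, address) tuples.
    Lines matching neither pattern (banners, warnings) are ignored.
    """
    signal_frames, glibc_frames = [], []
    for raw in lines:
        line = raw.strip()
        match = GLOG_FRAME.match(line)
        if match:
            signal_frames.append((match.group(1), match.group(2)))
            continue
        match = GLIBC_FRAME.match(line)
        if match:
            glibc_frames.append(
                (match.group(1), match.group(2), match.group(3)))
    return signal_frames, glibc_frames
```

Calling `split_traces(open("err.log"))` on a log in this shape would return the two traces in their original order, which makes it easier to see which thread was freeing memory while the main thread was inside `ReadNext()`.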