Prediction errors on large datasets
Created by: DominicXWang
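I replaced the text in the chnsenticorp dataset with my own text and then ran ernie_encoder.py to export the embeddings, which is where the problems appear:
Small data (under 5,000 rows): runs normally.
Medium data (20,000 to 40,000 rows): sometimes fails, sometimes runs normally.
Large data (100,000 rows and above): always fails.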
The error differs from run to run, but it is always one of two kinds. The first is:

Load pretraining parameters from /home/X/tools/py27/ernie/model/params.
Traceback (most recent call last):
  File "ernie_encoder.py", line 182, in <module>
    main(args)
  File "ernie_encoder.py", line 160, in main
    return_numpy=False)
  File "/home/X/.jumbo/lib/python2.7/lib/python2.7/site-packages/paddle/fluid/executor.py", line 565, in run
    use_program_cache=use_program_cache)
  File "/home/X/.jumbo/lib/python2.7/lib/python2.7/site-packages/paddle/fluid/executor.py", line 642, in run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Invoke operator sequence_unpad error.
Python Callstacks:
  File "/home/X/.jumbo/lib/python2.7/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1654, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/X/.jumbo/lib/python2.7/lib/python2.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/home/X/.jumbo/lib/python2.7/lib/python2.7/site-packages/paddle/fluid/layers/nn.py", line 4120, in sequence_unpad
    outputs={'Out': out})
  File "ernie_encoder.py", line 75, in create_model
    unpad_enc_out = fluid.layers.sequence_unpad(enc_out, length=seq_lens)
  File "ernie_encoder.py", line 128, in main
    args, pyreader_name='reader', ernie_config=ernie_config)
  File "ernie_encoder.py", line 182, in <module>
    main(args)
C++ Callstacks:
Enforce failed. Expected numel() * SizeOfType(type()) <= memory_size(), but received numel() * SizeOfType(type()):16195584 > memory_size():3735552.
Tensor's dims is out of bound. Call Tensor::mutable_data first to re-allocate memory.
or maybe the required data-type mismatches the data already stored. at [/paddle/paddle/fluid/framework/tensor.cc:28]
PaddlePaddle Call Stacks:
0 0x7f561f86cb68p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 360
1 0x7f561f86ceb7p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2 0x7f56215525cap paddle::framework::Tensor::check_memory_size() const + 394
3 0x7f56211bb903p paddle::operators::math::UnpaddingLoDTensorFunctor<paddle::platform::CUDADeviceContext, float>::operator()(paddle::platform::CUDADeviceContext const&, paddle::framework::LoDTensor const&, paddle::framework::LoDTensor*, int, int, bool, paddle::operators::math::PadLayout) + 611
4 0x7f562069e2a5p paddle::operators::SequenceUnpadOpKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 981
5 0x7f562069e4a3p std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::SequenceUnpadOpKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::SequenceUnpadOpKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::SequenceUnpadOpKernel<paddle::platform::CUDADeviceContext, int>, paddle::operators::SequenceUnpadOpKernel<paddle::platform::CUDADeviceContext, long> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) + 35
6 0x7f56214fe6f6p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 662
7 0x7f56214fee64p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 292
8 0x7f56214fc78cp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
9 0x7f561f9e18bep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
10 0x7f561f9e26ffp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool) + 143
11 0x7f561f85c35ep
12 0x7f561f89f72ep
13 0x7f56836264e2p PyEval_EvalFrameEx + 29874
14 0x7f56836287fdp PyEval_EvalCodeEx + 2061
15 0x7f5683625982p PyEval_EvalFrameEx + 26962
16 0x7f56836287fdp PyEval_EvalCodeEx + 2061
17 0x7f5683625982p PyEval_EvalFrameEx + 26962
18 0x7f5683625a9dp PyEval_EvalFrameEx + 27245
19 0x7f56836287fdp PyEval_EvalCodeEx + 2061
20 0x7f5683628932p PyEval_EvalCode + 50
21 0x7f5683654882p PyRun_FileExFlags + 146
22 0x7f5683655bf9p PyRun_SimpleFileExFlags + 217
23 0x7f568366bb0dp Py_Main + 3149
24 0x38bfc21b45p __libc_start_main + 245
25 0x400691p

Is this a failure to allocate GPU memory for one particular batch? Does the reader have no mechanism to guard against this?
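As a sanity check on the Enforce message in the first error, the two byte counts can be converted into tensor rows. The sketch below assumes float32 elements (the failing kernel in the trace is the `float` specialization) and a hidden size of 768, which is an assumption and is not stated anywhere in this report; `rows_for` is a hypothetical helper written only for this calculation.

```python
# Decode the byte counts from the Enforce message into rows of the
# encoder output. Both constants are assumptions for illustration.
FLOAT32_BYTES = 4   # assumption: float32, matching the float kernel in the trace
HIDDEN_SIZE = 768   # assumption: hidden size of the exported embeddings


def rows_for(num_bytes, hidden_size=HIDDEN_SIZE, elem_bytes=FLOAT32_BYTES):
    """Return (rows, leftover_bytes) for a [rows, hidden_size] tensor."""
    row_bytes = hidden_size * elem_bytes
    return divmod(num_bytes, row_bytes)


required = rows_for(16195584)   # numel() * SizeOfType(type())
allocated = rows_for(3735552)   # memory_size()

print("required rows, leftover bytes:", required)
print("allocated rows, leftover bytes:", allocated)
```

If these assumptions hold, both figures divide evenly: the op tries to write 5272 rows into a buffer that holds only 1216. That looks less like the GPU running out of memory and more like the sequence lengths passed to sequence_unpad disagreeing with the shape of the padded batch, although this calculation alone cannot confirm the cause.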
The other kind of error is:

Load pretraining parameters from /home/X/tools/py367gcc48_paddle/ernie/model/params.
*** Aborted at 1561361472 (unix time) try "date -d @1561361472" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGFPE (@0x7ff61b501850) received by PID 14084 (TID 0x7ff67d886740) from PID 458233936; stack trace: ***
@ 0x38c040f130 (unknown)
@ 0x7ff61b501850 paddle::operators::math::UnpaddingLoDTensorFunctor<>::operator()()
@ 0x7ff61a9e42a5 paddle::operators::SequenceUnpadOpKernel<>::Compute()
@ 0x7ff61a9e44a3 _ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform9CUDAPlaceELb0ELm0EINS0_9operators21SequenceUnpadOpKernelINS7_17CUDADeviceContextEfEENSA_ISB_dEENSA_ISB_iEENSA_ISB_lEEEEclEPKcSI_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4_
@ 0x7ff61b8446f6 paddle::framework::OperatorWithKernel::RunImpl()
@ 0x7ff61b844e64 paddle::framework::OperatorWithKernel::RunImpl()
@ 0x7ff61b84278c paddle::framework::OperatorBase::Run()
@ 0x7ff619d278be paddle::framework::Executor::RunPreparedContext()
@ 0x7ff619d286ff paddle::framework::Executor::Run()
@ 0x7ff619ba235e _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL18pybind11_init_coreERNS_6moduleEEUlRNS2_9framework8ExecutorERKNS6_11ProgramDescEPNS6_5ScopeEibbRKSt6vectorISsSaISsEEE97_vIS8_SB_SD_ibbSI_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNES10_
@ 0x7ff619be572e pybind11::cpp_function::dispatcher()
@ 0x7ff67d9ac4e2 PyEval_EvalFrameEx
@ 0x7ff67d9ae7fd PyEval_EvalCodeEx
@ 0x7ff67d9ab982 PyEval_EvalFrameEx
@ 0x7ff67d9ae7fd PyEval_EvalCodeEx
@ 0x7ff67d9ab982 PyEval_EvalFrameEx
@ 0x7ff67d9aba9d PyEval_EvalFrameEx
@ 0x7ff67d9ae7fd PyEval_EvalCodeEx
@ 0x7ff67d9ae932 PyEval_EvalCode
@ 0x7ff67d9da882 PyRun_FileExFlags
@ 0x7ff67d9dbbf9 PyRun_SimpleFileExFlags
@ 0x7ff67d9f1b0d Py_Main
@ 0x38bfc21b45 (unknown)
@ 0x400691 (unknown)
@ 0x0 (unknown)

I ran the same code in the same environment many times (without shuffling) and found that, whenever a run fails, either of the two errors may appear, with no discernible pattern.