DCN inference produces a core dump about 5% of the time
Open
Created by: AltenLi
- Version and environment information:
1) PaddlePaddle version: 1.6
2) CPU
4) System environment: Python 3.6.7
Reproduction information: the error below was produced when running on Hadoop. When running locally, many core files are generated.
Namespace(batch_size=512, cat_feat_num='./data/poi_all/cat_feature_num.txt', clip_by_norm=100.0, cross_num=6, dnn_hidden_units=[1024, 1024], infer_by_user=True, infer_thre=0.9, is_sparse=False, l2_reg_cross=1e-05, lr=0.0001, model_output_dir='./cluster_model', num_epoch=2, num_thread=20, poi_fea='./data/poi_all/poi-info.infer.dat', pre_output_dir='data/predict_result', print_steps=100, steps=150000, test_epoch='10', test_valid_data_dir='data/test_valid', train_data_dir='data/train', use_bn=True, use_cuda=False, vocab_dir='./data/poi_all/vocab')
OMP: Error #100: Fatal system error detected.
OMP: System error #22: Invalid argument
W1220 15:22:38.570390 215818 init.cc:205] *** Aborted at 1576826558 (unix time) try "date -d @1576826558" if you are using GNU date ***
W1220 15:22:38.573508 215818 init.cc:205] PC: @ 0x0 (unknown)
W1220 15:22:38.574007 215818 init.cc:205] *** SIGABRT (@0x1f500034b0a) received by PID 215818 (TID 0x7fc763c85700) from PID 215818; stack trace: ***
W1220 15:22:38.582072 215818 init.cc:205] @ 0x7fc76385c160 (unknown)
W1220 15:22:38.733220 215818 init.cc:205] @ 0x7fc762dca3f7 __GI_raise
W1220 15:22:38.740550 215818 init.cc:205] @ 0x7fc762dcb7d8 __GI_abort
W1220 15:22:38.743173 215818 init.cc:205] @ 0x7fc739d7e023 __kmp_abort_process
W1220 15:22:38.749852 215818 init.cc:205] @ 0x7fc739d69aaf __kmp_fatal
W1220 15:22:38.777932 215818 init.cc:205] @ 0x7fc739d0b4a8 KMPNativeAffinity::Mask::set_system_affinity()
W1220 15:22:38.788611 215818 init.cc:205] @ 0x7fc739db9517 __kmp_affinity_bind_thread
W1220 15:22:38.791082 215818 init.cc:205] @ 0x7fc739d03c5d _INTERNAL_26_______src_kmp_affinity_cpp_da295ce7::__kmp_affinity_create_x2apicid_map()
W1220 15:22:38.801770 215818 init.cc:205] @ 0x7fc739cf97b5 _INTERNAL_26_______src_kmp_affinity_cpp_da295ce7::__kmp_aux_affinity_initialize()
W1220 15:22:38.812433 215818 init.cc:205] @ 0x7fc739cf8d8b __kmp_affinity_initialize
W1220 15:22:38.815022 215818 init.cc:205] @ 0x7fc739d7cf98 __kmp_middle_initialize
W1220 15:22:38.825731 215818 init.cc:205] @ 0x7fc739d5e9ee __kmp_api_omp_get_num_procs
W1220 15:22:38.847759 215818 init.cc:205] @ 0x7fc73a49105e mkl_serv_get_num_stripes
W1220 15:22:38.861289 215818 init.cc:205] @ 0x7fc73a370974 mkl_blas_sgemm
W1220 15:22:38.874635 215818 init.cc:205] @ 0x7fc73a2def09 SGEMM
W1220 15:22:38.896353 215818 init.cc:205] @ 0x7fc73a29daa1 cblas_sgemm
W1220 15:22:38.909498 215818 init.cc:205] @ 0x7fc7446654ab paddle::operators::math::Blas<>::MatMul<>()
W1220 15:22:38.931319 215818 init.cc:205] @ 0x7fc7446659e3 paddle::operators::MulKernel<>::Compute()
W1220 15:22:38.943982 215818 init.cc:205] @ 0x7fc744665bd3 ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform8CPUPlaceELb0ELm0EINS0_9operators9MulKernelINS7_16CPUDeviceContextEfEENSA_ISB_dEEEEclEPKcSG_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4
W1220 15:22:38.957010 215818 init.cc:205] @ 0x7fc74564ba9b paddle::framework::OperatorWithKernel::RunImpl()
W1220 15:22:38.971024 215818 init.cc:205] @ 0x7fc74564c421 paddle::framework::OperatorWithKernel::RunImpl()
W1220 15:22:38.988966 215818 init.cc:205] @ 0x7fc745646500 paddle::framework::OperatorBase::Run()
W1220 15:22:39.002995 215818 init.cc:205] @ 0x7fc7440a5736 paddle::framework::Executor::RunPreparedContext()
W1220 15:22:39.015275 215818 init.cc:205] @ 0x7fc7440a89df paddle::framework::Executor::Run()
W1220 15:22:39.027966 215818 init.cc:205] @ 0x7fc743ef630d ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9framework8ExecutorERKNS6_11ProgramDescEPNS6_5ScopeEibbRKSt6vectorISsSaISsEEE102_vIS8_SB_SD_ibbSI_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNES10
W1220 15:22:39.037434 215818 init.cc:205] @ 0x7fc743f3944e pybind11::cpp_function::dispatcher()
W1220 15:22:39.037901 215818 init.cc:205] @ 0x4ac717 _PyCFunction_FastCallKeywords
W1220 15:22:39.038012 215818 init.cc:205] @ 0x543b75 call_function
W1220 15:22:39.038370 215818 init.cc:205] @ 0x5492f1 _PyEval_EvalFrameDefault
W1220 15:22:39.038463 215818 init.cc:205] @ 0x543a21 _PyEval_EvalCodeWithName
W1220 15:22:39.038542 215818 init.cc:205] @ 0x543d1f call_function
W1220 15:22:39.038889 215818 init.cc:205] @ 0x54917d _PyEval_EvalFrameDefault
mapper_infer_all.sh: line 11: 215818 Aborted $PYTHON_BIN -u infer_hdp.py --test_epoch 10 --vocab_dir ./data/poi_all/vocab --cat_feat_num ./data/poi_all/cat_feature_num.txt --poi_fea ./data/poi_all/poi-info.infer.dat --model_output_dir ./cluster_model --infer_thre 0.9 --infer_by_user True
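
The trace above aborts inside `KMPNativeAffinity::Mask::set_system_affinity()` with system error #22 (invalid argument), which suggests the Intel OpenMP runtime failed while binding a thread to a CPU. One thing that may be worth trying, as an untested assumption and not a confirmed fix, is to launch the inference script with `KMP_AFFINITY=disabled` so the runtime skips affinity binding, and to print which CPUs the process is allowed to use on the failing node. The helper names below (`env_without_omp_affinity`, `allowed_cpus`, `run_inference`) are hypothetical and are not part of PaddlePaddle.

```python
import os
import subprocess
import sys


def env_without_omp_affinity(base_env=None):
    """Return a copy of the environment with Intel OpenMP affinity binding disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the crash comes from affinity binding; "disabled" is a
    # documented KMP_AFFINITY type that turns the affinity interface off.
    env["KMP_AFFINITY"] = "disabled"
    return env


def allowed_cpus():
    """Return the sorted CPU ids this process may run on, or None if unsupported."""
    # os.sched_getaffinity exists only on some platforms (e.g. Linux).
    getter = getattr(os, "sched_getaffinity", None)
    return sorted(getter(0)) if getter else None


def run_inference(argv, base_env=None):
    """Run the given command in a child process with the adjusted environment."""
    completed = subprocess.run(argv, env=env_without_omp_affinity(base_env))
    return completed.returncode
```

For example, `run_inference([sys.executable, "-u", "infer_hdp.py", "--test_epoch", "10"])` would start the script from the report with affinity binding turned off. Comparing `allowed_cpus()` between nodes that crash and nodes that do not could show whether a restricted CPU set on some Hadoop nodes is involved.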