Error during fluid training: Expected predict_data <= 1, but received predict_data:-nan > 1:1.
Created by: qnikev
With fluid 1.3, training works fine locally, but on PaddleCloud the job fails with the following error after running for a while:

Tue Jul 9 15:04:47 2019[1,4]:Traceback (most recent call last):
Tue Jul 9 15:04:47 2019[1,4]: File "train.py", line 81, in <module>
Tue Jul 9 15:04:47 2019[1,4]: train(option.is_mpi, option.use_cuda, option.lr, option.weight_decay)
Tue Jul 9 15:04:47 2019[1,4]: File "train.py", line 70, in train
Tue Jul 9 15:04:47 2019[1,4]: pass_num, output_path, use_cuda, lr, weight_decay)
Tue Jul 9 15:04:47 2019[1,4]: File "/home/disk1/task_data/history/20190709/4.app-user-20190709140810-28131--tieba_middlepage_dssm_201907091407_paddlecloud/logs/workspace/env_run/topology.py", line 303, in train
Tue Jul 9 15:04:47 2019[1,4]: train_loop(t.get_trainer_program())
Tue Jul 9 15:04:47 2019[1,4]: File "/home/disk1/task_data/history/20190709/4.app-user-20190709140810-28131--tieba_middlepage_dssm_201907091407_paddlecloud/logs/workspace/env_run/topology.py", line 253, in train_loop
Tue Jul 9 15:04:47 2019[1,4]: uid_vec])
Tue Jul 9 15:04:47 2019[1,4]: File "/home/disk1/task_data/history/20190709/4.app-user-20190709140810-28131--tieba_middlepage_dssm_201907091407_paddlecloud/logs/workspace/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/executor.py", line 525, in run
Tue Jul 9 15:04:47 2019[1,4]: use_program_cache=use_program_cache)
Tue Jul 9 15:04:47 2019[1,4]: File "/home/disk1/task_data/history/20190709/4.app-user-20190709140810-28131--tieba_middlepage_dssm_201907091407_paddlecloud/logs/workspace/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/executor.py", line 591, in run
Tue Jul 9 15:04:47 2019[1,4]: exe.run(program.desc, scope, 0, True, True)
Tue Jul 9 15:04:47 2019[1,4]:paddle.fluid.core.EnforceNotMet: Invoke operator auc error.
Tue Jul 9 15:04:47 2019[1,4]:Python Callstacks:
Tue Jul 9 15:04:47 2019[1,4]: File "/home/disk1/task_data/history/20190709/4.app-user-20190709140810-28131--tieba_middlepage_dssm_201907091407_paddlecloud/logs/workspace/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1317, in append_op
Tue Jul 9 15:04:47 2019[1,4]: attrs=kwargs.get("attrs", None))
Tue Jul 9 15:04:47 2019[1,4]: File "/home/disk1/task_data/history/20190709/4.app-user-20190709140810-28131--tieba_middlepage_dssm_201907091407_paddlecloud/logs/workspace/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/layer_helper.py", line 56, in append_op
Tue Jul 9 15:04:47 2019[1,4]: return self.main_program.current_block().append_op(*args, **kwargs)
Tue Jul 9 15:04:47 2019[1,4]: File "/home/disk1/task_data/history/20190709/4.app-user-20190709140810-28131--tieba_middlepage_dssm_201907091407_paddlecloud/logs/workspace/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/layers/metric_op.py", line 169, in auc
Tue Jul 9 15:04:47 2019[1,4]: "StatNegOut": [batch_stat_neg]
Tue Jul 9 15:04:47 2019[1,4]: File "/home/disk1/task_data/history/20190709/4.app-user-20190709140810-28131--tieba_middlepage_dssm_201907091407_paddlecloud/logs/workspace/env_run/topology.py", line 202, in train_program
Tue Jul 9 15:04:47 2019[1,4]: _, auc_batch, _ = fluid.layers.auc(input=predict, label=label)
Tue Jul 9 15:04:47 2019[1,4]: File "/home/disk1/task_data/history/20190709/4.app-user-20190709140810-28131--tieba_middlepage_dssm_201907091407_paddlecloud/logs/workspace/env_run/topology.py", line 234, in train
Tue Jul 9 15:04:47 2019[1,4]: inputs, predict, avg_cost, auc, tid_vec, uid_vec = train_program()
Tue Jul 9 15:04:47 2019[1,4]: File "train.py", line 70, in train
Tue Jul 9 15:04:47 2019[1,4]: pass_num, output_path, use_cuda, lr, weight_decay)
Tue Jul 9 15:04:47 2019[1,4]: File "train.py", line 81, in <module>
Tue Jul 9 15:04:47 2019[1,4]: train(option.is_mpi, option.use_cuda, option.lr, option.weight_decay)
Tue Jul 9 15:04:47 2019[1,4]:C++ Callstacks:
Tue Jul 9 15:04:47 2019[1,4]:Enforce failed. Expected predict_data <= 1, but received predict_data:-nan > 1:1.
Tue Jul 9 15:04:47 2019[1,4]:The predict data must less or equal 1. at [/paddle/paddle/fluid/operators/metrics/auc_op.h:80]
Tue Jul 9 15:04:47 2019[1,4]:PaddlePaddle Call Stacks:
Tue Jul 9 15:04:47 2019[1,4]:0 0x7f2d808c659dp void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 365
Tue Jul 9 15:04:47 2019[1,4]:1 0x7f2d808c68e7p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
Tue Jul 9 15:04:47 2019[1,4]:2 0x7f2d81213482p paddle::operators::AucKernel<paddle::platform::CPUPlace, float>::statAuc(paddle::framework::Tensor const*, paddle::framework::Tensor const*, int, int, int, long*, long*, long**, long**) + 1202
Tue Jul 9 15:04:47 2019[1,4]:3 0x7f2d8121396ep paddle::operators::AucKernel<paddle::platform::CPUPlace, float>::Compute(paddle::framework::ExecutionContext const&) const + 830
Tue Jul 9 15:04:47 2019[1,4]:4 0x7f2d81213c23p std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::AucKernel<paddle::platform::CPUPlace, float> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) + 35
Tue Jul 9 15:04:47 2019[1,4]:5 0x7f2d8188ac93p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 659
Tue Jul 9 15:04:47 2019[1,4]:6 0x7f2d818897bbp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 267
Tue Jul 9 15:04:47 2019[1,4]:7 0x7f2d809db042p paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 226
Tue Jul 9 15:04:47 2019[1,4]:8 0x7f2d809dc105p paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool) + 261
Tue Jul 9 15:04:47 2019[1,4]:9 0x7f2d808ab70bp
Tue Jul 9 15:04:47 2019[1,4]:10 0x7f2d808ecf6ep
Tue Jul 9 15:04:47 2019[1,4]:11 0x7f2de9ac2010p PyEval_EvalFrameEx + 16384
Tue Jul 9 15:04:47 2019[1,4]:12 0x7f2de9ac3b80p PyEval_EvalCodeEx + 2128
Tue Jul 9 15:04:47 2019[1,4]:13 0x7f2de9ac208ep PyEval_EvalFrameEx + 16510
Tue Jul 9 15:04:47 2019[1,4]:14 0x7f2de9ac3b80p PyEval_EvalCodeEx + 2128
Tue Jul 9 15:04:47 2019[1,4]:15 0x7f2de9ac208ep PyEval_EvalFrameEx + 16510
Tue Jul 9 15:04:47 2019[1,4]:16 0x7f2de9ac3b80p PyEval_EvalCodeEx + 2128
Tue Jul 9 15:04:47 2019[1,4]:17 0x7f2de9ac208ep PyEval_EvalFrameEx + 16510
Tue Jul 9 15:04:47 2019[1,4]:18 0x7f2de9ac3b80p PyEval_EvalCodeEx + 2128
Tue Jul 9 15:04:47 2019[1,4]:19 0x7f2de9ac208ep PyEval_EvalFrameEx + 16510
Tue Jul 9 15:04:47 2019[1,4]:20 0x7f2de9ac2132p PyEval_EvalFrameEx + 16674
Tue Jul 9 15:04:47 2019[1,4]:21 0x7f2de9ac3b80p PyEval_EvalCodeEx + 2128
Tue Jul 9 15:04:47 2019[1,4]:22 0x7f2de9ac3c82p PyEval_EvalCode + 50
Tue Jul 9 15:04:47 2019[1,4]:23 0x7f2de9adc60fp
Tue Jul 9 15:04:47 2019[1,4]:24 0x7f2de9add67ep PyRun_FileExFlags + 126
Tue Jul 9 15:04:47 2019[1,4]:25 0x7f2de9ade7d7p PyRun_SimpleFileExFlags + 199
Tue Jul 9 15:04:47 2019[1,4]:26 0x7f2de9aeed9dp Py_Main + 3133
Tue Jul 9 15:04:47 2019[1,4]:27 0x7f2de8d34bd5p __libc_start_main + 245
Tue Jul 9 15:04:47 2019[1,4]:28 0x4007c1p
Tue Jul 9 15:04:47 2019[1,4]:
Tue Jul 9 15:04:50 2019[1,4]:+ trainer_ret=1
Tue Jul 9 15:04:50 2019[1,4]:+ popd
Tue Jul 9 15:04:50 2019[1,4]:/home/disk1/task_data/history/20190709/4.app-user-20190709140810-28131--tieba_middlepage_dssm_201907091407_paddlecloud/logs/workspace
Tue Jul 9 15:04:50 2019[1,4]:+ exit 1
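The enforce message looks self-contradictory ("-nan > 1") because NaN is unordered: every ordered comparison involving NaN is false, so a check of the form `predict_data <= 1` fails for a NaN prediction just as it would for a value above 1, and the message template appears to report any failed `<=` check as `>`. The minimal plain-Python sketch below illustrates this. `auc_input_violations` is a hypothetical helper (not part of Paddle) that flags scores a [0, 1] range check would reject, assuming the predictions have already been fetched as a flat list of floats.

```python
def auc_input_violations(scores):
    """Return (index, score) pairs that a [0, 1] range check would reject.

    Hypothetical helper, not Paddle's implementation. The single chained
    comparison also catches NaN: every ordered comparison with NaN is False,
    so `0.0 <= nan <= 1.0` is False and `not (...)` is True.
    """
    return [(i, s) for i, s in enumerate(scores) if not (0.0 <= s <= 1.0)]


neg_nan = -float("nan")  # the log reports the offending value as -nan

# Both orderings are False for NaN, so "predict_data <= 1" cannot pass.
print(neg_nan <= 1)  # False
print(neg_nan > 1)   # False

# A well-formed batch passes; NaN and out-of-range scores are reported.
print(auc_input_violations([0.0, 0.5, 1.0]))  # []
print(auc_input_violations([0.2, 0.9, neg_nan, 1.5]))
```

Since the NaN is already present in `predict` when it reaches the auc op, a check like this only shows where the problem surfaces; the NaN itself is likely produced earlier in the network during training.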