Improve the error messages Paddle reports during training
Created by: ccmeteorljh
- Version and environment: 1) PaddlePaddle version: build compiled on January 28, 2019; 2) CPU/GPU: GPU, CUDA 8, cuDNN 5; 3) OS: Ubuntu
- Steps to reproduce: in https://github.com/PaddlePaddle/models/blob/develop/fluid/mnist/model.py#L73 change SIZE (the number of output classes of the mnist model) to a value smaller than 10, then run
python model.py
The following error is reported:
/usr/local/lib/python2.7/dist-packages/paddle/fluid/average.py:64: Warning: The WeightedAverage is deprecated, please use fluid.metrics.Accuracy instead.
(self.__class__.__name__), Warning)
/paddle/paddle/fluid/operators/math/cross_entropy.cu:40 Assertion `label[i] >= 0 && label[i] < D || label[i] == ignore_index` failed.
(the same assertion line is printed 21 more times)
Traceback (most recent call last):
File "model.py", line 205, in <module>
run_benchmark(cnn_model, args)
File "model.py", line 171, in run_benchmark
fetch_list=[avg_cost, batch_acc, batch_size_tensor]
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 525, in run
use_program_cache=use_program_cache)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 591, in _run
exe.run(program.desc, scope, 0, True, True)
paddle.fluid.core.EnforceNotMet: Invoke operator fetch error.
Python Callstacks:
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 1311, in append_op
attrs=kwargs.get("attrs", None))
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 361, in _add_feed_fetch_ops
attrs={'col': i})
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 588, in _run
fetch_var_name=fetch_var_name)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 525, in run
use_program_cache=use_program_cache)
File "model.py", line 171, in run_benchmark
fetch_list=[avg_cost, batch_acc, batch_size_tensor]
File "model.py", line 205, in <module>
run_benchmark(cnn_model, args)
C++ Callstacks:
cudaMemcpy failed in paddle::platform::GpuMemcpySync (0x11076af1940 -> 0x7f5e4d927040, length: 4): unspecified launch failure at [/paddle/paddle/fluid/platform/gpu_info.cc:234]
PaddlePaddle Call Stacks:
0 0x7f5ebe881795p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 357
1 0x7f5ebe881b19p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2 0x7f5ec03b2878p paddle::platform::GpuMemcpySync(void*, void const*, unsigned long, cudaMemcpyKind) + 280
3 0x7f5ebe9991cbp void paddle::memory::Copy<paddle::platform::CPUPlace, paddle::platform::CUDAPlace>(paddle::platform::CPUPlace, void*, paddle::platform::CUDAPlace, void const*, unsigned long, CUstream_st*) + 91
4 0x7f5ec03729ebp paddle::framework::TensorCopySync(paddle::framework::Tensor const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::Tensor*) + 843
5 0x7f5ebfdedf3cp paddle::operators::FetchOp::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 636
6 0x7f5ec02ddf95p paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 341
7 0x7f5ebe99bb6ap paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 218
8 0x7f5ebe99d8e5p paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool) + 261
9 0x7f5ebe866adbp
10 0x7f5ebe8ac81ep
11 0x4c37edp PyEval_EvalFrameEx + 31165
12 0x4b9ab6p PyEval_EvalCodeEx + 774
13 0x4c1e6fp PyEval_EvalFrameEx + 24639
14 0x4b9ab6p PyEval_EvalCodeEx + 774
15 0x4c16e7p PyEval_EvalFrameEx + 22711
16 0x4b9ab6p PyEval_EvalCodeEx + 774
17 0x4c1e6fp PyEval_EvalFrameEx + 24639
18 0x4b9ab6p PyEval_EvalCodeEx + 774
19 0x4eb30fp
20 0x4e5422p PyRun_FileExFlags + 130
21 0x4e3cd6p PyRun_SimpleFileExFlags + 390
22 0x493ae2p Py_Main + 1554
23 0x7f5f1ea62830p __libc_start_main + 240
24 0x4933e9p _start + 41
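- Suggestion: Python stack information has now been added to the output, and compared with 1.2 the report explicitly includes more information about which Paddle op failed. For users, however, it is still unclear which line went wrong or what caused it, and they have to investigate carefully before they find the mistake. In fact, the very first lines of the error output already contain an important hint: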
Assertion `label[i] >= 0 && label[i] < D || label[i] == ignore_index` failed
The corresponding code is in https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/math/cross_entropy.cu :
```cpp
__global__ void CrossEntropyKernel(T* Y, const T* X, const int64_t* label,
                                   const int N, const int D,
                                   const int ignore_index) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
       i += blockDim.x * gridDim.x) {
    PADDLE_ASSERT(label[i] >= 0 && label[i] < D || label[i] == ignore_index);
    Y[i] = ignore_index == label[i]
               ? static_cast<T>(0)
               : -math::TolerableValue<T>()(real_log(X[i * D + label[i]]));
  }
}
```
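This assertion is familiar to developers who write ops, but ordinary users may not grasp its meaning right away. Would it be better if op developers attached a textual explanation to each PADDLE_ASSERT that can fail, giving a specific error category (here, for example, that a label lies outside the valid class range [0, D))?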
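As a rough illustration of this suggestion, the Python sketch below is a CPU re-implementation (not Paddle code) of the per-row computation in `CrossEntropyKernel`: Y[i] is 0 when label[i] == ignore_index and -log(X[i * D + label[i]]) otherwise. The bare assertion is replaced by an exception whose message names the row, the offending label, the valid range [0, D) and a likely cause. The function name `cross_entropy_rows`, the message wording and the `ignore_index=-100` default are assumptions made for this example, and the sketch omits the `math::TolerableValue` clipping the real kernel applies. With SIZE < 10 each sample has fewer than 10 class probabilities while MNIST labels go up to 9, which is exactly the case the check reports. One likely reason the Python traceback points at the `fetch` op instead is that CUDA kernels run asynchronously, so the device-side assertion only surfaces at the next synchronizing call, here the `cudaMemcpy` inside `GpuMemcpySync`.

```python
import math
from typing import List, Sequence


def cross_entropy_rows(probs: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       ignore_index: int = -100) -> List[float]:
    """Hypothetical CPU sketch of CrossEntropyKernel's per-row loss.

    probs[i] holds the D class probabilities of sample i (X in the kernel);
    labels[i] is its class id. Out-of-range labels raise a ValueError with a
    readable message instead of failing a bare assertion.
    """
    if len(probs) != len(labels):
        raise ValueError(
            "cross_entropy: got {} rows of probabilities but {} labels".format(
                len(probs), len(labels)))
    losses = []
    for i, (row, label) in enumerate(zip(probs, labels)):
        num_classes = len(row)  # D in the CUDA kernel
        if label == ignore_index:
            # Same as the kernel: ignored samples contribute zero loss.
            losses.append(0.0)
            continue
        if not 0 <= label < num_classes:
            # The kernel only asserts here; this sketch explains the failure.
            raise ValueError(
                "cross_entropy: label[{i}] = {label} is outside the valid range "
                "[0, {d}) because the input has only {d} classes. Check that the "
                "size of the last layer (e.g. SIZE in the mnist model) is at least "
                "the number of classes in the dataset, or mark the sample with "
                "ignore_index={ign}.".format(i=i, label=label, d=num_classes,
                                             ign=ignore_index))
        # The real kernel additionally clips this value via math::TolerableValue.
        losses.append(-math.log(row[label]))
    return losses


if __name__ == "__main__":
    # MNIST labels go up to 9, but with SIZE = 5 each row has only 5 classes.
    probs = [[0.2] * 5, [0.2] * 5]
    labels = [3, 9]
    try:
        cross_entropy_rows(probs, labels)
    except ValueError as err:
        print(err)
```

Inside Paddle such a check would presumably live in the operator itself (for example a host-side check with a formatted message before the kernel launch); the sketch only shows what information the resulting message could carry.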