
PaddlePaddle / models
Opened Oct 17, 2019 by saxon_zh (@saxon_zh, Guest)

Problems running PyramidBox with the Docker images, and some suggestions

Created by: fengyuentau

Related issue: https://github.com/PaddlePaddle/Paddle/issues/20667

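Because of the Docker version on my machine, I can only use Docker images built for CUDA 9.0 or lower. The GPU is a P100 with 16 GB of memory.

I first pulled paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7 and tried to run the latest PyramidBox code from this repo, but hit the following error: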

root@df3a17988b31:~/pyramidbox# python3.6 -u widerface_eval.py --model_dir=/root/pyramidbox/models/PyramidBox_WiderFace
-----------  Configuration Arguments -----------
confs_threshold: 0.15
data_dir: data/WIDER_val/images/
file_list: data/wider_face_split/wider_face_val_bbx_gt.txt
image_path:
infer: False
model_dir: /root/pyramidbox/models/PyramidBox_WiderFace
pred_dir: pred
use_gpu: True
use_pyramidbox: True
------------------------------------------------
Traceback (most recent call last):
  File "widerface_eval.py", line 328, in <module>
    exe, args.model_dir, main_program=infer_program)
  File "/usr/local/lib/python3.6/site-packages/paddle/fluid/io.py", line 803, in load_persistables
    filename=filename)
  File "/usr/local/lib/python3.6/site-packages/paddle/fluid/io.py", line 643, in load_vars
    filename=filename)
  File "/usr/local/lib/python3.6/site-packages/paddle/fluid/io.py", line 664, in load_vars
    assert var_temp != None, "can't not find var: " + each_var.name
AssertionError: can't not find var: conv2d_61.w_0

I can confirm that the model files are in the specified path, and that the file conv2d_61.w_0 is present.
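A quick way to double-check this kind of claim is to list which expected parameter files are absent from the model directory. This is a minimal sketch, assuming each persistable variable is stored as a separate file named after the variable (which is what the presence of a file called conv2d_61.w_0 suggests). `find_missing_vars` is a hypothetical helper, not part of Paddle.

```python
import os


def find_missing_vars(model_dir, var_names):
    """Return the names in var_names that have no matching file in model_dir."""
    return [name for name in var_names
            if not os.path.isfile(os.path.join(model_dir, name))]


if __name__ == "__main__":
    # Path and variable name are the ones quoted in the report above.
    missing = find_missing_vars("/root/pyramidbox/models/PyramidBox_WiderFace",
                                ["conv2d_61.w_0"])
    print("missing:", missing)
```

An empty list only shows that the files exist on disk. It does not explain the assertion itself, whose cause the log above does not reveal.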

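From the related issue I learned that latest tracks the develop branch, so I suspected a problem with develop. I then pulled paddlepaddle/paddle:1.5.0-cuda9.0-cudnn7 and ran the same PyramidBox code with the same steps. This time there was no error about a missing model variable, but the run failed with an out-of-GPU-memory error instead: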

root@fe31a5b0b0bd:~/pyramidbox# python widerface_eval.py --model_dir=models/PyramidBox_WiderFace
-----------  Configuration Arguments -----------
confs_threshold: 0.15
data_dir: data/WIDER_val/images/
file_list: data/wider_face_split/wider_face_val_bbx_gt.txt
image_path:
infer: False
model_dir: models/PyramidBox_WiderFace
pred_dir: pred
use_gpu: True
use_pyramidbox: True
------------------------------------------------
W1016 11:56:06.097822    14 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 60, Driver API Version: 10.1, Runtime API Version: 9.0
W1016 11:56:06.101693    14 device_context.cc:267] device: 0, cuDNN Version: 7.4.
Traceback (most recent call last):
  File "widerface_eval.py", line 328, in <module>
    exe, args.model_dir, main_program=infer_program)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 742, in load_persistables
    filename=filename)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 608, in load_vars
    filename=filename)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 645, in load_vars
    executor.run(load_prog)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 650, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 748, in _run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator load error.
Python Callstacks:
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 1748, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 630, in load_vars
    'file_path': os.path.join(load_dirname, new_var.name)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 608, in load_vars
    filename=filename)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 742, in load_persistables
    filename=filename)
  File "widerface_eval.py", line 328, in <module>
    exe, args.model_dir, main_program=infer_program)
C++ Callstacks:
Enforce failed. Expected allocating <= available, but received allocating:14920696460 > available:13418495744.
Insufficient GPU memory to allocation. at [/paddle/paddle/fluid/platform/gpu_info.cc:262]
PaddlePaddle Call Stacks:
0       0x7f23f14bfbc8p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 360
1       0x7f23f14bff17p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2       0x7f23f34d7996p paddle::platform::GpuMaxChunkSize() + 630
3       0x7f23f34abc8ap
4       0x7f24deb1ea99p
5       0x7f23f34ab32dp paddle::memory::legacy::GetGPUBuddyAllocator(int) + 109
6       0x7f23f34ac175p void* paddle::memory::legacy::Alloc<paddle::platform::CUDAPlace>(paddle::platform::CUDAPlace const&, unsigned long) + 37
7       0x7f23f34ac6b5p paddle::memory::allocation::LegacyAllocator::AllocateImpl(unsigned long) + 421
8       0x7f23f34a07d5p paddle::memory::allocation::AllocatorFacade::Alloc(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, unsigned long) + 181
9       0x7f23f34a095ap paddle::memory::allocation::AllocatorFacade::AllocShared(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, unsigned long) + 26
10      0x7f23f30adfccp paddle::memory::AllocShared(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, unsigned long) + 44
11      0x7f23f3473204p paddle::framework::Tensor::mutable_data(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, paddle::framework::proto::VarType_Type, unsigned long) + 148
12      0x7f23f34768a4p paddle::framework::TensorCopy(paddle::framework::Tensor const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::platform::DeviceContext const&, paddle::framework::Tensor*) + 452
13      0x7f23f347a49bp paddle::framework::TensorFromStream(std::istream&, paddle::framework::Tensor*, paddle::platform::DeviceContext const&) + 699
14      0x7f23f30698d0p paddle::framework::DeserializeFromStream(std::istream&, paddle::framework::LoDTensor*, paddle::platform::DeviceContext const&) + 576
15      0x7f23f1f72f99p paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, float>::LoadLodTensor(std::istream&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::Variable*, paddle::framework::ExecutionContext const&) const + 89
16      0x7f23f1f734c0p paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 432
17      0x7f23f1f73883p std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, int>, paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, signed char>, paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, long> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) + 35
18      0x7f23f341c907p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 375
19      0x7f23f341cce1p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 529
20      0x7f23f341a2dcp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
21      0x7f23f164b38ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
22      0x7f23f164e42fp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool) + 143
23      0x7f23f14b0b2dp
24      0x7f23f14f25c6p
25            0x4c5326p PyEval_EvalFrameEx + 37958
26            0x4b9b66p PyEval_EvalCodeEx + 774
27            0x4c1f56p PyEval_EvalFrameEx + 24694
28            0x4b9b66p PyEval_EvalCodeEx + 774
29            0x4c17c6p PyEval_EvalFrameEx + 22758
30            0x4b9b66p PyEval_EvalCodeEx + 774
31            0x4c17c6p PyEval_EvalFrameEx + 22758
32            0x4b9b66p PyEval_EvalCodeEx + 774
33            0x4c17c6p PyEval_EvalFrameEx + 22758
34            0x4b9b66p PyEval_EvalCodeEx + 774
35            0x4c17c6p PyEval_EvalFrameEx + 22758
36            0x4b9b66p PyEval_EvalCodeEx + 774
37            0x4eb69fp
38            0x4e58f2p PyRun_FileExFlags + 130
39            0x4e41a6p PyRun_SimpleFileExFlags + 390
40            0x4938cep Py_Main + 1358
41      0x7f24de766830p __libc_start_main + 240
42            0x493299p _start + 41
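To put the byte counts from the enforce message above in more familiar units, the following sketch parses that line and converts the values to GiB. It only reads the numbers already present in the log. `parse_alloc_error` is a hypothetical helper name, and the regular expression assumes the message format shown above.

```python
import re

GIB = 1024 ** 3


def parse_alloc_error(line):
    """Extract the requested and available byte counts from the enforce message."""
    m = re.search(r"allocating:(\d+)\s*>\s*available:(\d+)", line)
    if m is None:
        raise ValueError("line does not match the expected format")
    return int(m.group(1)), int(m.group(2))


line = ("Enforce failed. Expected allocating <= available, but received "
        "allocating:14920696460 > available:13418495744.")
requested, available = parse_alloc_error(line)
shortfall = requested - available

print("requested: %.2f GiB" % (requested / GIB))
print("available: %.2f GiB" % (available / GIB))
print("shortfall: %.2f GiB" % (shortfall / GIB))
```

In other words, the allocator asked for roughly 13.9 GiB while about 12.5 GiB was free, a gap of about 1.4 GiB.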

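From https://github.com/PaddlePaddle/models/issues/1259 and https://github.com/PaddlePaddle/models/issues/1262#issuecomment-422724707 I learned that the model may exceed the available GPU memory, and that adding a memory optimization pass can mitigate the problem: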

infer_program, nmsed_out = network.infer(main_program)
fluid.memory_optimize(infer_program)

With this line added, I could indeed run evaluation on the WIDER FACE val and test sets, although Paddle 1.5.0 warns that this API is deprecated.
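For reference, Paddle 1.x also exposes memory behaviour through `FLAGS_*` environment variables that are read when `paddle` is first imported. The sketch below sets two of them, `FLAGS_fraction_of_gpu_memory_to_use` and `FLAGS_eager_delete_tensor_gb`, from Python. The values used (0.5 and 0.0) are assumptions chosen for illustration, `set_paddle_memory_flags` is a hypothetical helper, and whether these flags can replace `fluid.memory_optimize` for this PyramidBox script has not been verified here.

```python
import os


def set_paddle_memory_flags(fraction=0.5, eager_delete_gb=0.0, env=None):
    """Write Paddle memory flags into env (defaults to os.environ).

    Call this before `import paddle`, since the flags are read at import time.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    env = os.environ if env is None else env
    # Share of GPU memory the allocator tries to reserve up front (assumed value).
    env["FLAGS_fraction_of_gpu_memory_to_use"] = str(fraction)
    # Threshold in GB for eagerly freeing tensors that are no longer needed.
    env["FLAGS_eager_delete_tensor_gb"] = str(eager_delete_gb)
    return env


if __name__ == "__main__":
    set_paddle_memory_flags()
    # import paddle.fluid as fluid  # import only after the flags are set
```

The same variables can be exported in the shell before launching widerface_eval.py, which avoids touching the script.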

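My suggestions:

  1. Could the PyramidBox README state which Paddle versions are supported?
  2. Could the PyramidBox README state how much GPU memory the model needs?
  3. Could you document how to apply memory optimization with the new API?

This would save users a great deal of time and effort. Thank you!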

Reference: paddlepaddle/models#3630