PaddlePaddle / PaddleSeg · Issue #77
Status: Closed

Opened Nov 01, 2019 by saxon_zh (Guest)

Error when training on a single specified GPU in a dual-GPU environment

Created by: wz940216

Environment: Ubuntu 18.04 with two RTX 2080 Ti GPUs. `nvidia-smi` shows driver 418.88 and CUDA 10.1, while `nvcc -V` reports CUDA 10.0. The Paddle version is 1.5.1. The same code runs perfectly on another machine with a single V100. On the dual-GPU machine the second card is occupied by someone else, so I restricted training to one card with `export CUDA_VISIBLE_DEVICES=0`. Running `train.py` then fails with:

```
Traceback (most recent call last):
  File "pdseg/train.py", line 467, in <module>
    main(args)
  File "pdseg/train.py", line 454, in main
    train(cfg)
  File "pdseg/train.py", line 235, in train
    exe.run(startup_prog)
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/executor.py", line 651, in run
    use_program_cache=use_program_cache)
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/executor.py", line 749, in run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
RuntimeError: function_attributes(): after cudaFuncGetAttributes: invalid device function
```

With the other card still occupied, running on both cards via `export CUDA_VISIBLE_DEVICES=0,1` fails as follows (repeated `boost::detail::variant::void_` template filler abbreviated as `...`):

```
Traceback (most recent call last):
  File "pdseg/train.py", line 467, in <module>
    main(args)
  File "pdseg/train.py", line 454, in main
    train(cfg)
  File "pdseg/train.py", line 235, in train
    exe.run(startup_prog)
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/executor.py", line 651, in run
    use_program_cache=use_program_cache)
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/executor.py", line 749, in run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator fill_constant error.
Python Callstacks:
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1842, in prepend_op
    attrs=kwargs.get("attrs", None))
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/initializer.py", line 189, in __call__
    stop_gradient=True)
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1625, in create_var
    kwargs['initializer']
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/layer_helper_base.py", line 383, in set_variable_initializer
    initializer=initializer)
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 317, in _add_accumulator
    var, initializer=Constant(value=float(fill_value)))
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 760, in _create_accumulators
    self._add_accumulator(self._velocity_acc_str, p)
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 364, in _create_optimization_pass
    [p[0] for p in parameters_and_grads])
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 532, in apply_gradients
    optimize_ops = self._create_optimization_pass(params_grads)
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 562, in apply_optimize
    optimize_ops = self.apply_gradients(params_grads)
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 601, in minimize
    loss, startup_program=startup_program, params_grads=params_grads)
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/dygraph/base.py", line 87, in __impl__
    return func(*args, **kwargs)
  File "/home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "</home/yangjing/anaconda3/envs/paddle/lib/python3.6/site-packages/decorator.py:decorator-gen-20>", line 2, in minimize
  File "/home/yangjing/PaddleSeg/pdseg/solver.py", line 85, in sgd_optimizer
    optimizer.minimize(loss)
  File "/home/yangjing/PaddleSeg/pdseg/solver.py", line 107, in optimise
    return self.sgd_optimizer(lr_policy, loss)
  File "/home/yangjing/PaddleSeg/pdseg/models/model_builder.py", line 182, in build_model
    decayed_lr = optimizer.optimise(avg_loss)
  File "pdseg/train.py", line 230, in train
    train_prog, startup_prog, phase=ModelPhase.TRAIN)
  File "pdseg/train.py", line 454, in main
    train(cfg)
  File "pdseg/train.py", line 467, in <module>
    main(args)
C++ Callstacks:
Enforce failed. Expected allocating <= available, but received allocating:10068465874 > available:8840675072.
Insufficient GPU memory to allocation. at [/paddle/paddle/fluid/platform/gpu_info.cc:262]
PaddlePaddle Call Stacks:
0   0x7fb492147438p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 360
1   0x7fb492147787p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2   0x7fb4942ba8c6p paddle::platform::GpuMaxChunkSize() + 630
3   0x7fb49428eadap
4   0x7fb5178b9827p
5   0x7fb49428e17dp paddle::memory::legacy::GetGPUBuddyAllocator(int) + 109
6   0x7fb49428efc5p void* paddle::memory::legacy::Alloc<paddle::platform::CUDAPlace>(paddle::platform::CUDAPlace const&, unsigned long) + 37
7   0x7fb49428f505p paddle::memory::allocation::LegacyAllocator::AllocateImpl(unsigned long) + 421
8   0x7fb494283625p paddle::memory::allocation::AllocatorFacade::Alloc(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, ...> const&, unsigned long) + 181
9   0x7fb4942837aap paddle::memory::allocation::AllocatorFacade::AllocShared(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, ...> const&, unsigned long) + 26
10  0x7fb493e55e2cp paddle::memory::AllocShared(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, ...> const&, unsigned long) + 44
11  0x7fb494256434p paddle::framework::Tensor::mutable_data(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, ...>, paddle::framework::proto::VarType_Type, unsigned long) + 148
12  0x7fb49256363ep paddle::operators::FillConstantKernel::Compute(paddle::framework::ExecutionContext const&) const + 494
13  0x7fb492566753p std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::FillConstantKernel, ..., paddle::operators::FillConstantKernel<paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) + 35
14  0x7fb4941c61e7p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, ...> const&, paddle::framework::RuntimeContext*) const + 375
15  0x7fb4941c65c1p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, ...> const&) const + 529
16  0x7fb4941c3bbcp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, ...> const&) + 332
17  0x7fb4922d1deep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 606
18  0x7fb4922d4dafp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool) + 143
19  0x7fb49213859dp
20  0x7fb492179826p
21  0x559fab5e1c54p _PyCFunction_FastCallDict + 340
22  0x559fab669c0ep
23  0x559fab68c75ap _PyEval_EvalFrameDefault + 778
24  0x559fab662e66p
25  0x559fab663ed6p
26  0x559fab669b95p
27  0x559fab68d51cp _PyEval_EvalFrameDefault + 4300
28  0x559fab662e66p
29  0x559fab663e73p
30  0x559fab669b95p
31  0x559fab68c75ap _PyEval_EvalFrameDefault + 778
32  0x559fab66329ep
33  0x559fab663ed6p
34  0x559fab669b95p
35  0x559fab68c75ap _PyEval_EvalFrameDefault + 778
36  0x559fab663c5bp
37  0x559fab669b95p
38  0x559fab68c75ap _PyEval_EvalFrameDefault + 778
39  0x559fab6649b9p PyEval_EvalCodeEx + 809
40  0x559fab66575cp PyEval_EvalCode + 28
41  0x559fab6e5744p
42  0x559fab6e5b41p PyRun_FileExFlags + 161
43  0x559fab6e5d43p PyRun_SimpleFileExFlags + 451
44  0x559fab6e9833p Py_Main + 1555
45  0x559fab5b388ep main + 238
46  0x7fb5174dab97p __libc_start_main + 231
47  0x559fab693160p
```

How can I fix this? Many thanks!
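For the second failure, the enforce message itself pins down the problem: the startup program requests more memory than the busy card has free. A minimal sketch (plain Python, no Paddle needed; the variable names are mine, not from the report) that sanity-checks those numbers and shows the device pinning the report already uses:

```python
import os

# Pin the process to one free GPU *before* the framework initializes CUDA.
# (Must be set before `import paddle`; illustrative, mirrors the report's
# `export CUDA_VISIBLE_DEVICES=0`.)
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Figures taken verbatim from the enforce message above.
requested_bytes = 10_068_465_874   # "allocating"
available_bytes = 8_840_675_072    # "available"

gib = 1024 ** 3
print(f"requested: {requested_bytes / gib:.2f} GiB")   # ~9.38 GiB
print(f"available: {available_bytes / gib:.2f} GiB")   # ~8.23 GiB

# The request exceeds what the 11 GB 2080 Ti has free while shared, so the
# startup program cannot even allocate; shrinking BATCH_SIZE (or the crop
# size) by at least this factor is the usual first mitigation:
print(f"overshoot: {requested_bytes / available_bytes:.2f}x")  # ~1.14x
```

In Paddle 1.x the pre-allocated share of GPU memory can also be lowered with `export FLAGS_fraction_of_gpu_memory_to_use=0.5` before launching, which sometimes helps on a shared card; treat that as a general Paddle 1.x knob rather than a confirmed fix for this report. The first error (`invalid device function`) is a separate issue and typically points at a CUDA build/runtime mismatch (here: nvcc 10.0 vs. driver CUDA 10.1).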

Reference: paddlepaddle/PaddleSeg#77