PaddlePaddle / Paddle
Opened Mar 01, 2019 by saxon_zh (@saxon_zh, Guest)

Single-machine multi-GPU training error: Cannot find fetched variable

Created by: dubhex

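The program trains correctly on a single machine with a single GPU, but when I switch to multi-GPU training with

train_exe = fluid.ParallelExecutor(use_cuda=True, loss_name=avg_cost.name, main_program=fluid.default_main_program())

it fails with the following error: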

  File "run_train.py", line 199, in <module>
    main(args)
  File "run_train.py", line 140, in main
    feed = feeder.feed(data))
  File "/opt/python/cp27-cp27mu/lib/python2.7/site-packages/paddle/fluid/executor.py", line 472, in run
    self.executor.run(program.desc, scope, 0, True, True)
KeyboardInterrupt
[root@272c5daea05d face_align_150pts]# python run_train.py
Hi, Program begin
W0301 02:37:43.430696 1892 device_context.cc:213] Please NOTE: device: 0, CUDA Capability: 35, Driver Version: 9.0, Runtime Version: 8.0
W0301 02:37:43.430814 1892 device_context.cc:220] device: 0, cuDNN Version: 5.1.
Data reader is ready
Training begin
Traceback (most recent call last):
  File "run_train.py", line 199, in <module>
    main(args)
  File "run_train.py", line 140, in main
    feed = feeder.feed(data))
  File "/opt/python/cp27-cp27mu/lib/python2.7/site-packages/paddle/fluid/parallel_executor.py", line 287, in run
    self.executor.run(fetch_list, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Cannot find fetched variable.(Perhaps the main_program is not set to ParallelExecutor) at [/paddle/paddle/fluid/framework/details/threaded_ssa_graph_executor.cc:166]
PaddlePaddle Call Stacks:
0 0x7f20c96d5406p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1 0x7f20caf4c658p paddle::framework::details::ThreadedSSAGraphExecutor::InsertFetchOps(std::vector<std::string, std::allocator<std::string> > const&, std::vector<paddle::framework::details::FetchOpHandle*, std::allocator<paddle::framework::details::FetchOpHandle*> >*, std::unordered_set<paddle::framework::details::VarHandleBase*, std::hash<paddle::framework::details::VarHandleBase*>, std::equal_to<paddle::framework::details::VarHandleBase*>, std::allocator<paddle::framework::details::VarHandleBase*> >*, std::unordered_map<paddle::framework::details::OpHandleBase*, unsigned long, std::hash<paddle::framework::details::OpHandleBase*>, std::equal_to<paddle::framework::details::OpHandleBase*>, std::allocator<std::pair<paddle::framework::details::OpHandleBase* const, unsigned long> > >*, std::unordered_set<paddle::framework::details::VarHandleBase*, std::hash<paddle::framework::details::VarHandleBase*>, std::equal_to<paddle::framework::details::VarHandleBase*>, std::allocator<paddle::framework::details::VarHandleBase*> >*, paddle::framework::BlockingQueue<paddle::framework::details::VarHandleBase*>*, std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >*) + 3208
2 0x7f20caf4cdd8p paddle::framework::details::ThreadedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&) + 1768
3 0x7f20caf51707p paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&) + 391
4 0x7f20c97cebb9p paddle::framework::ParallelExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&, std::string const&) + 489
5 0x7f20c96ca2dep
6 0x7f20c96fd78ep
7 0x7f210a90cce8p PyEval_EvalFrameEx + 28264
8 0x7f210a90f37dp PyEval_EvalCodeEx + 2061
9 0x7f210a90cd70p PyEval_EvalFrameEx + 28400
10 0x7f210a90ce9ep PyEval_EvalFrameEx + 28702
11 0x7f210a90f37dp PyEval_EvalCodeEx + 2061
12 0x7f210a90f4b2p PyEval_EvalCode + 50
13 0x7f210a9391c2p PyRun_FileExFlags + 146
14 0x7f210a93a559p PyRun_SimpleFileExFlags + 217
15 0x7f210a9501ddp Py_Main + 3149
16 0x7f2109be3d1dp __libc_start_main + 253
17 0x4006b1p

Is there a way to fix this?
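
The error message hints that the fetched variable could not be found in the program the executor was built from. The sketch below is a toy model of that failure mode in plain Python, not Paddle's implementation: ToyProgram, ToyParallelExecutor, check_fetch_names, and the variable name "avg_cost" are all hypothetical stand-ins invented for illustration. It only shows why fetching a name from an executor bound to a different program would fail, and how a pre-check of the fetch list might surface the mismatch earlier.

```python
class ToyProgram:
    """Hypothetical stand-in for a program: a named set of variable names."""

    def __init__(self, name, var_names):
        self.name = name
        self.var_names = set(var_names)


class ToyParallelExecutor:
    """Hypothetical stand-in for an executor bound to one program at construction."""

    def __init__(self, main_program):
        self.main_program = main_program

    def run(self, fetch_list):
        # Fetch names are resolved only against the program given at construction.
        missing = [n for n in fetch_list if n not in self.main_program.var_names]
        if missing:
            raise LookupError(
                "Cannot find fetched variable(s) %s in program %r"
                % (missing, self.main_program.name)
            )
        return ["<value of %s>" % n for n in fetch_list]


def check_fetch_names(program, fetch_list):
    """Return the fetch names that the given program does not define."""
    return [n for n in fetch_list if n not in program.var_names]


# "avg_cost" exists only in the program where the loss was defined (assumption).
train_prog = ToyProgram("train", ["image", "label", "avg_cost"])
other_prog = ToyProgram("other", ["image", "label"])

exe_ok = ToyParallelExecutor(main_program=train_prog)
print(exe_ok.run(["avg_cost"]))

exe_bad = ToyParallelExecutor(main_program=other_prog)
try:
    exe_bad.run(["avg_cost"])
except LookupError as err:
    print("error:", err)
```

If this model matches the real situation, the things worth checking in run_train.py would be that the network and loss are built in the same program that is passed as main_program, and that every entry in the fetch list names a variable from that program.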

Reference: paddlepaddle/Paddle#15989