Opened Aug 22, 2019 by saxon_zh (@saxon_zh, Guest)

Paddle multi-CPU training and inference problem

Created by: xuzhenglei1991

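I had been training with Paddle 1.2, but it was too slow, so I switched to multi-CPU training. With Paddle 1.2, however, the multi-CPU call fails with the following error: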

File "python/train_multi_cpu.py", line 397, in train
    feed=feeder.feed(full_batch))
  File "/home/disk7/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/parallel_executor.py", line 247, in run
    feed_tensor_dict)
paddle.fluid.core.EnforceNotMet: Enforce failed. Expected member_->places_.size() == lod_tensors.size(), but received member_->places_.size():48 != lod_tensors.size():40.
The number of samples of current batch is less than the count of devices, currently, it is not allowed. (48 vs 40) at [/paddle/paddle/fluid/framework/parallel_executor.cc:314]
PaddlePaddle Call Stacks:
0       0x7f96bfe40986p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1       0x7f96bff2ec4ep paddle::framework::ParallelExecutor::FeedAndSplitTensorIntoLocalScopes(std::unordered_map<std::string, paddle::framework::LoDTensor, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, paddle::framework::LoDTensor> > > const&) + 1118
2       0x7f96bfe8e7f1p
3       0x7f96bfe7a7c0p
4       0x7f974c3d9bb8p PyEval_EvalFrameEx + 25016
5       0x7f974c3dd0bdp PyEval_EvalCodeEx + 2061
6       0x7f974c3da345p PyEval_EvalFrameEx + 26949
7       0x7f974c3da460p PyEval_EvalFrameEx + 27232
8       0x7f974c3dd0bdp PyEval_EvalCodeEx + 2061
9       0x7f974c3dd1f2p PyEval_EvalCode + 50
10      0x7f974c405f42p PyRun_FileExFlags + 146
11      0x7f974c4072d9p PyRun_SimpleFileExFlags + 217
12      0x7f974c41d00dp Py_Main + 3149
13      0x7f974b61abd5p __libc_start_main + 245
14            0x4007a1p
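
The enforce message above reports a batch of 40 samples being split across 48 places, which typically happens when the last batch of an epoch is shorter than the others. Below is a minimal sketch of one way to guard against that in plain Python. The helper names and the sample counts are hypothetical and are not taken from train_multi_cpu.py; they only reproduce the 48-vs-40 situation.

```python
def make_batches(samples, batch_size):
    """Split a list of samples into consecutive batches; the last may be short."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def batches_fit_for_places(batches, num_places):
    """Yield only batches that contain at least one sample per place.

    A batch smaller than num_places cannot be split across all places,
    which is the condition the enforce message complains about.
    """
    for batch in batches:
        if len(batch) >= num_places:
            yield batch


# Hypothetical numbers: 296 samples = 2 full batches of 128 plus a tail of 40.
samples = list(range(296))
batches = make_batches(samples, batch_size=128)
kept = list(batches_fit_for_places(batches, num_places=48))

sizes_before = [len(b) for b in batches]  # includes the short tail batch
sizes_after = [len(b) for b in kept]      # tail batch of 40 is skipped
```

If the reader is built with `paddle.batch`, its `drop_last` argument may achieve the same effect, and the number of CPU places may be adjustable through the `CPU_NUM` environment variable. Both are assumptions to verify against the installed Paddle version.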

I therefore switched to Paddle 1.5 for training.

That led to another problem: the inference code is C++ built against paddle1.2.0_pb32, and it core dumps when loading the model. The stack trace is as follows:

(gdb) bt
#0  0x00007ffc829783f7 in raise () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#1  0x00007ffc829797d8 in abort () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#2  0x00007ffc83268c65 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffc83266e06 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:38
#4  0x00007ffc83266e33 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x00007ffc83267052 in __cxxabiv1::__cxa_throw (obj=0x7ffc740319b0, tinfo=0x1785710 <typeinfo for paddle::platform::EnforceNotMet>, dest=
    0x64ddb4 <paddle::platform::EnforceNotMet::~EnforceNotMet()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:87
#6  0x000000000077c5f4 in paddle::framework::ExtractAttribute<std::vector<int, std::allocator<int> > >::operator()(boost::variant<boost::blank, int, float, std::string, std::vector<int, std::allocator<int> >, std::vector<float, std::allocator<float> >, std::vector<std::string, std::allocator<std::string> >, bool, std::vector<bool, std::allocator<bool> >, paddle::framework::BlockDesc*, long, std::vector<paddle::framework::BlockDesc*, std::allocator<paddle::framework::BlockDesc*> >, std::vector<long, std::allocator<long> > >&) const ()
#7  0x00000000007801cf in paddle::framework::TypedAttrChecker<std::vector<int, std::allocator<int> > >::operator()(std::unordered_map<std::string, boost::variant<boost::blank, int, float, std::string, std::vector<int, std::allocator<int> >, std::vector<float, std::allocator<float> >, std::vector<std::string, std::allocator<std::string> >, bool, std::vector<bool, std::allocator<bool> >, paddle::framework::BlockDesc*, long, std::vector<paddle::framework::BlockDesc*, std::allocator<paddle::framework::BlockDesc*> >, std::vector<long, std::allocator<long> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, boost::variant<boost::blank, int, float, std::string, std::vector<int, std::allocator<int> >, std::vector<float, std::allocator<float> >, std::vector<std::string, std::allocator<std::string> >, bool, std::vector<bool, std::allocator<bool> >, paddle::framework::BlockDesc*, long, std::vector<paddle::framework::BlockDesc*, std::allocator<paddle::framework::BlockDesc*> >, std::vector<long, std::allocator<long> > > > > >*) const ()
#8  0x000000000067a959 in paddle::framework::OpRegistry::CreateOp(std::string const&, std::map<std::string, std::vector<std::string, std::allocator<std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<std::string, std::allocator<std::string> > > > > const&, std::map<std::string, std::vector<std::string, std::allocator<std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<std::string, std::allocator<std::string> > > > > const&, std::unordered_map<std::string, boost::variant<boost::blank, int, float, std::string, std::vector<int, std::allocator<int> >, std::vector<float, std::allocator<float> >, std::vector<std::string, std::allocator<std::string> >, bool, std::vector<bool, std::allocator<bool> >, paddle::framework::BlockDesc*, long, std::vector<paddle::framework::BlockDesc*, std::allocator<paddle::framework::BlockDesc*> >, std::vector<long, std::allocator<long> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, boost::variant<boost::blank, int, float, std::string, std::vector<int, std::allocator<int> >, std::vector<float, std::allocator<float> >, std::vector<std::string, std::allocator<std::string> >, bool, std::vector<bool, std::allocator<bool> >, paddle::framework::BlockDesc*, long, std::vector<paddle::framework::BlockDesc*, std::allocator<paddle::framework::BlockDesc*> >, std::vector<long, std::allocator<long> > > > > >) ()
#9  0x000000000067aae3 in paddle::framework::OpRegistry::CreateOp(paddle::framework::OpDesc const&) ()
#10 0x0000000000653194 in paddle::framework::Executor::Prepare(paddle::framework::ProgramDesc const&, int, std::vector<std::string, std::allocator<std::string> > const&) ()
#11 0x000000000063b67e in visionary::lac::MainTagger::create_buff (this=0x19f0b00, buff=0x7ffc740008c0) at baidu/visionary/lac/src/main_tagger.cpp:100
#12 0x00000000006330ab in visionary::lac::Lac::create_buff (this=0x19f09d0) at baidu/visionary/lac/src/lac.cpp:107
#13 0x000000000062ef13 in visionary::lac::lac_buff_create (lac_handle=0x19f09d0) at baidu/visionary/lac/src/ilac.cpp:43
#14 0x0000000000629f3f in tagging (max_result_num=1000) at baidu/visionary/lac/tools/lac_class_demo.cpp:135
#15 0x000000000062a5b9 in thread_worker (arg=0x7fffa242eaf0) at baidu/visionary/lac/tools/lac_class_demo.cpp:205

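Could you please take a look at how to solve this? Should I switch training back to the Paddle 1.2 multi-CPU training interface, or switch the inference library to paddle1.5_pb32?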
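
The gdb trace ends in an attribute check inside `OpRegistry::CreateOp` while the 1.2.0 library prepares the program. That is consistent with a mismatch between the training and inference versions, although the trace alone does not prove it. Below is a minimal sketch of a deploy-time guard, assuming the training version string is recorded alongside the saved model. The function names and the idea of storing the version are hypothetical, not part of Paddle.

```python
def major_minor(version):
    """Return (major, minor) as integers from a string such as '1.5.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def same_release_line(train_version, infer_version):
    """True when both versions share the same major.minor release line."""
    return major_minor(train_version) == major_minor(infer_version)


# Versions from this report: trained with 1.5, inference library is 1.2.0.
# "1.5.0" is an assumed full version string for the 1.5 training install.
mismatch = not same_release_line("1.5.0", "1.2.0")
if mismatch:
    print("training and inference versions differ; check model compatibility")
```

A check like this only flags the mismatch early. It does not say which side should change.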

Reference: paddlepaddle/Paddle#19354