PaddlePaddle / Paddle
Opened March 21, 2019 by saxon_zh (Guest)

clip op fails with an error in multi-machine (distributed) training

Created by: ccmeteorljh

Paddle version: 1.3.0. Model to reproduce: a multi-machine implementation of deep_attention_matching_net (https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/deep_attention_matching_net/train_and_evaluate.py). The error is as follows:

```
get_pserver_program() is deprecated, call get_pserver_programs() to get pserver main and startup in a single call.
I0321 13:13:49.156570 62757 grpc_server.cc:430] Server listening on 127.0.0.1:9122 selected port: 9122
F0321 13:14:45.583297 63121 listen_and_serv_op.cc:74] run sub program:60 error Invoke operator clip error.
Python Callstacks: 
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 1317, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/transpiler/distribute_transpiler.py", line 1928, in _append_pserver_non_opt_ops
    attrs=opt_op.all_attrs())
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/transpiler/distribute_transpiler.py", line 775, in __append_optimize_op__
    self._append_pserver_non_opt_ops(block, op)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/transpiler/distribute_transpiler.py", line 845, in get_pserver_program
    lr_ops)
  File "/home/work/ljh_test/baidu/paddle/test/cts_test/dist_base.py", line 72, in run_pserver
    pserver_prog = t.get_pserver_program(current_endpoint)
  File "/home/work/ljh_test/baidu/paddle/test/cts_test/dist_base.py", line 390, in runtime_main
    model.run_pserver(endpoints, trainers, current_endpoint, trainer_id, run_params)
  File "dist_deep_attention_matching.py", line 175, in <module>
    runtime_main(TestDistDeepAttentionMatching)
C++ Callstacks: 
holder_ should not be null
Tensor holds no memory. Call Tensor::mutable_data first. at [/paddle/paddle/fluid/framework/tensor.cc:23]
PaddlePaddle Call Stacks: 
0       0x7faf69e5ce2dp void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 365
1       0x7faf69e5d177p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2       0x7faf6b8ca9a6p paddle::framework::Tensor::check_memory_size() const + 182
3       0x7faf6a040b8ap paddle::operators::ClipKernel<paddle::platform::CPUDeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 858
4       0x7faf6a041283p std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::ClipKernel<paddle::platform::CPUDeviceContext, float> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) + 35
5       0x7faf6b86dbb3p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 659
6       0x7faf6b86b425p paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 341
7       0x7faf69f7c33ap paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 218
8       0x7faf6aa2c692p
9       0x7faf6aa3394ap std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::EnforceNotMet, std::default_delete<paddle::platform::EnforceNotMet> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::EnforceNotMet, std::default_delete<paddle::platform::EnforceNotMet> > > >::_M_invoke(std::_Any_data const&) + 42
10      0x7faf6a9f65a7p std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&) + 39
11      0x7fafd3abaa99p
12      0x7faf6aa2b7d2p
13      0x7faf6b8c32c9p paddle::framework::ThreadPool::TaskLoop() + 1689
14      0x7faf850f57e0p
15      0x7fafd3ab36bap
16      0x7fafd37e941dp clone + 109
*** Check failure stack trace: ***
    @     0x7faf69f3731d  google::LogMessage::Fail()
    @     0x7faf69f3adcc  google::LogMessage::SendToLog()
    @     0x7faf69f36e43  google::LogMessage::Flush()
    @     0x7faf69f3c2de  google::LogMessageFatal::~LogMessageFatal()
    @     0x7faf6aa2c730  _ZNSt17_Function_handlerIFSt10unique_ptrIN6paddle8platform13EnforceNotMetESt14default_deleteIS3_EEvESt17reference_wrapperISt12_Bind_simpleIFS8_IZNS1_9framework10ThreadPool18RunAndGetExceptionIZNS1_9operatorsL21ParallelExecuteBlocksERKSt6vectorImSaImEEPNSA_8ExecutorERKSE_ISt10shared_ptrINSA_22ExecutorPrepareContextEESaISN_EEPNSA_11ProgramDescEPNSA_5ScopeEEUlvE_EESt6futureIS6_ET_EUlvE_EvEEEE9_M_invokeERKSt9_Any_data
    @     0x7faf6aa3394a  std::_Function_handler<>::_M_invoke()
    @     0x7faf6a9f65a7  std::__future_base::_State_base::_M_do_set()
    @     0x7fafd3abaa99  __pthread_once_slow
    @     0x7faf6aa2b7d2  _ZNSt13__future_base11_Task_stateIZN6paddle9framework10ThreadPool18RunAndGetExceptionIZNS1_9operatorsL21ParallelExecuteBlocksERKSt6vectorImSaImEEPNS2_8ExecutorERKS6_ISt10shared_ptrINS2_22ExecutorPrepareContextEESaISF_EEPNS2_11ProgramDescEPNS2_5ScopeEEUlvE_EESt6futureISt10unique_ptrINS1_8platform13EnforceNotMetESt14default_deleteISS_EEET_EUlvE_SaIiEFSV_vEE6_M_runEv
    @     0x7faf6b8c32c9  paddle::framework::ThreadPool::TaskLoop()
    @     0x7faf850f57e0  execute_native_thread_routine
    @     0x7fafd3ab36ba  start_thread
    @     0x7fafd37e941d  clone
    @              (nil)  (unknown)
```
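The C++ callstack above fails inside `Tensor::check_memory_size()` while the pserver's `listen_and_serv_op` runs the clip kernel, and the Python callstack shows the clip op was copied onto the pserver program by the transpiler (`_append_pserver_non_opt_ops`). This suggests the clip op's input tensor is never allocated on the pserver side. A toy Python model of the failing check (hypothetical class names, only to illustrate the `holder_` / `mutable_data` contract stated in the error message):

```python
class Tensor:
    """Toy stand-in for paddle::framework::Tensor's memory check."""

    def __init__(self):
        self.holder = None  # backing memory; set by mutable_data()

    def mutable_data(self, numel):
        # Allocates the buffer; an op must call this before reading/writing.
        self.holder = [0.0] * numel
        return self.holder

    def check_memory_size(self):
        # Mirrors the failing enforce: "holder_ should not be null.
        # Tensor holds no memory. Call Tensor::mutable_data first."
        if self.holder is None:
            raise RuntimeError(
                "Tensor holds no memory. Call Tensor::mutable_data first.")


t = Tensor()
try:
    t.check_memory_size()  # raises: nothing allocated the buffer yet
except RuntimeError as e:
    print(e)

t.mutable_data(4)
t.check_memory_size()  # passes once memory is allocated
```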
**After removing the following clip-related code, the model runs correctly:**
```python
fluid.clip.set_gradient_clip(clip=fluid.clip.GradientClipByValue(
    max=1.0, min=-1.0))
```
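For context, `GradientClipByValue(max=1.0, min=-1.0)` clamps every gradient element into [-1.0, 1.0]. A minimal pure-Python sketch of that element-wise behavior (not Paddle code, just the math it performs):

```python
def clip_by_value(grad, min_val=-1.0, max_val=1.0):
    # Element-wise clamp into [min_val, max_val]: the operation that
    # fluid.clip.GradientClipByValue applies to each gradient tensor,
    # shown here on a plain Python list of gradient values.
    return [max(min_val, min(max_val, g)) for g in grad]

print(clip_by_value([-2.5, 0.3, 1.7]))  # [-1.0, 0.3, 1.0]
```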
Reference: paddlepaddle/Paddle#16364