NCCL WARN Error : mixing different streams within a group call is not supported
Created by: ellinyang
Error when training the YOLOv3 model in distributed mode
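- Version and environment info: 1) PaddlePaddle version: Fluid 1.5.1 2) GPU: P4, CUDA 9.0, cuDNN 7 4) System environment: Python 2.7, Docker image: paddlepaddle/paddle:1.5.1-gpu-cuda9.0-cudnn7
- Training info: 1) single machine, multiple GPUs 2) batch_size = 6; input_size = 608
- Reproduction info: single-machine training runs normally; the error below appears once distributed training is enabled.
- Problem description: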
[1] enqueue.cc:373 NCCL WARN Error : mixing different streams within a group call is not supported.
[1] NCCL INFO enqueue.cc:429 -> 5
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
what(): invalid usage at [/paddle/paddle/fluid/platform/nccl_helper.h:70]
PaddlePaddle Call Stacks:
0 0x7f6d6d188ad0p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 352
1 0x7f6d6d188e49p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2 0x7f6d6d328d58p paddle::platform::NCCLGroupGuard::~NCCLGroupGuard() + 328
3 0x7f6d6eecba17p
4 0x7f6d6eedb5ddp
5 0x7f6d6eedb5ddp
6 0x7f6d6eedc314p paddle::framework::details::OpHandleBase::RunAndRecordEvent(std::function<void ()> const&) + 116
7 0x7f6d6eecbc22p paddle::framework::details::AllReduceOpHandle::RunAllReduceFuncs(std::vector<std::function<void ()>, std::allocator<std::function<void ()> > > const&) + 98
8 0x7f6d6eecd728p paddle::framework::details::AllReduceOpHandle::RunImpl() + 3176
9 0x7f6d6eedc8b0p paddle::framework::details::OpHandleBase::Run(bool) + 160
10 0x7f6d6eebdc26p paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*) + 310
11 0x7f6d6eebc88fp paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&, unsigned long*) + 47
12 0x7f6d6eebcc4fp
13 0x7f6d6d3bd983p std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&) + 35
14 0x7f6d6d253b27p std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&) + 39
15 0x7f6dde518a99p
16 0x7f6d6eeb82d2p
17 0x7f6d6d2550a4p ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const + 404
18 0x7f6d9ac90c80p
19 0x7f6dde5116bap
20 0x7f6dde24741dp clone + 109