cudaStreamSynchronize an illegal memory access was encountered errno:77
Created by: nizihan
System information
- PaddlePaddle 1.5.1
- GPU: 8 x P40, CUDA 9.0, cuDNN v7
Configuration:
NCCL_DEBUG=INFO
NCCL_P2P_LEVEL=4
NCCL_SHM_DISABLE=1
#FLAGS_conv_workspace_size_limit=1024
FLAGS_fast_eager_deletion_mode=1
FLAGS_eager_delete_tensor_gb=0.0
FLAGS_fraction_of_gpu_memory_to_use=0.01
FLAGS_limit_of_tmp_allocation=0
is_auto_over_sell=0
After about an hour of training, the job suddenly failed with the error: cudaStreamSynchronize an illegal memory access was encountered errno:77
F0925 19:11:48.244735 610 device_context.cc:333] cudaStreamSynchronize an illegal memory access was encountered errno:77
*** Check failure stack trace: ***
F0925 19:11:48.244750 612 device_context.cc:333] cudaStreamSynchronize an illegal memory access was encountered errno:77
*** Check failure stack trace: ***
@ 0x7f0fcc6e2e9d google::LogMessage::Fail()
@ 0x7f0fcc6e694c google::LogMessage::SendToLog()
@ 0x7f0fcc6e29c3 google::LogMessage::Flush()
@ 0x7f0fcc6e7e5e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f0fce22c8ed _ZNSt17_Function_handlerIFvvEZNK6paddle8platform17CUDADeviceContext4WaitEvEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x7f0fce239a44 paddle::platform::TemporaryAllocator::Release()
@ 0x7f0fce22f891 paddle::platform::CUDADeviceContext::Wait()
@ 0x7f0fce1b8741 paddle::framework::TransDataDevice()
@ 0x7f0fce1b77de paddle::framework::TransformData()
@ 0x7f0fce1aee2d paddle::framework::OperatorWithKernel::PrepareData()
@ 0x7f0fce1aff5d paddle::framework::OperatorWithKernel::RunImpl()
@ 0x7f0fce1b0401 paddle::framework::OperatorWithKernel::RunImpl()
@ 0x7f0fce1ad9fc paddle::framework::OperatorBase::Run()
@ 0x7f0fcdfbc2da paddle::framework::details::ComputationOpHandle::RunImpl()
@ 0x7f0fcdfaec80 paddle::framework::details::OpHandleBase::Run()
@ 0x7f0fcdf8fff6 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
@ 0x7f0fcdf8ec5f paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
@ 0x7f0fcdf8f01f _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
@ 0x7f0fcd86fa83 std::_Function_handler<>::_M_invoke()
@ 0x7f0fcc6684f7 std::__future_base::_State_base::_M_do_set()
@ 0x7f108cb58e03 __pthread_once_internal
@ 0x7f0fcdf8a6a2 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
@ 0x7f0fcc669a74 _ZZN10ThreadPoolC1EmENKUlvE_clEv
@ 0x7f1001985470 (unknown)
@ 0x7f108cb53aa1 start_thread
@ 0x7f108c215c4d clone
@ (nil) (unknown)
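For anyone trying to reproduce this: CUDA kernel launches are asynchronous, so the frame that reports the error (here CUDADeviceContext::Wait(), via cudaStreamSynchronize) is not necessarily the op that made the illegal access. Rerunning with CUDA_LAUNCH_BLOCKING=1 makes launches synchronous, which usually moves the failure closer to the offending kernel. Below is a minimal sketch of a launcher that does this with the NCCL_* and FLAGS_* settings from this issue. The helper names debug_env and run_blocking, and the script name train.py in the usage note, are placeholders and not part of PaddlePaddle. Training will also run noticeably slower in this mode.

```python
import os
import subprocess

# Settings copied from the configuration reported in this issue.
ISSUE_ENV = {
    "NCCL_DEBUG": "INFO",
    "NCCL_P2P_LEVEL": "4",
    "NCCL_SHM_DISABLE": "1",
    "FLAGS_fast_eager_deletion_mode": "1",
    "FLAGS_eager_delete_tensor_gb": "0.0",
    "FLAGS_fraction_of_gpu_memory_to_use": "0.01",
    "FLAGS_limit_of_tmp_allocation": "0",
}


def debug_env(base=None):
    """Return a copy of the environment with the issue's settings applied
    and CUDA kernel launches forced to be synchronous."""
    env = dict(os.environ if base is None else base)
    env.update(ISSUE_ENV)
    # Standard CUDA environment variable: each kernel launch blocks until
    # the kernel finishes, so errors surface near the launch that caused them.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(cmd):
    """Run cmd (a list of arguments) under the debug environment and
    return its exit code."""
    return subprocess.call(cmd, env=debug_env())
```

Usage would look like run_blocking(["python", "train.py"]), with train.py replaced by the actual training entry point.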