cudaStreamSynchronize misaligned address error while training PPYOLO
Created by: yeyupiaoling
- Ubuntu 18.04
- Local CUDA 10.0; Anaconda virtual environment with cudatoolkit=10.0
- NCCL 2.4.8 for CUDA 10.0
- Python 3.7
- PaddlePaddle 1.8.4.post107
- Two GPUs, as shown below:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0 Off |                  N/A |
| 51%   53C    P2    60W / 250W |   9859MiB / 11016MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:05:00.0 Off |                  N/A |
| 54%   54C    P2    93W / 250W |   9802MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
Error log:
is_profiler: 0
loss_scale: 8.0
opt: {}
output_eval: save_models/eval
profiler_path: save_models/detection.profiler
resume_checkpoint: None
use_vdl: True
vdl_log_dir: logs/scalar
------------------------------------------------
2020-09-09 17:40:55,396-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000500] in Optimizer will not take effect, and it will only be applied to other Parameters!
2020-09-09 17:40:57,569-INFO: places would be ommited when DataLoader is not iterable
W0909 17:40:57.609563 1560 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 11.0, Runtime API Version: 10.0
W0909 17:40:57.648514 1560 device_context.cc:260] device: 0, cuDNN Version: 7.6.
2020-09-09 17:40:59,413-WARNING: /home/test/.cache/paddle/weights/ResNet50_vd_ssld_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
/home/test/anaconda3/envs/PaddlePaddle/lib/python3.7/site-packages/paddle/fluid/io.py:1998: UserWarning: This list is not set, Because of Paramerter not found in program. There are: fc_0.b_0 fc_0.w_0
format(" ".join(unused_para_list)))
2020-09-09 17:41:05,690-INFO: places would be ommited when DataLoader is not iterable
I0909 17:41:06.986269 1560 build_strategy.cc:361] set enable_sequential_execution:1
W0909 17:41:07.356726 1560 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 240. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 171.
2020-09-09 17:41:13,339-INFO: iter: 0, lr: 0.000000, 'loss_xy': '1.000268', 'loss_wh': '3.644889', 'loss_obj': '4445.346191', 'loss_cls': '7.542771', 'loss_iou': '3.998751', 'loss_iou_aware': '2.708143', 'loss': '4464.240723', time: 0.005, eta: 0:43:14
2020-09-09 17:42:07,639-INFO: iter: 100, lr: 0.000025, 'loss_xy': '1.032357', 'loss_wh': '2.916426', 'loss_obj': '10.692862', 'loss_cls': '7.065003', 'loss_iou': '4.001646', 'loss_iou_aware': '4.178246', 'loss': '30.555357', time: 0.615, eta: 3 days, 13:23:23
2020-09-09 17:43:02,254-INFO: iter: 200, lr: 0.000050, 'loss_xy': '0.994666', 'loss_wh': '2.304071', 'loss_obj': '9.268065', 'loss_cls': '6.182310', 'loss_iou': '3.792430', 'loss_iou_aware': '2.720503', 'loss': '25.305017', time: 0.544, eta: 3 days, 3:33:07
F0909 17:43:09.341343 1592 all_reduce_op_handle.cc:192] cudaStreamSynchronize misaligned address
*** Check failure stack trace: ***
@ 0x7f78ccc0783d google::LogMessage::Fail()
@ 0x7f78ccc0b2ec google::LogMessage::SendToLog()
@ 0x7f78ccc07363 google::LogMessage::Flush()
@ 0x7f78ccc0c7fe google::LogMessageFatal::~LogMessageFatal()
@ 0x7f78cfed4cbc paddle::framework::details::AllReduceOpHandle::SyncNCCLAllReduce()
@ 0x7f78cfed4d54 paddle::framework::details::AllReduceOpHandle::NCCLAllReduceFunc()
@ 0x7f78cfed5d16 paddle::framework::details::AllReduceOpHandle::AllReduceFunc()
@ 0x7f78cfe3367c paddle::framework::details::FusedAllReduceOpHandle::FusedAllReduceFunc()
@ 0x7f78cfe34c5e paddle::framework::details::FusedAllReduceOpHandle::RunImpl()
@ 0x7f78cfe70aa1 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
@ 0x7f78cfe6e59f paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
@ 0x7f78cfe6e864 _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
@ 0x7f78ccc65413 std::_Function_handler<>::_M_invoke()
@ 0x7f78cca5f107 std::__future_base::_State_base::_M_do_set()
@ 0x7f7918f17827 __pthread_once_slow
@ 0x7f78cfe6aa32 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
@ 0x7f78cca61564 _ZZN10ThreadPoolC1EmENKUlvE_clEv
@ 0x7f790a2e4421 execute_native_thread_routine_compat
@ 0x7f7918f0f6db start_thread
@ 0x7f7918c38a3f clone
@ (nil) (unknown)
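
For anyone triaging a similar report, the two most relevant facts in the log above are the driver/runtime API versions printed by `device_context.cc` and the first fatal (`F...`) line. Below is a minimal sketch that pulls both out of a saved log. The helper name `summarize_log` and the regular expressions are illustrative assumptions written against the log lines shown here; they are not part of PaddlePaddle.

```python
import re

# Matches the device summary that device_context.cc prints in the log above.
DEVICE_RE = re.compile(
    r"device: (\d+), CUDA Capability: (\d+), "
    r"Driver API Version: ([\d.]+), Runtime API Version: ([\d.]+)"
)
# Matches a glog-style fatal line: F<mmdd> <time> <thread> <file>:<line>] <message>
FATAL_RE = re.compile(r"^F\d{4} [\d:.]+\s+\d+ ([\w.]+):(\d+)\] (.*)$")


def summarize_log(text):
    """Collect device/version info and the first fatal message from a log."""
    summary = {"devices": [], "fatal": None}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        device = DEVICE_RE.search(line)
        if device:
            summary["devices"].append({
                "device": int(device.group(1)),
                "capability": int(device.group(2)),
                "driver_api": device.group(3),
                "runtime_api": device.group(4),
            })
            continue
        fatal = FATAL_RE.match(line)
        if fatal and summary["fatal"] is None:
            summary["fatal"] = {
                "file": fatal.group(1),
                "line": int(fatal.group(2)),
                "message": fatal.group(3),
            }
    return summary
```

Calling `summarize_log(open("train.log").read())` (the file name is hypothetical) returns a dictionary that can be pasted into a report alongside the environment list.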