Training error, help needed!
Created by: zgsxwsdxg
Could a fellow developer take a look and help pinpoint the problem? Thanks. My training log is as follows:
2020-09-18 18:59:45,566-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000500] in Optimizer will not take effect, and it will only be applied to other Parameters!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2020-09-18 18:59:50,686-INFO: places would be ommited when DataLoader is not iterable
W0918 18:59:50.764096 10283 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W0918 18:59:50.876243 10283 device_context.cc:260] device: 0, cuDNN Version: 7.6.
2020-09-18 18:59:56,455-INFO: Downloading ResNet50_vd_ssld_pretrained.tar from https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
100%|██████████| 92837/92837 [00:14<00:00, 6272.83KB/s]
2020-09-18 19:00:11,565-INFO: Decompressing /home/ubuntu/.cache/paddle/weights/ResNet50_vd_ssld_pretrained.tar...
2020-09-18 19:00:12,866-WARNING: /home/ubuntu/.cache/paddle/weights/ResNet50_vd_ssld_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
/home/ubuntu/anaconda3/envs/py3.7-paddle-1.8/lib/python3.7/site-packages/paddle/fluid/io.py:1998: UserWarning: This list is not set, Because of Paramerter not found in program. There are: fc_0.w_0 fc_0.b_0
  format(" ".join(unused_para_list)))
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2020-09-18 19:00:18,819-INFO: places would be ommited when DataLoader is not iterable
W0918 19:00:27.958123 10283 dynamic_loader.cc:167] You may need to install 'nccl2' from NVIDIA official website: https://developer.nvidia.com/nccl/nccl-download before install PaddlePaddle.
/home/ubuntu/anaconda3/envs/py3.7-paddle-1.8/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "tools/train.py", line 372, in <module>
    main()
  File "tools/train.py", line 245, in main
    outs = exe.run(compiled_train_prog, fetch_list=train_values)
  File "/home/ubuntu/anaconda3/envs/py3.7-paddle-1.8/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/home/ubuntu/anaconda3/envs/py3.7-paddle-1.8/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/ubuntu/anaconda3/envs/py3.7-paddle-1.8/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/home/ubuntu/anaconda3/envs/py3.7-paddle-1.8/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1156, in _run_impl
    program._compile(scope, self.place)
  File "/home/ubuntu/anaconda3/envs/py3.7-paddle-1.8/lib/python3.7/site-packages/paddle/fluid/compiler.py", line 443, in _compile
    places=self._places)
  File "/home/ubuntu/anaconda3/envs/py3.7-paddle-1.8/lib/python3.7/site-packages/paddle/fluid/compiler.py", line 396, in _compile_data_parallel
    self._exec_strategy, self._build_strategy, self._graph)
paddle.fluid.core_avx.EnforceNotMet:
C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
2 paddle::platform::dynload::GetNCCLDsoHandle()
3 void std::__once_call_impl<std::_Bind_simple<paddle::platform::dynload::DynLoad__ncclCommInitAll::operator()<ncclComm**, int, int*>(ncclComm**, int, int*)::{lambda()#1} ()> >()
4 paddle::platform::NCCLContextMap::NCCLContextMap(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, ncclUniqueId*, unsigned long, unsigned long)
5 paddle::platform::NCCLCommunicator::InitFlatCtxs(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, std::vector<ncclUniqueId*, std::allocator<ncclUniqueId*> > const&, unsigned long, unsigned long)
6 paddle::framework::ParallelExecutorPrivate::InitNCCLCtxs(paddle::framework::Scope*, paddle::framework::details::BuildStrategy const&)
7 paddle::framework::ParallelExecutorPrivate::InitOrGetNCCLCommunicator(paddle::framework::Scope*, paddle::framework::details::BuildStrategy*)
8 paddle::framework::ParallelExecutor::ParallelExecutor(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, std::vector<std::string, std::allocator<std::string> > const&, std::string const&, paddle::framework::Scope*, std::vector<paddle::framework::Scope*, std::allocator<paddle::framework::Scope*> > const&, paddle::framework::details::ExecutionStrategy const&, paddle::framework::details::BuildStrategy const&, paddle::framework::ir::Graph*)
Error Message Summary:
PreconditionNotMetError: The third-party dynamic library (libnccl.so) that Paddle depends on is not configured correctly. (error code is libnccl.so: cannot open shared object file: No such file or directory)
Suggestions:
- Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
- Configure third-party dynamic library environment variables as follows:
  - Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`
  - Windows: set PATH by `set PATH=XXX;`

at (/paddle/paddle/fluid/platform/dynload/dynamic_loader.cc:194)
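
The error summary says the dynamic loader could not open `libnccl.so`. A minimal sketch for checking this from the same Python environment is shown below. The helper names `find_library_files` and `can_load` are hypothetical, and the extra directories in the `__main__` block are only commonly used locations assumed for illustration; they may not match this machine.

```python
import ctypes
import os


def find_library_files(name_prefix, search_dirs):
    """Return sorted paths of files in search_dirs whose name starts with name_prefix."""
    matches = set()
    for directory in search_dirs:
        # Skip empty entries (e.g. from an unset LD_LIBRARY_PATH) and missing directories.
        if not directory or not os.path.isdir(directory):
            continue
        for entry in os.listdir(directory):
            if entry.startswith(name_prefix):
                matches.add(os.path.join(directory, entry))
    return sorted(matches)


def can_load(library_name):
    """Return True if the dynamic loader can open library_name, False otherwise."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Directories taken from LD_LIBRARY_PATH, plus a few assumed common locations.
    dirs = os.environ.get("LD_LIBRARY_PATH", "").split(":")
    dirs += ["/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64", "/usr/local/lib"]
    print("libnccl.so loadable:", can_load("libnccl.so"))
    print("libnccl files found:", find_library_files("libnccl.so", dirs))
```

If the script finds a `libnccl.so*` file but reports that `libnccl.so` is not loadable, the directory holding it is probably missing from `LD_LIBRARY_PATH`, or only a versioned file exists without an unversioned `libnccl.so` name. If nothing is found at all, NCCL is likely not installed, which matches the `nccl2` warning earlier in the log.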