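Error when running "Get Started with PaddleX in 10 Minutes: MobileNetV3_ssld Image Classification" (#167) · Issue · PaddlePaddle / PaddleX

Error when running "Get Started with PaddleX in 10 Minutes: MobileNetV3_ssld Image Classification"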

Created by: tspp520

This happened while running the AI Studio project "Get Started with PaddleX in 10 Minutes: MobileNetV3_ssld Image Classification": https://aistudio.baidu.com/aistudio/projectdetail/439860

Importing the dataset worked fine when running locally, but the run fails at model.train. Could someone please take a look at the error output below and suggest a fix? Thanks! (Off to bed now, good night, everyone.)

    num_classes = len(train_dataset.labels)
    model = pdx.cls.MobileNetV3_large_ssld(num_classes=num_classes)
    model.train(num_epochs=12,
                train_dataset=train_dataset,
                train_batch_size=32,
                eval_dataset=eval_dataset,
                lr_decay_epochs=[6, 8],
                save_interval_epochs=1,
                learning_rate=0.00625,
                save_dir='output/mobilenetv3_large_ssld',
                use_vdl=True)

    2020-06-19 15:58:21,935-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
    Downloading MobileNetV3_large_x1_0_ssld_pretrained.tar
    [==================================================] 100.00%
    Uncompress /home/wsuser/.paddlehub/tmp/tmpx64j5fu9/MobileNetV3_large_x1_0_ssld_pretrained.tar
    [==================================================] 100.00%
    2020-06-19 15:58:34 [INFO] Load pretrain weights from output/mobilenetv3_large_ssld/pretrain/MobileNetV3_large_x1_0_ssld.
    2020-06-19 15:58:34 [WARNING] [SKIP] Shape of pretrained weight output/mobilenetv3_large_ssld/pretrain/MobileNetV3_large_x1_0_ssld/fc_weights doesn't match.(Pretrained: (1280, 1000), Actual: (1280, 6))
    2020-06-19 15:58:34 [WARNING] [SKIP] Shape of pretrained weight output/mobilenetv3_large_ssld/pretrain/MobileNetV3_large_x1_0_ssld/fc_offset doesn't match.(Pretrained: (1000,), Actual: (6,))
    2020-06-19 15:58:34,387-WARNING: output/mobilenetv3_large_ssld/pretrain/MobileNetV3_large_x1_0_ssld.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
    2020-06-19 15:58:34 [INFO] There are 268 varaibles in output/mobilenetv3_large_ssld/pretrain/MobileNetV3_large_x1_0_ssld are loaded.
    /opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
      "The following exception is not an EOF exception.")
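The two [SKIP] warnings in the log above appear to be unrelated to the failure: the pretrained classifier head has 1000 outputs while this dataset has 6 classes, so those two tensors cannot be reused. The following is only a rough sketch of that kind of shape check, not PaddleX's actual loading code. The function name `split_by_shape` and the `conv1_weights` entry are hypothetical; the `fc_weights` and `fc_offset` shapes are taken from the log.

```python
def split_by_shape(pretrained_shapes, model_shapes):
    """Separate parameters whose shapes match from those that must be skipped.

    Both arguments map a parameter name to its shape tuple.
    Returns (loadable_names, skipped), where each skipped entry is
    (name, pretrained_shape, actual_shape).
    """
    loadable, skipped = [], []
    for name, actual in model_shapes.items():
        pretrained = pretrained_shapes.get(name)
        if pretrained is None:
            # No pretrained value for this parameter; nothing to compare.
            continue
        if tuple(pretrained) == tuple(actual):
            loadable.append(name)
        else:
            skipped.append((name, tuple(pretrained), tuple(actual)))
    return loadable, skipped


if __name__ == "__main__":
    # Shapes for fc_weights / fc_offset come from the log; conv1_weights is
    # a made-up example of a backbone parameter whose shape matches.
    pretrained = {
        "conv1_weights": (16, 3, 3, 3),
        "fc_weights": (1280, 1000),
        "fc_offset": (1000,),
    }
    model = {
        "conv1_weights": (16, 3, 3, 3),
        "fc_weights": (1280, 6),
        "fc_offset": (6,),
    }
    loadable, skipped = split_by_shape(pretrained, model)
    for name, pre, act in skipped:
        print(f"[SKIP] {name}: Pretrained: {pre}, Actual: {act}")
```

Under this reading, only the final fully connected layer is trained from scratch, which is the usual situation when fine-tuning on a dataset with a different number of classes.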

    EnforceNotMet                             Traceback (most recent call last) in
          9             learning_rate=0.00625,
         10             save_dir='output/mobilenetv3_large_ssld',
    ---> 11             use_vdl=True)

    /opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddlex/cv/models/classifier.py in train(self, num_epochs, train_dataset, train_batch_size, eval_dataset, save_interval_epochs, log_interval_steps, save_dir, pretrain_weights, optimizer, learning_rate, warmup_steps, warmup_start_lr, lr_decay_epochs, lr_decay_gamma, use_vdl, sensitivities_file, eval_metric_loss, early_stop, early_stop_patience, resume_checkpoint)
        201             use_vdl=use_vdl,
        202             early_stop=early_stop,
    --> 203             early_stop_patience=early_stop_patience)
        204
        205     def evaluate(self,

    /opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddlex/cv/models/base.py in train_loop(self, num_epochs, train_dataset, train_batch_size, eval_dataset, save_interval_epochs, log_interval_steps, save_dir, use_vdl, early_stop, early_stop_patience)
        454                     self.parallel_train_prog,
        455                     feed=data,
    --> 456                     fetch_list=list(self.train_outputs.values()))
        457                 outputs_avg = np.mean(np.array(outputs), axis=1)
        458                 records.append(outputs_avg)

    /opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
       1069                 warnings.warn(
       1070                     "The following exception is not an EOF exception.")
    -> 1071             six.reraise(*sys.exc_info())
       1072
       1073     def _run_impl(self, program, feed, fetch_list, feed_var_name,

    /opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
        691         return getattr(self, assertNotRegex)(*args, **kwargs)
        692
    --> 693
        694 if PY3:
        695     exec = getattr(moves.builtins, "exec")

    /opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
       1064                 use_program_cache=use_program_cache,
       1065                 use_prune=use_prune,
    -> 1066                 return_merged=return_merged)
       1067         except Exception as e:
       1068             if not isinstance(e, core.EOFException):

    /opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/executor.py in _run_impl(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
       1154                 use_program_cache=use_program_cache)
       1155
    -> 1156         program._compile(scope, self.place)
       1157         if program._is_inference:
       1158             return self._run_inference(program._executor, feed)

    /opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/compiler.py in _compile(self, scope, place)
        441                 use_cuda=isinstance(self._place, core.CUDAPlace),
        442                 scope=self._scope,
    --> 443                 places=self._places)
        444         return self
        445

    /opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/compiler.py in _compile_data_parallel(self, places, use_cuda, scope)
        394             cpt.to_text(self._loss_name)
        395             if self._loss_name else six.u(''), self._scope, self._local_scopes,
    --> 396             self._exec_strategy, self._build_strategy, self._graph)
        397
        398     def _compile_inference(self):

EnforceNotMet:

C++ Call Stacks (More useful to developers):

    0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
    1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
    2   paddle::platform::dynload::GetNCCLDsoHandle()
    3   void std::__once_call_impl<std::_Bind_simple<paddle::platform::dynload::DynLoad__ncclCommInitAll::operator()<ncclComm**, int, int*>(ncclComm**, int, int*)::{lambda()#1} ()> >()
    4   paddle::platform::NCCLContextMap::NCCLContextMap(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, ncclUniqueId*, unsigned long, unsigned long)
    5   paddle::platform::NCCLCommunicator::InitFlatCtxs(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, std::vector<ncclUniqueId*, std::allocator<ncclUniqueId*> > const&, unsigned long, unsigned long)
    6   paddle::framework::ParallelExecutorPrivate::InitNCCLCtxs(paddle::framework::Scope*, paddle::framework::details::BuildStrategy const&)
    7   paddle::framework::ParallelExecutorPrivate::InitOrGetNCCLCommunicator(paddle::framework::Scope*, paddle::framework::details::BuildStrategy*)
    8   paddle::framework::ParallelExecutor::ParallelExecutor(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, std::vector<std::string, std::allocator<std::string> > const&, std::string const&, paddle::framework::Scope*, std::vector<paddle::framework::Scope*, std::allocator<paddle::framework::Scope*> > const&, paddle::framework::details::ExecutionStrategy const&, paddle::framework::details::BuildStrategy const&, paddle::framework::ir::Graph*)

Error Message Summary:

    Error: Failed to find dynamic library: libnccl.so ( libnccl.so: cannot open shared object file: No such file or directory )
     Please specify its path correctly using following ways:
     Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS.
     For instance, issue command: export LD_LIBRARY_PATH=...
     Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled. at (/paddle/paddle/fluid/platform/dynload/dynamic_loader.cc:177)
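The error summary says the dynamic loader could not open libnccl.so, and the C++ stack shows the load being triggered while ParallelExecutor initializes NCCL contexts. A minimal diagnostic sketch, using only the Python standard library, is shown below. It checks whether libnccl.so is loadable and which LD_LIBRARY_PATH directories contain a matching file. The helper names are hypothetical. The single-GPU fallback rests on an assumption that is not confirmed in this issue: that Paddle only needs NCCL when more than one GPU is visible, so that restricting CUDA_VISIBLE_DEVICES to one device before importing paddlex avoids the load.

```python
import ctypes
import os


def find_on_library_path(lib_name="libnccl.so", env=None):
    """List files on LD_LIBRARY_PATH whose names start with lib_name.

    Versioned files such as libnccl.so.2 are listed too, but the loader in
    the error message asks for the exact name libnccl.so, so a symlink with
    that name may still be needed.
    """
    env = os.environ if env is None else env
    hits = []
    for directory in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            if entry.startswith(lib_name):
                hits.append(os.path.join(directory, entry))
    return hits


def can_load(lib_name="libnccl.so"):
    """Return True if the dynamic loader can open lib_name in this process."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


def restrict_to_single_gpu(env=None, device_id="0"):
    """Expose a single GPU to the process.

    ASSUMPTION: with one visible device the framework does not initialize
    NCCL. This has to run before paddle / paddlex is imported.
    """
    env = os.environ if env is None else env
    env["CUDA_VISIBLE_DEVICES"] = device_id
    return env["CUDA_VISIBLE_DEVICES"]


if __name__ == "__main__":
    print("libnccl.so loadable:", can_load())
    print("candidates on LD_LIBRARY_PATH:", find_on_library_path())
    if not can_load():
        # Fallback under the assumption stated above.
        restrict_to_single_gpu()
        # import paddlex as pdx  # import only after the variable is set
```

If multi-GPU training is the goal, the other route is the one the error message itself names: install NCCL and add the directory containing libnccl.so to LD_LIBRARY_PATH before starting Python.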
