Training fails on macOS but runs normally on AI Studio; see the log below
Created by: handy001
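- Version and environment information
  1) PaddleHub and PaddlePaddle versions: PaddleHub 1.6.1, PaddlePaddle 1.7.1
  2) System environment: OS type: macOS; Python version: Python 3.7.2
- Reproduction information: for an error report, please provide the environment and the steps to reproduce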
[2020-03-25 21:08:52,561] [ INFO] - Installing mobilenet_v2_imagenet module
[2020-03-25 21:08:53,152] [ INFO] - Module mobilenet_v2_imagenet already installed in /Users/handy/.paddlehub/modules/mobilenet_v2_imagenet
[2020-03-25 21:08:53,660] [ INFO] - 267 pretrained paramaters loaded by PaddleHub
[2020-03-25 21:08:53,661] [ INFO] - Dataset label map = {'roses': 0, 'daisy': 1}
[2020-03-25 21:08:53,661] [ INFO] - Checkpoint dir: ckpt_model
[2020-03-25 21:09:00,721] [ INFO] - Strategy with slanted triangle learning rate, L2 regularization,
/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddle/fluid/executor.py:804: UserWarning: There are no operators in the program to be executed. If you pass Program manually, please use fluid.program_guard to ensure the current Program is being used.
  warnings.warn(error_info)
[2020-03-25 21:09:00,763] [ INFO] - Try loading checkpoint from ckpt_model/ckpt.meta
[2020-03-25 21:09:00,763] [ INFO] - PaddleHub model checkpoint not found, start from scratch...
[2020-03-25 21:09:00,810] [ INFO] - PaddleHub finetune start
I0325 21:09:01.170374 272072128 parallel_executor.cc:440] The Program will be executed on CPU using ParallelExecutor, 2 cards are used, so 2 programs are executed in parallel.
W0325 21:09:01.280500 272072128 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 161. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 161.
I0325 21:09:01.301883 272072128 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0325 21:09:01.572834 272072128 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0325 21:09:01.647723 272072128 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
[2020-03-25 21:09:05,624] [ INFO] - Evaluation on dev dataset start
Traceback (most recent call last):
  File "classify_task.py", line 27, in <module>
    run_states = paddle_wrap.task.finetune_and_eval()
  File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 864, in finetune_and_eval
    return self.finetune(do_eval=True)
  File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 893, in finetune
    self.eval(phase="dev")
  File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 923, in eval
    self._eval_end_event(run_states)
  File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 631, in hook_function
    func(*args)
  File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 719, in _default_eval_end_event
    eval_scores, eval_loss, run_speed = self._calculate_metrics(run_states)
  File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/classifier_task.py", line 115, in _calculate_metrics
    run_time_used = time.time() - run_states[0].run_time_begin
IndexError: list index out of range
libc++abi.dylib: terminating with uncaught exception of type paddle::platform::EnforceNotMet:
C++ Call Stacks (More useful to developers):
0 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > paddle::platform::GetTraceBackString<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*, int)
1 paddle::framework::RWLock::RDLock()
2 paddle::framework::Scope::HasKid(paddle::framework::Scope const*) const
3 paddle::framework::ParallelExecutorPrivate::~ParallelExecutorPrivate()
4 paddle::framework::ParallelExecutor::~ParallelExecutor()
5 pybind11::class_<paddle::framework::ParallelExecutor>::dealloc(pybind11::detail::value_and_holder&)
6 pybind11::detail::clear_instance(_object*)
7 pybind11_object_dealloc
Error Message Summary:
Error: acquire read lock failed [Hint: Expected pthread_rwlock_rdlock(&lock_) == 0, but received pthread_rwlock_rdlock(&lock_):22 != 0:0.] at (/home/teamcity/work/ef54dc8a5b211854/paddle/fluid/framework/rw_lock.h:36)
W0325 21:09:06.206892 272072128 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0325 21:09:06.206903 272072128 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0325 21:09:06.206907 272072128 init.cc:214] The detail failure signal is:
W0325 21:09:06.206910 272072128 init.cc:217] *** Aborted at 1585141746 (unix time) try "date -d @1585141746" if you are using GNU date ***
W0325 21:09:06.207233 272072128 init.cc:217] PC: @ 0x0 (unknown)
W0325 21:09:06.209056 272072128 init.cc:217] *** SIGABRT (@0x7fff682af7fa) received by PID 9529 (TID 0x110377dc0) stack trace: ***
W0325 21:09:06.209656 272072128 init.cc:217] @ 0x7fff6836142d _sigtramp
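
The Python traceback ends at `run_states[0].run_time_begin` with `IndexError: list index out of range`, which means `run_states` was empty when evaluation on the dev dataset finished. The log does not show why it was empty. Below is a minimal sketch of that failure mode and of a defensive guard. `RunState` and `safe_run_time_used` are hypothetical stand-ins written for illustration; they are not PaddleHub's actual classes or functions.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunState:
    """Hypothetical stand-in holding only the field named in the traceback."""
    run_time_begin: float = field(default_factory=time.time)


def safe_run_time_used(run_states: List[RunState]) -> Optional[float]:
    """Return seconds elapsed since the first run state, or None if no step ran."""
    if not run_states:
        # Indexing run_states[0] here would raise IndexError, as in the traceback.
        return None
    return time.time() - run_states[0].run_time_begin


# Reproduce the unguarded access on an empty list.
empty_states: List[RunState] = []
try:
    _ = time.time() - empty_states[0].run_time_begin
except IndexError as exc:
    print("unguarded access fails:", exc)

# The guarded helper reports "nothing was evaluated" instead of raising.
print(safe_run_time_used(empty_states))
print(safe_run_time_used([RunState()]))
```

With a guard like this the caller can tell "no evaluation step ran" apart from a real timing value. It would not explain why the dev dataset produced no run states on macOS.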