Error: Blocking queue is killed because the data reader raises an exception
Created by: JwDong2019
Hello, I downloaded ICDAR2015 and followed the steps to generate the label files with gen_label.py. I also downloaded the MobileNetV3_large_x0_5_pretrained pretrained model. Training fails with the error below. What causes this? I have looked through a number of issues, but mine seems to differ from the ones others reported.

python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/ 2>&1 | tee train_det.log

2020-09-23 09:14:00,099-INFO: {'Global': {'debug': False, 'algorithm': 'DB', 'use_gpu': True, 'epoch_num': 1200, 'log_smooth_window': 20, 'print_batch_step': 2, 'save_model_dir': './output/det_db/', 'save_epoch_step': 200, 'eval_batch_step': [4000, 5000], 'train_batch_size_per_card': 8, 'test_batch_size_per_card': 8, 'image_shape': [3, 640, 640], 'reader_yml': './configs/det/det_db_icdar15_reader.yml', 'pretrain_weights': './pretrain_models/MobileNetV3_large_x0_5_pretrained/', 'checkpoints': None, 'save_res_path': './output/det_db/predicts_db.txt', 'save_inference_dir': None}, 'Architecture': {'function': 'ppocr.modeling.architectures.det_model,DetModel'}, 'Backbone': {'function': 'ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3', 'scale': 0.5, 'model_name': 'large', 'disable_se': True}, 'Head': {'function': 'ppocr.modeling.heads.det_db_head,DBHead', 'model_name': 'large', 'k': 50, 'inner_channels': 96, 'out_channels': 2}, 'Loss': {'function': 'ppocr.modeling.losses.det_db_loss,DBLoss', 'balance_loss': True, 'main_loss_type': 'DiceLoss', 'alpha': 5, 'beta': 10, 'ohem_ratio': 3}, 'Optimizer': {'function': 'ppocr.optimizer,AdamDecay', 'base_lr': 0.001, 'beta1': 0.9, 'beta2': 0.999, 'decay': {'function': 'cosine_decay_warmup', 'step_each_epoch': 16, 'total_epoch': 1200}}, 'PostProcess': {'function': 'ppocr.postprocess.db_postprocess,DBPostProcess', 'thresh': 0.3, 'box_thresh': 0.6, 'max_candidates': 1000, 'unclip_ratio': 1.5}, 'TrainReader': {'reader_function': 'ppocr.data.det.dataset_traversal,TrainReader', 'process_function': 'ppocr.data.det.db_process,DBProcessTrain', 'num_workers': 1, 'img_set_dir': './train_data/icdar2015/text_localization/ch4_training_images/', 
'label_file_path': './train_data/icdar2015/text_localization/train_icdar2015_label.txt'}, 'EvalReader': {'reader_function': 'ppocr.data.det.dataset_traversal,EvalTestReader', 'process_function': 'ppocr.data.det.db_process,DBProcessTest', 'img_set_dir': './train_data/icdar2015/text_localization/ch4_test_images/', 'label_file_path': './train_data/icdar2015/text_localization/test_icdar2015_label.txt', 'test_image_shape': [736, 1280]}, 'TestReader': {'reader_function': 'ppocr.data.det.dataset_traversal,EvalTestReader', 'process_function': 'ppocr.data.det.db_process,DBProcessTest', 'infer_img': None, 'img_set_dir': './train_data/icdar2015/text_localization/ch4_test_images/', 'label_file_path': './train_data/icdar2015/text_localization/test_icdar2015_label.txt', 'test_image_shape': [736, 1280], 'do_eval': True}}
2020-09-23 09:14:00,380-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000000] in Optimizer will not take effect, and it will only be applied to other Parameters!
2020-09-23 09:14:01,167-INFO: places would be ommited when DataLoader is not iterable
W0923 09:14:01.199735 1289457 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 11.0, Runtime API Version: 9.0
W0923 09:14:01.201994 1289457 device_context.cc:260] device: 0, cuDNN Version: 7.6.
2020-09-23 09:14:02,071-INFO: Loading parameters from ./pretrain_models/MobileNetV3_large_x0_5_pretrained/...
2020-09-23 09:14:02,072-WARNING: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-09-23 09:14:02,154-INFO: Finish initing model from ./pretrain_models/MobileNetV3_large_x0_5_pretrained/
2020-09-23 09:14:02,154-INFO: During the training process, after the 4000th iteration, an evaluation is run every 5000 iterations
3 640 640
3 640 640
./train_data/icdar2015/text_localization/train_icdar2015_label.txt
2020-09-23 09:14:02,286-WARNING: Your reader has raised an exception!
Process Process-1:
Traceback (most recent call last):
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/reader/decorator.py", line 556, in _read_into_queue
    six.reraise(*sys.exc_info())
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
    for sample in reader():
  File "/home/djw/workspace/PaddleOCR/ppocr/data/det/dataset_traversal.py", line 115, in batch_iter_reader
    for outs in sample_iter_reader():
  File "/home/djw/workspace/PaddleOCR/ppocr/data/det/dataset_traversal.py", line 57, in sample_iter_reader
    outs = self.process(label_infor)
  File "/home/djw/workspace/PaddleOCR/ppocr/data/det/db_process.py", line 102, in __call__
    img_path, gt_label = self.convert_label_infor(label_infor)
  File "/home/djw/workspace/PaddleOCR/ppocr/data/det/db_process.py", line 98, in convert_label_infor
    label = json.loads(substr[1])
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1145, in thread_main
    six.reraise(*sys.exc_info())
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1125, in thread_main
    for tensors in self._tensor_reader():
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1195, in tensor_reader_impl
    for slots in paddle_reader():
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 506, in reader_creator
    for item in reader():
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/reader/decorator.py", line 572, in queue_reader
    raise ValueError("multiprocess reader raises an exception")
ValueError: multiprocess reader raises an exception
Traceback (most recent call last):
  File "tools/train.py", line 131, in <module>
    main()
  File "tools/train.py", line 103, in main
    program.train_eval_det_run(config, exe, train_info_dict, eval_info_dict)
  File "/home/djw/workspace/PaddleOCR/tools/program.py", line 291, in train_eval_det_run
    return_numpy=False)
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1167, in _run_impl
    return_merged=return_merged)
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/executor.py", line 879, in _run_parallel
    tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:
C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
Python Call Stacks (More useful to users):
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1080, in _init_non_iterable
    attrs={'drop_last': self._drop_last})
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/reader.py", line 978, in __init__
    self._init_non_iterable()
  File "/home/djw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/reader.py", line 620, in from_generator
    iterable, return_list, drop_last)
  File "/home/djw/workspace/PaddleOCR/ppocr/modeling/architectures/det_model.py", line 125, in create_feed
    iterable=False)
  File "/home/djw/workspace/PaddleOCR/ppocr/modeling/architectures/det_model.py", line 138, in __call__
    image, labels, loader = self.create_feed(mode)
  File "/home/djw/workspace/PaddleOCR/tools/program.py", line 171, in build
    dataloader, outputs = model(mode=mode)
  File "tools/train.py", line 51, in main
    config, train_program, startup_program, mode='train')
  File "tools/train.py", line 131, in <module>
    main()
Error Message Summary:
Error: Blocking queue is killed because the data reader raises an exception
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141)
  [operator < read > error]
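
The first exception in the log is the JSONDecodeError raised from `json.loads(substr[1])` in `convert_label_infor`. The Blocking queue error appears to be only a downstream effect of the reader process dying. "Expecting property name enclosed in double quotes ... (char 2)" would be consistent with a label line whose annotation part starts with `[{'`, that is, single-quoted keys, but that is a guess from the message, not something the log confirms. Below is a minimal sketch for finding such lines. It assumes each label line is an image path and a JSON annotation separated by a tab. The helper name `find_bad_label_lines` and the sample keys are hypothetical and used only for illustration.

```python
import json


def find_bad_label_lines(lines, sep="\t"):
    """Return (line_number, reason) for each line whose label part is not valid JSON.

    `sep` is an assumption: the image path and the annotation are taken
    to be separated by a tab.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        parts = line.split(sep)
        if len(parts) < 2:
            bad.append((lineno, "separator not found"))
            continue
        try:
            json.loads(parts[1])
        except json.JSONDecodeError as exc:
            bad.append((lineno, str(exc)))
    return bad


# Illustrative sample lines (not taken from the real label file):
# the first is valid JSON, the second uses single quotes.
sample = [
    'img_1.jpg\t[{"transcription": "abc", "points": [[0, 0], [1, 0], [1, 1], [0, 1]]}]',
    "img_2.jpg\t[{'transcription': 'abc', 'points': [[0, 0], [1, 0], [1, 1], [0, 1]]}]",
]
for lineno, reason in find_bad_label_lines(sample):
    print(lineno, reason)
```

To check the real file, the open file object can be passed directly, for example `find_bad_label_lines(open('./train_data/icdar2015/text_localization/train_icdar2015_label.txt', encoding='utf-8'))`. If every line is reported, the label file was probably written in a format the reader does not expect and may need to be regenerated.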