PP-YOLO error during training
Created by: sukkyusun1
The following error occurs repeatedly after around 11,000 iterations:
2020-09-09 16:54:54,508-INFO: iter: 11300, lr: 0.000100, 'loss_xy': '5.475853', 'loss_wh': '9.381088', 'loss_obj': '39.984276', 'loss_cls': '11.378154', 'loss_iou': '35.723583', 'loss_iou_aware': '0.037410', 'loss': '102.309998', time: 0.633, eta: 15:35:59
2020-09-09 16:55:57,974-INFO: iter: 11400, lr: 0.000100, 'loss_xy': '5.445478', 'loss_wh': '9.870771', 'loss_obj': '38.965660', 'loss_cls': '11.542593', 'loss_iou': '36.368202', 'loss_iou_aware': '0.035767', 'loss': '103.814850', time: 0.632, eta: 15:33:56
2020-09-09 16:57:04,996-INFO: iter: 11500, lr: 0.000100, 'loss_xy': '6.019155', 'loss_wh': '9.661630', 'loss_obj': '39.524551', 'loss_cls': '11.500303', 'loss_iou': '37.492279', 'loss_iou_aware': '0.033797', 'loss': '105.630592', time: 0.671, eta: 16:29:15
2020-09-09 16:58:13,333-INFO: iter: 11600, lr: 0.000100, 'loss_xy': '5.562244', 'loss_wh': '10.271358', 'loss_obj': '38.939468', 'loss_cls': '10.803724', 'loss_iou': '34.840134', 'loss_iou_aware': '0.033829', 'loss': '100.110817', time: 0.685, eta: 16:48:35
2020-09-09 16:58:26,631-WARNING: consumer[31433] exit abnormally with exitcode[-9]
2020-09-09 16:58:26,631-WARNING: 1 consumers have exited abnormally!!!
2020-09-09 16:58:26,631-WARNING: consumer[31433] exit abnormally with exitcode[-9]
2020-09-09 16:58:26,631-WARNING: 1 consumers have exited abnormally!!!
2020-09-09 16:58:26,664-WARNING: Your reader has raised an exception!
Exception in thread Thread-11:
Traceback (most recent call last):
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/reader.py", line 1145, in thread_main
six.reraise(*sys.exc_info())
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/six.py", line 703, in reraise
raise value
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/reader.py", line 1125, in thread_main
for tensors in self._tensor_reader():
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/reader.py", line 1195, in tensor_reader_impl
for slots in paddle_reader():
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/data_feeder.py", line 506, in reader_creator
for item in reader():
File "/home/sk/PaddleDetection/ppdet/data/reader.py", line 445, in _reader
reader.reset()
File "/home/sk/PaddleDetection/ppdet/data/parallel_map.py", line 267, in reset
assert self._consumer_healthy(), "cannot start another pass of data"
AssertionError: cannot start another pass of data for some consumers exited abnormally before!!!
/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "tools/train.py", line 368, in <module>
main()
File "tools/train.py", line 241, in main
outs = exe.run(compiled_train_prog, fetch_list=train_values)
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/executor.py", line 1071, in run
six.reraise(*sys.exc_info())
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/six.py", line 703, in reraise
raise value
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/executor.py", line 1066, in run
return_merged=return_merged)
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/executor.py", line 1167, in _run_impl
return_merged=return_merged)
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/executor.py", line 879, in _run_parallel
tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:
C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocatorpaddle::framework::LoDTensor > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocatorpaddle::framework::LoDTensor >)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocatorpaddle::framework::LoDTensor >)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
Python Call Stacks (More useful to users):
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2610, in append_op
attrs=kwargs.get("attrs", None))
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/reader.py", line 1080, in _init_non_iterable
attrs={'drop_last': self._drop_last})
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/reader.py", line 978, in __init__
self._init_non_iterable()
File "/home/sk/anaconda3/envs/paddle3.6/lib/python3.6/site-packages/paddle/fluid/reader.py", line 620, in from_generator
iterable, return_list, drop_last)
File "/home/sk/PaddleDetection/ppdet/modeling/architectures/yolo.py", line 155, in build_inputs
iterable=iterable) if use_dataloader else None
File "tools/train.py", line 113, in main
feed_vars, train_loader = model.build_inputs(**inputs_def)
File "tools/train.py", line 368, in <module>
main()
Error Message Summary:
Error: Blocking queue is killed because the data reader raises an exception [Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141) [operator < read > error]
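
For readers interpreting the `exitcode[-9]` warning in the log above: if the reader's consumers are Python `multiprocessing` processes (an assumption, suggested by `use_process: true` in the reader config), a negative exit code -N means the process was terminated by signal N. The helper `describe_exitcode` below is hypothetical and not part of PaddleDetection; it only decodes the number. Signal 9 is SIGKILL, which on Linux is often sent by the out-of-memory killer, although this log alone cannot confirm that.

```python
import signal


def describe_exitcode(exitcode):
    """Describe a multiprocessing-style exit code (hypothetical helper).

    multiprocessing reports -N when a child was terminated by signal N.
    """
    if exitcode is None:
        return "still running"
    if exitcode < 0:
        signum = -exitcode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {exitcode}"


# The value reported in the warning above.
print(describe_exitcode(-9))
```

If the kernel did kill the worker for memory reasons, `dmesg` output on the training machine would normally record it; lowering `worker_num` or `bufsize` in the reader config is one thing that might be tried in that case.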
ppyolo.yml
architecture: YOLOv3
use_gpu: true
max_iters: 100000
log_smooth_window: 100
log_iter: 100
save_dir: output
snapshot_iter: 10000
metric: VOC
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
weights: output/ppyolo/model_final
num_classes: 5
use_fine_grained_loss: true
use_ema: true
ema_decay: 0.9998

YOLOv3:
  backbone: ResNet
  yolo_head: YOLOv3Head
  use_fine_grained_loss: true

ResNet:
  norm_type: sync_bn
  freeze_at: 0
  freeze_norm: false
  norm_decay: 0.
  depth: 50
  feature_maps: [3, 4, 5]
  variant: d
  dcn_v2_stages: [5]

YOLOv3Head:
  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]
  norm_decay: 0.
  coord_conv: true
  iou_aware: true
  iou_aware_factor: 0.4
  scale_x_y: 1.05
  spp: true
  yolo_loss: YOLOv3Loss
  nms:
    background_label: -1
    keep_top_k: 100
    nms_threshold: 0.45
    nms_top_k: 1000
    normalized: false
    score_threshold: 0.01
  drop_block: true

YOLOv3Loss:
  batch_size: 4
  ignore_thresh: 0.7
  scale_x_y: 1.05
  label_smooth: false
  use_fine_grained_loss: true
  iou_loss: IouLoss
  iou_aware_loss: IouAwareLoss

IouLoss:
  loss_weight: 2.5
  max_height: 608
  max_width: 608

IouAwareLoss:
  loss_weight: 1.0
  max_height: 608
  max_width: 608

MatrixNMS:
  background_label: -1
  keep_top_k: 100
  normalized: false
  score_threshold: 0.01
  post_threshold: 0.01

LearningRate:
  base_lr: 0.0001
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones:
    - 150000
    - 200000
  - !LinearWarmup
    start_factor: 0.
    steps: 4000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0005
    type: L2

READER: 'ppyolo_reader.yml'
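
As a side note on the `LearningRate` section of the config above, the schedule can be sketched in plain Python. This is a simplified approximation under stated assumptions, not PaddleDetection's actual scheduler code: the function `learning_rate` is hypothetical, and it assumes the warmup factor ramps linearly from `start_factor` to 1 over `steps` iterations and that the piecewise decay multiplies by `gamma` at each milestone. Under those assumptions the rate is 0.0001 at iteration 11300, matching the `lr: 0.000100` in the log, and because both milestones (150000, 200000) lie beyond `max_iters: 100000`, the decay would never take effect in this run.

```python
def learning_rate(step, base_lr=0.0001, warmup_steps=4000, start_factor=0.0,
                  milestones=(150000, 200000), gamma=0.1):
    """Approximate the configured schedule (hypothetical sketch)."""
    # Piecewise decay: multiply by gamma once for every milestone passed.
    passed = sum(1 for m in milestones if step >= m)
    lr = base_lr * gamma ** passed
    # Linear warmup: scale from start_factor up to 1 over warmup_steps.
    if step < warmup_steps:
        alpha = step / warmup_steps
        lr *= start_factor * (1 - alpha) + alpha
    return lr


for step in (0, 2000, 4000, 11300, 99999):
    print(step, learning_rate(step))
```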
ppyolo.reader.yml
TrainReader:
  inputs_def:
    fields: ['image', 'gt_bbox', 'gt_class', 'gt_score']
    num_max_boxes: 200
  dataset:
    !VOCDataSet
    #image_dir: train2017
    anno_path: /home/sk/PaddleDetection/dataset/voc/trainval.txt
    dataset_dir: /home/sk/PaddleDetection/dataset/voc
    with_background: false
    use_default_label: false
  sample_transforms:
    - !DecodeImage
      to_rgb: True
      with_mixup: True
    - !MixupImage
      alpha: 1.5
      beta: 1.5
    - !ColorDistort {}
    - !RandomExpand
      fill_value: [123.675, 116.28, 103.53]
    - !RandomCrop {}
    - !RandomFlipImage
      is_normalized: false
    - !NormalizeBox {}
    - !PadBox
      num_max_boxes: 200
    - !BboxXYXY2XYWH {}
  batch_transforms:
  - !RandomShape
    sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
    random_inter: True
  - !NormalizeImage
    mean: [0.485, 0.456, 0.406]
    std: [0.229, 0.224, 0.225]
    is_scale: True
    is_channel_first: false
  - !Permute
    to_bgr: false
    channel_first: True
  # Gt2YoloTarget is only used when use_fine_grained_loss set as true,
  # this operator will be deleted automatically if use_fine_grained_loss
  # is set as false
  - !Gt2YoloTarget
    anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]
    downsample_ratios: [32, 16, 8]
  batch_size: 4
  shuffle: true
  mixup_epoch: 25000
  drop_last: true
  worker_num: 8
  bufsize: 4
  use_process: true

EvalReader:
  inputs_def:
    #fields: ['image', 'im_size', 'im_id']
    fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult']
    num_max_boxes: 200
  dataset:
    !VOCDataSet
    #image_dir: val2017
    anno_path: /home/sk/PaddleDetection/dataset/voc/test.txt
    dataset_dir: /home/sk/PaddleDetection/dataset/voc
    with_background: false
    use_default_label: false
  sample_transforms:
    - !DecodeImage
      to_rgb: True
    - !ResizeImage
      target_size: 1728
      interp: 2
    - !NormalizeImage
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      is_scale: True
      is_channel_first: false
    - !PadBox
      num_max_boxes: 200
    - !Permute
      to_bgr: false
      channel_first: True
  batch_size: 4
  drop_empty: false
  worker_num: 8
  bufsize: 4

TestReader:
  inputs_def:
    image_shape: [3, 1728, 1728]
    fields: ['image', 'im_size', 'im_id']
  dataset:
    !ImageFolder
    anno_path: /home/sk/PaddleDetection/dataset/voc/label_list.txt
    with_background: false
    use_default_label: false
  sample_transforms:
    - !DecodeImage
      to_rgb: True
    - !ResizeImage
      target_size: 1728
      interp: 2
    - !NormalizeImage
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      is_scale: True
      is_channel_first: false
    - !Permute
      to_bgr: false
      channel_first: True
  batch_size: 1