未验证 提交 ca227ee4 编写于 作者: W wangxinxin08 提交者: GitHub

add sku110k models (#7465)

* add sku110k models

* add link in ppyoloe doc

* modify directory of sku110k

* fix data config problem
上级 93de5e5f
...@@ -89,6 +89,8 @@ The PaddleDetection team provides configs and weights of various feature detecti ...@@ -89,6 +89,8 @@ The PaddleDetection team provides configs and weights of various feature detecti
|Pedestrian Detection | CrowdHuman | [pphuman](../pphuman) | |Pedestrian Detection | CrowdHuman | [pphuman](../pphuman) |
|Vehicle Detection | BDD100K, UA-DETRAC | [ppvehicle](../ppvehicle) | |Vehicle Detection | BDD100K, UA-DETRAC | [ppvehicle](../ppvehicle) |
|Small Object Detection | VisDrone、DOTA、xView | [smalldet](../smalldet) | |Small Object Detection | VisDrone、DOTA、xView | [smalldet](../smalldet) |
|Densely Packed Object Detection | SKU110k | [application](./application) |
|Rotated Object Detection | DOTA | [PP-YOLOE-R](../rotate/ppyoloe_r/) |
## Getting Start ## Getting Start
......
...@@ -88,6 +88,8 @@ PaddleDetection团队提供了基于PP-YOLOE的各种垂类检测模型的配置 ...@@ -88,6 +88,8 @@ PaddleDetection团队提供了基于PP-YOLOE的各种垂类检测模型的配置
| 行人检测 | CrowdHuman | [pphuman](../pphuman) | | 行人检测 | CrowdHuman | [pphuman](../pphuman) |
| 车辆检测 | BDD100K、UA-DETRAC | [ppvehicle](../ppvehicle) | | 车辆检测 | BDD100K、UA-DETRAC | [ppvehicle](../ppvehicle) |
| 小目标检测 | VisDrone、DOTA、xView | [smalldet](../smalldet) | | 小目标检测 | VisDrone、DOTA、xView | [smalldet](../smalldet) |
| 密集目标检测 | SKU110k | [application](./application) |
| 旋转框检测 | DOTA | [PP-YOLOE-R](../rotate/ppyoloe_r/) |
## 使用说明 ## 使用说明
......
...@@ -11,6 +11,8 @@ ...@@ -11,6 +11,8 @@
工业数据集使用[PKU-Market-PCB](https://robotics.pkusz.edu.cn/resources/dataset/),该数据集用于印刷电路板(PCB)的瑕疵检测,提供了6种常见的PCB缺陷, 工业数据集使用[PKU-Market-PCB](https://robotics.pkusz.edu.cn/resources/dataset/),该数据集用于印刷电路板(PCB)的瑕疵检测,提供了6种常见的PCB缺陷,
处理后的COCO格式,包含图片训练集555张,测试集138张,6个类别,[PKU-Market-PCB COCO格式下载](https://bj.bcebos.com/v1/paddledet/data/PCB_coco.zip) 处理后的COCO格式,包含图片训练集555张,测试集138张,6个类别,[PKU-Market-PCB COCO格式下载](https://bj.bcebos.com/v1/paddledet/data/PCB_coco.zip)
商超数据集[SKU110k](https://github.com/eg4000/SKU110K_CVPR19)是商品超市场景下的密集目标检测数据集,包含11,762张图片和超过170个实例。其中包括8,233张用于训练的图像、588张用于验证的图像和2,941张用于测试的图像。
## 实验结果: ## 实验结果:
...@@ -31,11 +33,30 @@ ...@@ -31,11 +33,30 @@
- 具体使用教程请参考[ppyoloe](../ppyoloe#getting-start) - 具体使用教程请参考[ppyoloe](../ppyoloe#getting-start)
## SKU110k Model ZOO
| Model | Epoch | GPU number | images/GPU | backbone | input shape | Box AP<sup>val<br>0.5:0.95 (maxDets=300) | Box AP<sup>test<br>0.5:0.95 (maxDets=300) | download | config |
|:--------------:|:-----:|:-------:|:----------:|:----------:| :-------:|:-------------------------:|:---------------------------:|:---------:|:------:|
| PP-YOLOE+_s | 80 | 8 | 8 | cspresnet-s | 960 | 57.4 | 58.8 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_sku110k.pdparams) | [config](./ppyoloe_plus_crn_s_80e_sku110k.yml) |
| PP-YOLOE+_m | 80 | 8 | 8 | cspresnet-m | 960 | 58.2 | 59.7 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_sku110k.pdparams) | [config](./ppyoloe_plus_crn_m_80e_sku110k.yml) |
| PP-YOLOE+_l | 80 | 8 | 4 | cspresnet-l | 960 | 58.8 | 60.2 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_sku110k.pdparams) | [config](./ppyoloe_plus_crn_l_80e_sku110k.yml) |
| PP-YOLOE+_x | 80 | 8 | 4 | cspresnet-x | 960 | 59.0 | 60.3 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_x_80e_sku110k.pdparams) | [config](./ppyoloe_plus_crn_x_80e_sku110k.yml) |
**注意:**
- SKU110k系列模型训练过程中使用8 GPUs进行训练,如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)** 调整学习率。
- SKU110k数据集使用**maxDets=300**的mAP值作为评估指标。
- 具体使用教程请参考[ppyoloe](../ppyoloe#getting-start)
## 引用 ## 引用
``` ```
@inproceedings{goldman2019dense,
author = {Eran Goldman and Roei Herzig and Aviv Eisenschtat and Jacob Goldberger and Tal Hassner},
title = {Precise Detection in Densely Packed Scenes},
booktitle = {Proc. Conf. Comput. Vision Pattern Recognition (CVPR)},
year = {2019}
}
@article{Exdark, @article{Exdark,
title={Getting to Know Low-light Images with The Exclusively Dark Dataset}, title={Getting to Know Low-light Images with The Exclusively Dark Dataset},
author={Loh, Yuen Peng and Chan, Chee Seng}, author={Loh, Yuen Peng and Chan, Chee Seng},
......
metric: COCO
num_classes: 1
TrainDataset:
!COCODataSet
image_dir: images
anno_path: annotations/annotations_train.json
dataset_dir: dataset/SKU110K_fixed
data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
EvalDataset:
!COCODataSet
image_dir: images
anno_path: annotations/annotations_val.json
dataset_dir: dataset/SKU110K_fixed
allow_empty: true
TestDataset:
!ImageFolder
anno_path: annotations/annotations_test.json
dataset_dir: dataset/SKU110K_fixed
_BASE_: [
'./_base_/sku110k.yml',
'../../runtime.yml'
]
log_iter: 10
snapshot_epoch: 20
weights: output/ppyoloe_plus_crn_s_80e_coco/model_final
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams
depth_mult: 1.0
width_mult: 1.0
# arch
architecture: YOLOv3
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
custom_black_list: ['reduce_mean']
YOLOv3:
backbone: CSPResNet
neck: CustomCSPPAN
yolo_head: PPYOLOEHead
post_process: ~
CSPResNet:
layers: [3, 6, 6, 3]
channels: [64, 128, 256, 512, 1024]
return_idx: [1, 2, 3]
use_large_stem: True
use_alpha: True
CustomCSPPAN:
out_channels: [768, 384, 192]
stage_num: 1
block_num: 3
act: 'swish'
spp: true
use_alpha: True
PPYOLOEHead:
fpn_strides: [32, 16, 8]
grid_cell_scale: 5.0
grid_cell_offset: 0.5
static_assigner_epoch: -1
use_varifocal_loss: True
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
static_assigner:
name: ATSSAssigner
topk: 9
assigner:
name: TaskAlignedAssigner
topk: 13
alpha: 1.0
beta: 6.0
nms:
name: MultiClassNMS
nms_top_k: 3000
keep_top_k: 1000
score_threshold: 0.01
nms_threshold: 0.7
# reader
worker_num: 8
eval_height: &eval_height 960
eval_width: &eval_width 960
eval_size: &eval_size [*eval_height, *eval_width]
TrainReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [3000, 1800], keep_ratio: True, interp: 2}
- RandomDistort: {}
- RandomCrop: {}
- RandomFlip: {}
batch_transforms:
- BatchRandomResize: {target_size: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152], random_size: True, random_interp: True, keep_ratio: False}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
- PadGT: {}
batch_size: 4
shuffle: true
drop_last: true
use_shared_memory: true
collate_batch: true
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 2
TestReader:
inputs_def:
image_shape: [3, *eval_height, *eval_width]
sample_transforms:
- Decode: {}
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 1
# optimizer
epoch: 80
LearningRate:
base_lr: 0.002
schedulers:
- !CosineDecay
max_epochs: 96
- !LinearWarmup
start_factor: 0.
epochs: 5
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
_BASE_: [
'./_base_/sku110k.yml',
'../../runtime.yml'
]
log_iter: 10
snapshot_epoch: 20
weights: output/ppyoloe_plus_crn_s_80e_coco/model_final
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_m_obj365_pretrained.pdparams
depth_mult: 0.67
width_mult: 0.75
# arch
architecture: YOLOv3
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
custom_black_list: ['reduce_mean']
YOLOv3:
backbone: CSPResNet
neck: CustomCSPPAN
yolo_head: PPYOLOEHead
post_process: ~
CSPResNet:
layers: [3, 6, 6, 3]
channels: [64, 128, 256, 512, 1024]
return_idx: [1, 2, 3]
use_large_stem: True
use_alpha: True
CustomCSPPAN:
out_channels: [768, 384, 192]
stage_num: 1
block_num: 3
act: 'swish'
spp: true
use_alpha: True
PPYOLOEHead:
fpn_strides: [32, 16, 8]
grid_cell_scale: 5.0
grid_cell_offset: 0.5
static_assigner_epoch: -1
use_varifocal_loss: True
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
static_assigner:
name: ATSSAssigner
topk: 9
assigner:
name: TaskAlignedAssigner
topk: 13
alpha: 1.0
beta: 6.0
nms:
name: MultiClassNMS
nms_top_k: 3000
keep_top_k: 1000
score_threshold: 0.01
nms_threshold: 0.7
# reader
worker_num: 8
eval_height: &eval_height 960
eval_width: &eval_width 960
eval_size: &eval_size [*eval_height, *eval_width]
TrainReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [3000, 1800], keep_ratio: True, interp: 2}
- RandomDistort: {}
- RandomCrop: {}
- RandomFlip: {}
batch_transforms:
- BatchRandomResize: {target_size: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152], random_size: True, random_interp: True, keep_ratio: False}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
- PadGT: {}
batch_size: 8
shuffle: true
drop_last: true
use_shared_memory: true
collate_batch: true
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 2
TestReader:
inputs_def:
image_shape: [3, *eval_height, *eval_width]
sample_transforms:
- Decode: {}
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 1
# optimizer
epoch: 80
LearningRate:
base_lr: 0.004
schedulers:
- !CosineDecay
max_epochs: 96
- !LinearWarmup
start_factor: 0.
epochs: 5
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
_BASE_: [
'./_base_/sku110k.yml',
'../../runtime.yml'
]
log_iter: 10
snapshot_epoch: 20
weights: output/ppyoloe_plus_crn_s_80e_coco/model_final
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams
depth_mult: 0.33
width_mult: 0.50
# arch
architecture: YOLOv3
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
custom_black_list: ['reduce_mean']
YOLOv3:
backbone: CSPResNet
neck: CustomCSPPAN
yolo_head: PPYOLOEHead
post_process: ~
CSPResNet:
layers: [3, 6, 6, 3]
channels: [64, 128, 256, 512, 1024]
return_idx: [1, 2, 3]
use_large_stem: True
use_alpha: True
CustomCSPPAN:
out_channels: [768, 384, 192]
stage_num: 1
block_num: 3
act: 'swish'
spp: true
use_alpha: True
PPYOLOEHead:
fpn_strides: [32, 16, 8]
grid_cell_scale: 5.0
grid_cell_offset: 0.5
static_assigner_epoch: -1
use_varifocal_loss: True
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
static_assigner:
name: ATSSAssigner
topk: 9
assigner:
name: TaskAlignedAssigner
topk: 13
alpha: 1.0
beta: 6.0
nms:
name: MultiClassNMS
nms_top_k: 3000
keep_top_k: 1000
score_threshold: 0.01
nms_threshold: 0.7
# reader
worker_num: 8
eval_height: &eval_height 960
eval_width: &eval_width 960
eval_size: &eval_size [*eval_height, *eval_width]
TrainReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [3000, 1800], keep_ratio: True, interp: 2}
- RandomDistort: {}
- RandomCrop: {}
- RandomFlip: {}
batch_transforms:
- BatchRandomResize: {target_size: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152], random_size: True, random_interp: True, keep_ratio: False}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
- PadGT: {}
batch_size: 8
shuffle: true
drop_last: true
use_shared_memory: true
collate_batch: true
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 2
TestReader:
inputs_def:
image_shape: [3, *eval_height, *eval_width]
sample_transforms:
- Decode: {}
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 1
# optimizer
epoch: 80
LearningRate:
base_lr: 0.004
schedulers:
- !CosineDecay
max_epochs: 96
- !LinearWarmup
start_factor: 0.
epochs: 5
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
_BASE_: [
'./_base_/sku110k.yml',
'../../runtime.yml'
]
log_iter: 10
snapshot_epoch: 20
weights: output/ppyoloe_plus_crn_s_80e_coco/model_final
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_x_obj365_pretrained.pdparams
depth_mult: 1.33
width_mult: 1.25
# arch
architecture: YOLOv3
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
custom_black_list: ['reduce_mean']
YOLOv3:
backbone: CSPResNet
neck: CustomCSPPAN
yolo_head: PPYOLOEHead
post_process: ~
CSPResNet:
layers: [3, 6, 6, 3]
channels: [64, 128, 256, 512, 1024]
return_idx: [1, 2, 3]
use_large_stem: True
use_alpha: True
CustomCSPPAN:
out_channels: [768, 384, 192]
stage_num: 1
block_num: 3
act: 'swish'
spp: true
use_alpha: True
PPYOLOEHead:
fpn_strides: [32, 16, 8]
grid_cell_scale: 5.0
grid_cell_offset: 0.5
static_assigner_epoch: -1
use_varifocal_loss: True
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
static_assigner:
name: ATSSAssigner
topk: 9
assigner:
name: TaskAlignedAssigner
topk: 13
alpha: 1.0
beta: 6.0
nms:
name: MultiClassNMS
nms_top_k: 3000
keep_top_k: 1000
score_threshold: 0.01
nms_threshold: 0.7
# reader
worker_num: 8
eval_height: &eval_height 960
eval_width: &eval_width 960
eval_size: &eval_size [*eval_height, *eval_width]
TrainReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [3000, 1800], keep_ratio: True, interp: 2}
- RandomDistort: {}
- RandomCrop: {}
- RandomFlip: {}
batch_transforms:
- BatchRandomResize: {target_size: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152], random_size: True, random_interp: True, keep_ratio: False}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
- PadGT: {}
batch_size: 4
shuffle: true
drop_last: true
use_shared_memory: true
collate_batch: true
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 2
TestReader:
inputs_def:
image_shape: [3, *eval_height, *eval_width]
sample_transforms:
- Decode: {}
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 1
# optimizer
epoch: 80
LearningRate:
base_lr: 0.002
schedulers:
- !CosineDecay
max_epochs: 96
- !LinearWarmup
start_factor: 0.
epochs: 5
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册