Unverified · Commit 3cbf564e · Author: littletomatodonkey · Committer: GitHub

add generic detection models (#926)

Add practical generic detection models containing 676 categories
Parent 9cd784b0
@@ -134,6 +134,7 @@
- [Objects365 2019 Challenge champion model](docs/featured_model/champion_model/CACascadeRCNN.md)
- [Best single model of Open Images 2019-Object Detection](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
- [Practical server-side object detection model](configs/rcnn_enhance/README.md): reaches 47.8% COCO mAP at 20 FPS on a V100.
- [Large-scale practical object detection models](docs/featured_model/LARGE_SCALE_DET_MODEL.md): large-scale practical server-side detection models covering 676 categories, suitable for most use cases. They can be used directly for prediction or fine-tuned for other tasks.
## License
......
@@ -149,6 +149,7 @@ The following is the relationship between COCO mAP and FPS on Tesla V100 of repr
- [Objects365 2019 Challenge champion model](docs/featured_model/champion_model/CACascadeRCNN.md)
- [Best single model of Open Images 2019-Object Detection](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
- [Practical Server-side detection method](configs/rcnn_enhance/README_en.md): reaches 20 FPS on a single V100 GPU at 47.8% COCO mAP.
- [Large-scale practical object detection models](docs/featured_model/LARGE_SCALE_DET_MODEL_en.md): large-scale practical server-side pretrained detection models with 676 categories, intended for most application scenarios. They can be used for direct inference or fine-tuned on other datasets.
## License
......
@@ -34,3 +34,4 @@
| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :-------------: | :-----: |
| ResNet50-vd-FPN-Dcnv2 | Faster | 2 | 3x | 61.425 | 41.6 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_3x_server_side.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/faster_rcnn_dcn_r50_vd_fpn_3x_server_side.yml) |
| ResNet50-vd-FPN-Dcnv2 | Cascade Faster | 2 | 3x | 20.001 | 47.8 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side.yml) |
| ResNet101-vd-FPN-Dcnv2 | Cascade Faster | 2 | 3x | 19.523 | 49.4 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.yml) |
@@ -30,11 +30,12 @@ And the following figure shows `mAP-Speed` curves for some common detectors.
> For a fair comparison, the inference time of PSS-DET models measured on a V100 GPU is converted to a Titan V equivalent by multiplying it by 1.2.
## Model Zoo
#### COCO dataset
| Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs |
| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: |
| ResNet50-vd-FPN-Dcnv2 | Faster | 2 | 3x | 61.425 | 41.6 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_3x_server_side.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/faster_rcnn_dcn_r50_vd_fpn_3x_server_side.yml) |
| ResNet50-vd-FPN-Dcnv2 | Cascade Faster | 2 | 3x | 20.001 | 47.8 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side.yml) |
| ResNet101-vd-FPN-Dcnv2 | Cascade Faster | 2 | 3x | 19.523 | 49.4 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.yml) |
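
As a worked example of the fair-comparison note above, the sketch below converts the V100 throughput figures from the table into Titan V equivalents. It assumes the factor of 1.2 applies to per-image latency, so the fps value is divided by 1.2. The helper name is hypothetical and not part of PaddleDetection.

```python
# Hypothetical helper: convert V100 throughput to a Titan V equivalent.
# Assumption: inference *time* is multiplied by 1.2, so fps is divided by 1.2.
TITAN_V_TIME_FACTOR = 1.2


def v100_fps_to_titan_v_fps(v100_fps: float, factor: float = TITAN_V_TIME_FACTOR) -> float:
    v100_time = 1.0 / v100_fps         # seconds per image on V100
    titan_v_time = v100_time * factor  # estimated seconds per image on Titan V
    return 1.0 / titan_v_time


# fps values copied from the Model Zoo table
MODEL_ZOO_FPS = [
    ("ResNet50-vd-FPN-Dcnv2 Faster", 61.425),
    ("ResNet50-vd-FPN-Dcnv2 Cascade Faster", 20.001),
    ("ResNet101-vd-FPN-Dcnv2 Cascade Faster", 19.523),
]

for name, fps in MODEL_ZOO_FPS:
    converted = v100_fps_to_titan_v_fps(fps)
    print(f"{name}: {fps:.3f} fps (V100), about {converted:.2f} fps (Titan V)")
```
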
architecture: CascadeRCNN
max_iters: 270000
snapshot_iter: 30000
use_gpu: true
log_smooth_window: 20
log_iter: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar
weights: output/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side/model_final
metric: COCO
num_classes: 81

CascadeRCNN:
  backbone: ResNet
  fpn: FPN
  rpn_head: FPNRPNHead
  roi_extractor: FPNRoIAlign
  bbox_head: CascadeBBoxHead
  bbox_assigner: CascadeBBoxAssigner

ResNet:
  norm_type: bn
  depth: 101
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
  variant: d
  dcn_v2_stages: [3, 4, 5]
  lr_mult_list: [0.05, 0.05, 0.1, 0.15]

FPN:
  max_level: 6
  min_level: 2
  num_chan: 64
  spatial_scale: [0.03125, 0.0625, 0.125, 0.25]

FPNRPNHead:
  anchor_generator:
    anchor_sizes: [32, 64, 128, 256, 512]
    aspect_ratios: [0.5, 1.0, 2.0]
    stride: [16.0, 16.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  min_level: 2
  max_level: 6
  num_chan: 64
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 2000
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 500
    post_nms_top_n: 300

FPNRoIAlign:
  canconical_level: 4
  canonical_size: 224
  min_level: 2
  max_level: 5
  box_resolution: 7
  sampling_ratio: 2

CascadeBBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights: [10, 20, 30]
  bg_thresh_lo: [0.0, 0.0, 0.0]
  bg_thresh_hi: [0.5, 0.6, 0.7]
  fg_thresh: [0.5, 0.6, 0.7]
  fg_fraction: 0.25

CascadeBBoxHead:
  head: CascadeTwoFCHead
  bbox_loss: BalancedL1Loss
  nms:
    keep_top_k: 100
    nms_threshold: 0.5
    score_threshold: 0.05

BalancedL1Loss:
  alpha: 0.5
  gamma: 1.5
  beta: 1.0
  loss_weight: 1.0

CascadeTwoFCHead:
  mlp_dim: 1024

LearningRate:
  base_lr: 0.02
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [180000, 240000]
  - !LinearWarmup
    start_factor: 0.1
    steps: 1000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

TrainReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd']
  dataset:
    !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: dataset/coco
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !RandomFlipImage
    prob: 0.5
  - !AutoAugmentImage
    autoaug_type: v1
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    target_size: [640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024]
    max_size: 1500
    interp: 1
    use_cv2: true
  - !Permute
    to_bgr: false
    channel_first: true
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: false
  batch_size: 2
  shuffle: true
  worker_num: 2
  use_process: false

EvalReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape']
    # for voc
    #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult']
  dataset:
    !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: true
  batch_size: 1
  shuffle: false
  drop_empty: false
  worker_num: 2

TestReader:
  inputs_def:
    # set image_shape if needed
    fields: ['image', 'im_info', 'im_id', 'im_shape']
  dataset:
    !ImageFolder
    anno_path: annotations/instances_val2017.json
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: true
  batch_size: 1
  shuffle: false
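
The `LearningRate` block in the config above combines a piecewise decay with a linear warmup. The sketch below shows one plausible reading of those settings: the rate ramps linearly from `base_lr * start_factor` to `base_lr` over the warmup steps, then drops by `gamma` at each milestone. The function name and the exact boundary handling (`step >= milestone`) are assumptions, not PaddleDetection's implementation.

```python
def learning_rate(step, base_lr=0.02, gamma=0.1, milestones=(180000, 240000),
                  start_factor=0.1, warmup_steps=1000):
    """Approximate learning rate at a given iteration (illustrative only)."""
    if step < warmup_steps:
        # LinearWarmup: interpolate between the start rate and base_lr
        start_lr = base_lr * start_factor
        return start_lr + (base_lr - start_lr) * step / warmup_steps
    # PiecewiseDecay: multiply by gamma once per milestone already reached
    passed = sum(1 for milestone in milestones if step >= milestone)
    return base_lr * gamma ** passed


for step in (0, 500, 1000, 180000, 240000):
    print(step, learning_rate(step))
```

With the values from this config, the rate starts at 0.002, reaches 0.02 after 1000 iterations, and falls to roughly 0.002 and 0.0002 at the two milestones.
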
architecture: CascadeRCNN
max_iters: 1500000
snapshot_iter: 100000
use_gpu: true
log_smooth_window: 20
log_iter: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/CBResNet101_vd_ssld_pretrained.tar
weights: output/cascade_rcnn_cbr101_vd_fpn_generic_server_side/model_final
metric: VOC
num_classes: 677

CascadeRCNN:
  backbone: CBResNet
  fpn: FPN
  rpn_head: FPNRPNHead
  roi_extractor: FPNRoIAlign
  bbox_head: CascadeBBoxHead
  bbox_assigner: CascadeBBoxAssigner

CBResNet:
  norm_type: bn
  norm_decay: 0.
  depth: 101
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
  variant: d
  repeat_num: 2
  lr_mult_list: [0.05, 0.05, 0.1, 0.15]

FPN:
  max_level: 6
  min_level: 2
  num_chan: 256
  spatial_scale: [0.03125, 0.0625, 0.125, 0.25]

FPNRPNHead:
  anchor_generator:
    anchor_sizes: [32, 64, 128, 256, 512]
    aspect_ratios: [0.5, 1.0, 2.0]
    stride: [16.0, 16.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  min_level: 2
  max_level: 6
  num_chan: 256
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 2000
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 500
    post_nms_top_n: 300

FPNRoIAlign:
  canconical_level: 4
  canonical_size: 224
  min_level: 2
  max_level: 5
  box_resolution: 14
  sampling_ratio: 2

CascadeBBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights: [10, 20, 30]
  bg_thresh_lo: [0.0, 0.0, 0.0]
  bg_thresh_hi: [0.5, 0.6, 0.7]
  fg_thresh: [0.5, 0.6, 0.7]
  fg_fraction: 0.25

CascadeBBoxHead:
  head: CascadeTwoFCHead
  bbox_loss: BalancedL1Loss
  nms:
    keep_top_k: 100
    nms_threshold: 0.5
    score_threshold: 0.05

BalancedL1Loss:
  alpha: 0.5
  gamma: 1.5
  beta: 1.0
  loss_weight: 1.0

CascadeTwoFCHead:
  mlp_dim: 1024

LearningRate:
  base_lr: 0.005
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [1000000, 1400000]
  - !LinearWarmup
    start_factor: 0.1
    steps: 1000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

TrainReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd']
  dataset:
    !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: dataset/coco
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !RandomFlipImage
    prob: 0.5
  - !AutoAugmentImage
    autoaug_type: v1
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    target_size: [640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024]
    max_size: 1500
    interp: 1
    use_cv2: true
  - !Permute
    to_bgr: false
    channel_first: true
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: false
  batch_size: 1
  shuffle: true
  worker_num: 2
  use_process: false

EvalReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape']
    # for voc
    #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult']
  dataset:
    !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1300
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: true
  batch_size: 1
  shuffle: false
  drop_empty: false
  worker_num: 2

TestReader:
  inputs_def:
    # set image_shape if needed
    fields: ['image', 'im_info', 'im_id', 'im_shape']
  dataset:
    !ImageFolder
    use_default_label: false
    with_background: true
    anno_path: ./dataset/voc/generic_det_label_list.txt
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: true
  batch_size: 1
  shuffle: false
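
The `ResizeImage` entries in the readers above pair a `target_size` for the short side with a `max_size` cap on the long side, and the training reader lists several target sizes for multi-scale training. A minimal sketch of that logic follows, assuming the usual short-side scaling rule and a uniform random choice among the listed sizes. The function names are hypothetical, and the real transform may differ in rounding details.

```python
import random

TRAIN_TARGET_SIZES = [640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024]


def resize_scale(height, width, target_size, max_size):
    """Scale factor that brings the short side to target_size without
    letting the long side exceed max_size (max_size=0 disables the cap)."""
    short_side, long_side = min(height, width), max(height, width)
    scale = target_size / short_side
    if max_size and round(scale * long_side) > max_size:
        scale = max_size / long_side
    return scale


def pick_train_target(sizes=TRAIN_TARGET_SIZES, rng=random):
    """Assumed multi-scale behaviour: draw one target size per image."""
    return rng.choice(sizes)


# 480x640 image at test time: the short side reaches 800, the cap is not hit
print(resize_scale(480, 640, target_size=800, max_size=1333))
# 500x1000 image: 1.6 * 1000 = 1600 exceeds 1333, so the cap sets the scale
print(resize_scale(500, 1000, target_size=800, max_size=1333))
```
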
architecture: CascadeRCNN
max_iters: 1500000
snapshot_iter: 100000
use_gpu: true
log_smooth_window: 20
log_iter: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar
weights: output/cascade_rcnn_dcn_r101_vd_fpn_generic_server_side/model_final
metric: VOC
num_classes: 677

CascadeRCNN:
  backbone: ResNet
  fpn: FPN
  rpn_head: FPNRPNHead
  roi_extractor: FPNRoIAlign
  bbox_head: CascadeBBoxHead
  bbox_assigner: CascadeBBoxAssigner

ResNet:
  norm_type: bn
  depth: 101
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
  variant: d
  dcn_v2_stages: [3, 4, 5]
  lr_mult_list: [0.05, 0.05, 0.1, 0.15]

FPN:
  max_level: 6
  min_level: 2
  num_chan: 64
  spatial_scale: [0.03125, 0.0625, 0.125, 0.25]

FPNRPNHead:
  anchor_generator:
    anchor_sizes: [32, 64, 128, 256, 512]
    aspect_ratios: [0.5, 1.0, 2.0]
    stride: [16.0, 16.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  min_level: 2
  max_level: 6
  num_chan: 64
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 2000
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 500
    post_nms_top_n: 300

FPNRoIAlign:
  canconical_level: 4
  canonical_size: 224
  min_level: 2
  max_level: 5
  box_resolution: 7
  sampling_ratio: 2

CascadeBBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights: [10, 20, 30]
  bg_thresh_lo: [0.0, 0.0, 0.0]
  bg_thresh_hi: [0.5, 0.6, 0.7]
  fg_thresh: [0.5, 0.6, 0.7]
  fg_fraction: 0.25

CascadeBBoxHead:
  head: CascadeTwoFCHead
  bbox_loss: BalancedL1Loss
  nms:
    keep_top_k: 100
    nms_threshold: 0.5
    score_threshold: 0.05

BalancedL1Loss:
  alpha: 0.5
  gamma: 1.5
  beta: 1.0
  loss_weight: 1.0

CascadeTwoFCHead:
  mlp_dim: 1024

LearningRate:
  base_lr: 0.01
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [1000000, 1400000]
  - !LinearWarmup
    start_factor: 0.1
    steps: 1000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

TrainReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd']
  dataset:
    !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: dataset/coco
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !RandomFlipImage
    prob: 0.5
  - !AutoAugmentImage
    autoaug_type: v1
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    target_size: [640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024]
    max_size: 1500
    interp: 1
    use_cv2: true
  - !Permute
    to_bgr: false
    channel_first: true
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: false
  batch_size: 1
  shuffle: true
  worker_num: 2
  use_process: false

EvalReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape']
    # for voc
    #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult']
  dataset:
    !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1300
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: true
  batch_size: 1
  shuffle: false
  drop_empty: false
  worker_num: 2

TestReader:
  inputs_def:
    # set image_shape if needed
    fields: ['image', 'im_info', 'im_id', 'im_shape']
  dataset:
    !ImageFolder
    use_default_label: false
    with_background: true
    anno_path: ./dataset/voc/generic_det_label_list.txt
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: true
  batch_size: 1
  shuffle: false
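
The `CascadeBBoxAssigner` settings above list one foreground threshold per cascade stage (`fg_thresh: [0.5, 0.6, 0.7]`), so later stages demand tighter overlap with the ground truth. The sketch below illustrates that idea for a single proposal and ground-truth box. It assumes continuous `[x1, y1, x2, y2]` coordinates with no +1 pixel convention, and the function names are hypothetical.

```python
FG_THRESH = [0.5, 0.6, 0.7]  # copied from CascadeBBoxAssigner.fg_thresh


def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def foreground_per_stage(proposal, gt_box, fg_thresh=FG_THRESH):
    """True for each stage whose threshold the proposal's IoU reaches."""
    overlap = iou(proposal, gt_box)
    return [overlap >= threshold for threshold in fg_thresh]


gt = [0.0, 0.0, 10.0, 10.0]
proposal = [0.0, 0.0, 10.0, 6.5]            # IoU with gt is 0.65
print(foreground_per_stage(proposal, gt))   # positive for stages 1 and 2 only
```
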
architecture: CascadeRCNN
max_iters: 750000
snapshot_iter: 50000
use_gpu: true
log_smooth_window: 20
log_iter: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_v2_pretrained.tar
weights: output/cascade_rcnn_dcn_r50_vd_fpn_generic_server_side/model_final
metric: VOC
num_classes: 677

CascadeRCNN:
  backbone: ResNet
  fpn: FPN
  rpn_head: FPNRPNHead
  roi_extractor: FPNRoIAlign
  bbox_head: CascadeBBoxHead
  bbox_assigner: CascadeBBoxAssigner

ResNet:
  norm_type: bn
  depth: 50
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
  variant: d
  dcn_v2_stages: [3, 4, 5]
  lr_mult_list: [0.05, 0.05, 0.1, 0.15]

FPN:
  max_level: 6
  min_level: 2
  num_chan: 64
  spatial_scale: [0.03125, 0.0625, 0.125, 0.25]

FPNRPNHead:
  anchor_generator:
    anchor_sizes: [32, 64, 128, 256, 512]
    aspect_ratios: [0.5, 1.0, 2.0]
    stride: [16.0, 16.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  min_level: 2
  max_level: 6
  num_chan: 64
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 2000
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 500
    post_nms_top_n: 300

FPNRoIAlign:
  canconical_level: 4
  canonical_size: 224
  min_level: 2
  max_level: 5
  box_resolution: 7
  sampling_ratio: 2

CascadeBBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights: [10, 20, 30]
  bg_thresh_lo: [0.0, 0.0, 0.0]
  bg_thresh_hi: [0.5, 0.6, 0.7]
  fg_thresh: [0.5, 0.6, 0.7]
  fg_fraction: 0.25

CascadeBBoxHead:
  head: CascadeTwoFCHead
  bbox_loss: BalancedL1Loss
  nms:
    keep_top_k: 100
    nms_threshold: 0.5
    score_threshold: 0.05

BalancedL1Loss:
  alpha: 0.5
  gamma: 1.5
  beta: 1.0
  loss_weight: 1.0

CascadeTwoFCHead:
  mlp_dim: 1024

LearningRate:
  base_lr: 0.02
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [500000, 700000]
  - !LinearWarmup
    start_factor: 0.1
    steps: 1000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

TrainReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd']
  dataset:
    !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: dataset/coco
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !RandomFlipImage
    prob: 0.5
  - !AutoAugmentImage
    autoaug_type: v1
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    target_size: [640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024]
    max_size: 1500
    interp: 1
    use_cv2: true
  - !Permute
    to_bgr: false
    channel_first: true
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: false
  batch_size: 2
  shuffle: true
  worker_num: 2
  use_process: false

EvalReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape']
    # for voc
    #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult']
  dataset:
    !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1500
    target_size: 1000
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: true
  batch_size: 1
  shuffle: false
  drop_empty: false
  worker_num: 2

TestReader:
  inputs_def:
    # set image_shape if needed
    fields: ['image', 'im_info', 'im_id', 'im_shape']
  dataset:
    !ImageFolder
    use_default_label: false
    with_background: true
    anno_path: ./dataset/voc/generic_det_label_list.txt
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1500
    target_size: 1000
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: true
  batch_size: 1
  shuffle: false
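
Each reader above applies `NormalizeImage` with `is_scale: true` and the listed `mean` and `std`, followed by `Permute` with `channel_first: true`. A dependency-free sketch of those two steps is given below, assuming that `is_scale` means dividing 8-bit pixel values by 255 before standardisation. The helper names are hypothetical, and a real pipeline would use array operations instead of nested lists.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=MEAN, std=STD, is_scale=True):
    """Scale one RGB pixel to [0, 1] (if is_scale), then standardise per channel."""
    values = [float(channel) for channel in rgb]
    if is_scale:
        values = [value / 255.0 for value in values]
    return [(value - m) / s for value, m, s in zip(values, mean, std)]


def hwc_to_chw(image):
    """Reorder a nested-list image from height/width/channel to channel/height/width."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[y][x][c] for x in range(width)] for y in range(height)]
            for c in range(channels)]


image = [[[255, 255, 255], [0, 0, 0]]]  # 1 x 2 image, RGB
normalized = [[normalize_pixel(pixel) for pixel in row] for row in image]
chw = hwc_to_chw(normalized)
print(len(chw), len(chw[0]), len(chw[0][0]))  # channels, height, width
```
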
Infant bed
Rose
Flag
Flashlight
Sea turtle
Camera
Animal
Glove
Crocodile
Cattle
House
Guacamole
Penguin
Vehicle registration plate
Bench
Ladybug
Human nose
Watermelon
Flute
Butterfly
Washing machine
Raccoon
Segway
Taco
Jellyfish
Cake
Pen
Cannon
Bread
Tree
Shellfish
Bed
Hamster
Hat
Toaster
Sombrero
Tiara
Bowl
Dragonfly
Moths and butterflies
Antelope
Vegetable
Torch
Building
Power plugs and sockets
Blender
Billiard table
Cutting board
Bronze sculpture
Turtle
Broccoli
Tiger
Mirror
Bear
Zucchini
Dress
Volleyball
Guitar
Reptile
Golf cart
Tart
Fedora
Carnivore
Car
Lighthouse
Coffeemaker
Food processor
Truck
Bookcase
Surfboard
Footwear
Bench
Necklace
Flower
Radish
Marine mammal
Frying pan
Tap
Peach
Knife
Handbag
Laptop
Tent
Ambulance
Christmas tree
Eagle
Limousine
Kitchen & dining room table
Polar bear
Tower
Football
Willow
Human head
Stop sign
Banana
Mixer
Binoculars
Dessert
Bee
Chair
Wood-burning stove
Flowerpot
Beaker
Oyster
Woodpecker
Harp
Bathtub
Wall clock
Sports uniform
Rhinoceros
Beehive
Cupboard
Chicken
Man
Blue jay
Cucumber
Balloon
Kite
Fireplace
Lantern
Missile
Book
Spoon
Grapefruit
Squirrel
Orange
Coat
Punching bag
Zebra
Billboard
Bicycle
Door handle
Mechanical fan
Ring binder
Table
Parrot
Sock
Vase
Weapon
Shotgun
Glasses
Seahorse
Belt
Watercraft
Window
Giraffe
Lion
Tire
Vehicle
Canoe
Tie
Shelf
Picture frame
Printer
Human leg
Boat
Slow cooker
Croissant
Candle
Pancake
Pillow
Coin
Stretcher
Sandal
Woman
Stairs
Harpsichord
Stool
Bus
Suitcase
Human mouth
Juice
Skull
Door
Violin
Chopsticks
Digital clock
Sunflower
Leopard
Bell pepper
Harbor seal
Snake
Sewing machine
Goose
Helicopter
Seat belt
Coffee cup
Microwave oven
Hot dog
Countertop
Serving tray
Dog bed
Beer
Sunglasses
Golf ball
Waffle
Palm tree
Trumpet
Ruler
Helmet
Ladder
Office building
Tablet computer
Toilet paper
Pomegranate
Skirt
Gas stove
Cookie
Cart
Raven
Egg
Burrito
Goat
Kitchen knife
Skateboard
Salt and pepper shakers
Lynx
Boot
Platter
Ski
Swimwear
Swimming pool
Drinking straw
Wrench
Drum
Ant
Human ear
Headphones
Fountain
Bird
Jeans
Television
Crab
Microphone
Home appliance
Snowplow
Beetle
Artichoke
Jet ski
Stationary bicycle
Human hair
Brown bear
Starfish
Fork
Lobster
Corded phone
Drink
Saucer
Carrot
Insect
Clock
Castle
Tennis racket
Ceiling fan
Asparagus
Jaguar
Musical instrument
Train
Cat
Rifle
Dumbbell
Mobile phone
Taxi
Shower
Pitcher
Lemon
Invertebrate
Turkey
High heels
Bust
Elephant
Scarf
Barrel
Trombone
Pumpkin
Box
Tomato
Frog
Bidet
Human face
Houseplant
Van
Shark
Ice cream
Swim cap
Falcon
Ostrich
Handgun
Whiteboard
Lizard
Pasta
Snowmobile
Light bulb
Window blind
Muffin
Pretzel
Computer monitor
Horn
Furniture
Sandwich
Fox
Convenience store
Fish
Fruit
Earrings
Curtain
Grape
Sofa bed
Horse
Luggage and bags
Desk
Crutch
Bicycle helmet
Tick
Airplane
Canary
Spatula
Watch
Lily
Kitchen appliance
Filing cabinet
Aircraft
Cake stand
Candy
Sink
Mouse
Wine
Wheelchair
Goldfish
Refrigerator
French fries
Drawer
Treadmill
Picnic basket
Dice
Cabbage
Football helmet
Pig
Person
Shorts
Gondola
Honeycomb
Doughnut
Chest of drawers
Land vehicle
Bat
Monkey
Dagger
Tableware
Human foot
Mug
Alarm clock
Pressure cooker
Human hand
Tortoise
Baseball glove
Sword
Pear
Miniskirt
Traffic sign
Girl
Roller skates
Dinosaur
Porch
Human beard
Submarine sandwich
Screwdriver
Strawberry
Wine glass
Seafood
Racket
Wheel
Sea lion
Toy
Tea
Tennis ball
Waste container
Mule
Cricket ball
Pineapple
Coconut
Doll
Coffee table
Snowman
Lavender
Shrimp
Maple
Cowboy hat
Goggles
Rugby ball
Caterpillar
Poster
Rocket
Organ
Saxophone
Traffic light
Cocktail
Plastic bag
Squash
Mushroom
Hamburger
Light switch
Parachute
Teddy bear
Winter melon
Deer
Musical keyboard
Plumbing fixture
Scoreboard
Baseball bat
Envelope
Adhesive tape
Briefcase
Paddle
Bow and arrow
Telephone
Sheep
Jacket
Boy
Pizza
Otter
Office supplies
Couch
Cello
Bull
Camel
Ball
Duck
Whale
Shirt
Tank
Motorcycle
Accordion
Owl
Porcupine
Sun hat
Nail
Scissors
Swan
Lamp
Crown
Piano
Sculpture
Cheetah
Oboe
Tin can
Mango
Tripod
Oven
Mouse
Barge
Coffee
Snowboard
Common fig
Salad
Marine invertebrates
Umbrella
Kangaroo
Human arm
Measuring cup
Snail
Loveseat
Suit
Teapot
Bottle
Alpaca
Kettle
Trousers
Popcorn
Centipede
Spider
Sparrow
Plate
Bagel
Personal care
Apple
Brassiere
Bathroom cabinet
studio couch
Computer keyboard
Table tennis racket
Sushi
Cabinetry
Street light
Towel
Nightstand
Rabbit
Dolphin
Dog
Jug
Wok
Fire hydrant
Human eye
Skyscraper
Backpack
Potato
Paper towel
Lifejacket
Bicycle wheel
Toilet
tuba
carpet
trolley
tv
fan
llama
stapler
tricycle
head_phone
air_conditioner
cookies
towel/napkin
boots
sausage
suv
bar_soap
baseball
luggage
poker_card
shovel
marker
earphone
projector
pencil_case
french_horn
tangerine
router/modem
folder
donut
durian
sailboat
nuts
coffee_machine
meat_balls
basket
extension_cord
green_beans
avocado
soccer
egg_tart
clutch
slide
fishing_rod
hanger
bread/bun
surveillance_camera
globe
blackboard/whiteboard
life_saver
pigeon
red_cabbage
cymbal
faucet
steak
swing
mangosteen
cheese
urinal
lettuce
hurdle
ring
basketball
potted_plant
rickshaw
target
race_car
bow_tie
iron
toiletries
donkey
saw
hammer
billiards
cutting/chopping_board
power_outlet
hair_drier
baozi
medal
liquid_soap
wild_bird
leather_shoes
dining_table
game_board
barbell
radio
street_lights
tape
hockey
spring_rolls
rice
golf_club
lighter
chips
microscope
cell_phone
fire_truck
noodles
cabinet/shelf
electronic_stove_and_gas_stove
key
comb
trash_bin/can
toothbrush
dates
electric_drill
cow
eggplant
broom
vent
tong
green_onion
scallop
facial_cleanser
toothpaste
hamimelon
eraser
shampoo/shower_gel
CD
skating_and_skiing_shoes
american_football
slippers
pitaya
pot/pan
calculator
tissue
table_tennis_paddle
board_eraser
speaker
papaya
cigar
notepaper
garlic
rice_cooker
canned
parking_meter
flashlight
paint_brush
cup
cue
crosswalk_sign
kiwi_fruit
radiator
mop
chainsaw
sandals
storage_box
onion
bracelet
fire_extinguisher
scale
okra
microwave
sneakers
pepper
corn
pomelo
computer_box
pliers
trophy
plum
brush
machinery_vehicle
yak
crane
converter
facial_mask
carriage
pickup_truck
traffic_cone
pie
pen/pencil
sports_car
frisbee
cleaning_products
remote
stroller
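
The label list above holds one category name per line, while the generic configs set `num_classes: 677` and `with_background: true`. A plausible reading is 676 labels plus one background class at index 0. The sketch below builds such an id-to-name map from the lines of a label file. The function name is hypothetical, and the actual PaddleDetection loader may handle the file differently.

```python
def build_label_map(lines, with_background=True):
    """Map class ids to names, reserving id 0 for background when requested."""
    names = [line.strip() for line in lines if line.strip()]
    if with_background and (not names or names[0] != "background"):
        names.insert(0, "background")
    return {class_id: name for class_id, name in enumerate(names)}


# First three entries of the label list, used here as a small sample
sample_lines = ["Infant bed\n", "Rose\n", "Flag\n"]
label_map = build_label_map(sample_lines)
print(label_map)

# Under this assumption, 676 labels plus background give num_classes = 677
print(676 + 1)
```
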
婴儿床
玫瑰
手电筒
海龟
照相机
动物
手套
鳄鱼
房子
鳄梨酱
企鹅
车辆牌照
凳子
瓢虫
人鼻
西瓜
长笛
蝴蝶
洗衣机
浣熊
赛格威
墨西哥玉米薄饼卷
海蜇
蛋糕
加农炮
面包
贝类
仓鼠
帽子
烤面包机
帽帽
冠状头饰
蜻蜓
飞蛾和蝴蝶
羚羊
蔬菜
火炬
建筑物
电源插头和插座
搅拌机
台球桌
切割板
青铜雕塑
乌龟
西兰花
老虎
镜子
西葫芦
礼服
排球
吉他
爬行动物
高尔夫球车
蛋挞
费多拉
食肉动物
小型车
灯塔
咖啡壶
食品加工厂
卡车
书柜
冲浪板
鞋类
凳子
项链
萝卜
海洋哺乳动物
煎锅
水龙头
手提包
笔记本电脑
帐篷
救护车
圣诞树
豪华轿车
厨房和餐桌
北极熊
塔楼
足球
柳树
人头
停车标志
香蕉
搅拌机
双筒望远镜
甜点
蜜蜂
椅子
烧柴炉
花盆
烧杯
牡蛎
啄木鸟
竖琴
浴缸
挂钟
运动服
犀牛
蜂箱
橱柜
冠蓝鸦
黄瓜
气球
风筝
壁炉
灯笼
导弹
勺子
葡萄柚
松鼠
橙色
外套
打孔袋
斑马
广告牌
自行车
门把手
机械风扇
环形粘结剂
桌子
鹦鹉
袜子
花瓶
武器
猎枪
玻璃杯
海马
腰带
船舶
窗口
长颈鹿
狮子
轮胎
车辆
独木舟
领带
架子
相框
打印机
人腿
小船
慢炖锅
牛角包
蜡烛
煎饼
枕头
硬币
担架
凉鞋
女人
楼梯
拨弦键琴
凳子
公共汽车
手提箱
人口学
果汁
颅骨
小提琴
筷子
数字时钟
向日葵
甜椒
海港海豹
缝纫机
直升机
座椅安全带
咖啡杯
微波炉
热狗
台面
服务托盘
狗床
啤酒
太阳镜
高尔夫球
华夫饼干
棕榈树
小号
尺子
头盔
梯子
办公楼
平板电脑
厕纸
石榴
裙子
煤气炉
曲奇饼干
大车
掠夺
鸡蛋
墨西哥煎饼
山羊
菜刀
滑板
盐和胡椒瓶
猞猁
靴子
大浅盘
滑雪板
泳装
游泳池
吸管
扳手
蚂蚁
人耳
耳机
喷泉
牛仔裤
电视机
话筒
家用电器
除雪机
甲虫
朝鲜蓟
喷气式滑雪板
固定自行车
人发
棕熊
海星
叉子
龙虾
有线电话
饮料
胡萝卜
昆虫
时钟
城堡
网球拍
吊扇
芦笋
美洲虎
乐器
火车
来复枪
哑铃
手机
出租车
淋浴
投掷者
柠檬
无脊椎动物
火鸡
高跟鞋
打破
大象
围巾
枪管
长号
南瓜
盒子
番茄
坐浴盆
人脸
室内植物
厢式货车
鲨鱼
冰淇淋
游泳帽
鸵鸟
手枪
白板
蜥蜴
面食
雪车
灯泡
窗盲
松饼
椒盐脆饼
计算机显示器
喇叭
家具
三明治
福克斯
便利店
水果
耳环
帷幕
葡萄
沙发床
行李和行李
书桌
拐杖
自行车头盔
滴答声
飞机
金丝雀
手表
莉莉
厨房用具
文件柜
飞机
蛋糕架
糖果
水槽
鼠标
葡萄酒
轮椅
金鱼
冰箱
炸薯条
抽屉
单调的工作
野餐篮子
骰子
甘蓝
足球头盔
短裤
贡多拉
蜂巢
炸圈饼
抽屉柜
陆地车辆
蝙蝠
猴子
匕首
餐具
人足
马克杯
闹钟
高压锅
人手
乌龟
棒球手套
迷你裙
交通标志
女孩
旱冰鞋
恐龙
门廊
胡须
潜艇三明治
螺丝起子
草莓
酒杯
海鲜
球拍
车轮
海狮
玩具
茶叶
网球
废物容器
骡子
板球
菠萝
椰子
娃娃
咖啡桌
雪人
薰衣草
小虾
枫树
牛仔帽
护目镜
橄榄球
毛虫
海报
火箭
器官
萨克斯
交通灯
鸡尾酒
塑料袋
壁球
蘑菇
汉堡包
电灯开关
降落伞
泰迪熊
冬瓜
鹿
音乐键盘
卫生器具
记分牌
棒球棒
包络线
胶带
公文包
弓箭
电话
夹克
男孩
披萨
水獭
办公用品
沙发
大提琴
公牛
骆驼
鸭子
鲸鱼
衬衫
坦克
摩托车
手风琴
猫头鹰
豪猪
太阳帽
钉子
剪刀
天鹅
皇冠
钢琴
雕塑
猎豹
双簧管
罐头罐
芒果
三脚架
烤箱
鼠标
驳船
咖啡
滑雪板
普通无花果
沙拉
无脊椎动物
雨伞
袋鼠
人手臂
量杯
蜗牛
相思
西服
茶壶
羊驼
水壶
裤子
爆米花
蜈蚣
蜘蛛
麻雀
盘子
百吉饼
个人护理
苹果
胸罩
浴室柜
演播室沙发
电脑键盘
乒乓球拍
寿司
橱柜
路灯
毛巾
床头柜
海豚
大罐
炒锅
消火栓
人眼
摩天大楼
背包
马铃薯
纸巾
小精灵
自行车车轮
卫生间
大号
地毯
手推车
电视
风扇
美洲驼
订书机
三轮车
耳机
空调器
饼干
毛巾/餐巾
靴子
香肠
运动型多用途汽车
肥皂
棒球
行李
扑克牌
铲子
标记笔
耳机
投影机
铅笔盒
法国圆号
橘子
路由器
文件夹
甜甜圈
榴莲
帆船
坚果
咖啡机
肉丸
篮子
插线板
青豆
鳄梨
英式足球
蛋挞
离合器
滑梯
鱼竿
衣架
面包
监控摄像头
地球仪
黑板/白板
救生员
鸽子
红卷心菜
铜钹
水龙头
牛排
秋千
山竹
奶酪
小便池
生菜
跨栏
戒指
篮球
盆栽植物
人力车
目标
赛车
蝴蝶结
熨斗
化妆品
铁锤
台球
切割/砧板
电源插座
吹风机
包子
奖章/奖牌
液体肥皂
野鸟
皮鞋
餐桌
游戏板
杠铃
收音机
路灯
磁带
曲棍球
春卷
大米
高尔夫俱乐部
打火机
炸薯条
显微镜
手机
消防车
面条
橱柜/架子
电磁炉和煤气炉
钥匙
梳子
垃圾箱/罐
牙刷
枣子
电钻
奶牛
茄子
扫帚
抽油烟机
钳子
大葱
扇贝
洁面乳
牙膏
哈密瓜
橡皮擦
洗发水/沐浴露
光盘
溜冰鞋和滑雪鞋
美式足球
拖鞋
火龙果
锅/平底锅
计算器
纸巾
乒乓球拍
板擦
扬声器
木瓜
雪茄
信纸
大蒜
电饭锅
罐装的
停车计时器
手电筒
画笔
杯子
球杆
人行横道标志
奇异果/猕猴桃
散热器
拖把
电锯
凉鞋拖鞋
储物箱
洋葱
手镯
灭火器
秋葵
微波炉
运动鞋
胡椒
玉米
柚子
主机
钳子
奖杯
李子/梅子
刷子/画笔
机械车辆
牦牛
起重机
转换器
面膜
马车
皮卡车
交通锥
馅饼
钢笔/铅笔
跑车
飞盘
清洁用品/洗涤剂/洗衣液
遥控器
婴儿车/手推车
## 大规模实用目标检测模型
### 简介
* 与图像分类任务不同,目标检测任务中,不仅需要标注图像中物体所属类别,还要标注其边框位置,因此标注成本相对更高。目前已开源的目标检测数据集中,应用比较广泛的有Open Images V5、Objects365和COCO数据集,这三个数据集的基本信息如下。
| Dataset | Classes | Images | Bounding boxes |
|--------------------|---------|-----------|----------------|
| COCO | 80 | 123,287 | 886,284 |
| Objects365 | 365 | 600,000 | 10,000,000 |
| Open Images V5 | 500 | 1,743,042 | 14,610,229 |
上述数据集中包含的类别均不多(相比于ImageNet1k分类数据集的1000个类别)。为了提供更加实用的服务器端目标检测模型,方便用户在不需要任何微调的情况下就可以直接使用,PaddleDetection结合[服务器端实用目标检测方案](./SERVER_SIDE.md),融合Open Images V5和Objects365训练集数据(二者包含许多重复类别),生成了包含676个类别的新数据集,类别映射关系可以在这里查看: [676个类别的标签文件](../../dataset/voc/generic_det_label_list_zh.txt)。并训练了服务器端实用目标检测模型,适用于绝大部分应用场景,方便用户直接部署使用,用户也可以根据提供的预训练模型,在自己的数据集上进行模型微调,加快收敛并获得更高的精度指标。
### 模型库
| 骨架网络 | 网络类型 | 下载 | 配置文件 |
| :---------------| :---------------| :---------------| :---------------|
| ResNet50-vd-FPN-Dcnv2 | Cascade Faster | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r50_vd_fpn_generic_server_side.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/generic/cascade_rcnn_dcn_r50_vd_fpn_generic_server_side.yml) |
| ResNet101-vd-FPN-Dcnv2 | Cascade Faster | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r101_vd_fpn_generic_server_side.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/generic/cascade_rcnn_dcn_r101_vd_fpn_generic_server_side.yml) |
| CBResNet101-vd-FPN | Cascade Faster | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cbr101_vd_fpn_generic_server_side.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/generic/cascade_rcnn_cbr101_vd_fpn_generic_server_side.yml) |
## Large-scale practical object detection models (676 categories)
### Introduction
* Unlike image classification, object detection requires annotating not only the category of each object in an image but also its bounding box, so labeling is more expensive. Among the open-source object detection datasets, Open Images V5, Objects365 and COCO are widely used. Basic information about these three datasets is as follows.
| Dataset | Classes | Images | Bounding boxes |
|--------------------|---------|-----------|----------------|
| COCO | 80 | 123,287 | 886,284 |
| Objects365 | 365 | 600,000 | 10,000,000 |
| Open Images V5 | 500 | 1,743,042 | 14,610,229 |
The datasets above contain relatively few categories (compared with the 1000 categories of the ImageNet1k classification dataset). To provide more practical server-side object detection models that users can apply directly without any finetuning, PaddleDetection builds on the [Practical Server-side detection method based on RCNN](./SERVER_SIDE_en.md) and merges the Open Images V5 and Objects365 training sets, which share many categories, into a new training set containing 676 categories. The label list can be found here: [label list containing 676 categories](../../dataset/voc/generic_det_label_list.txt). Practical server-side models are trained on this dataset and are intended to suit most application scenarios. Users can use them directly for inference or deployment, or finetune the provided pretrained models on their own datasets to accelerate convergence and achieve higher accuracy.
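The merge procedure itself is not described here, so the following is only a minimal sketch of one plausible approach: taking the union of two category lists and treating names that differ only in case or surrounding whitespace as the same class. The function name `merge_label_lists` and the toy lists are hypothetical and are not part of PaddleDetection.

```python
def merge_label_lists(*label_lists):
    """Union several label lists, keeping first-seen order.

    Assumption: two names denote the same category when they match after
    stripping whitespace and lowercasing. The real merge may use a
    hand-written mapping instead.
    """
    merged, seen = [], set()
    for labels in label_lists:
        for name in labels:
            key = name.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(name.strip())
    return merged


# Toy stand-ins for the two category lists (hypothetical, not the real lists).
objects365_like = ["Person", "Sneakers", "Chair", "Car"]
open_images_like = ["person", "Car", "Goldfish", "Teddy bear"]

merged = merge_label_lists(objects365_like, open_images_like)
print(merged)

# If the real merge were a plain union, the class counts in the table above
# (365 and 500) and the merged total (676) would imply this many shared classes.
shared = 365 + 500 - 676
print(shared)
```

Under that plain-union assumption, the two datasets would share 189 categories, which is consistent with the note that they overlap substantially.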
### Model zoo
| Backbone | Type | Download | Configs |
| :---------------| :---------------| :---------------| :---------------|
| ResNet50-vd-FPN-Dcnv2 | Cascade Faster | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r50_vd_fpn_generic_server_side.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/generic/cascade_rcnn_dcn_r50_vd_fpn_generic_server_side.yml) |
| ResNet101-vd-FPN-Dcnv2 | Cascade Faster | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r101_vd_fpn_generic_server_side.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/generic/cascade_rcnn_dcn_r101_vd_fpn_generic_server_side.yml) |
| CBResNet101-vd-FPN | Cascade Faster | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cbr101_vd_fpn_generic_server_side.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det/generic/cascade_rcnn_cbr101_vd_fpn_generic_server_side.yml) |
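When using these models for direct inference, predicted class indices need to be mapped back to category names. The sketch below assumes the label list stores one category name per line and that the class index corresponds to the line position. Whether the models reserve an index for background is not stated here, so the `start` offset is left as a parameter. The helper `load_label_map` and the sample names are hypothetical.

```python
import io


def load_label_map(lines, start=0):
    """Build {class_index: category_name} from an iterable of text lines.

    Assumptions: one category per line, blank lines ignored, and indices
    assigned in file order beginning at `start`.
    """
    names = [line.strip() for line in lines if line.strip()]
    return {start + i: name for i, name in enumerate(names)}


# In practice the lines would come from the label file, for example:
#   with open("dataset/voc/generic_det_label_list.txt", encoding="utf-8") as f:
#       label_map = load_label_map(f)
# Here a small in-memory sample (hypothetical names) stands in for the file.
sample = io.StringIO("Goldfish\nRefrigerator\n\nFrench fries\n")
label_map = load_label_map(sample)
print(label_map)
```

The offset that matches a given model should be checked against its config before relying on this mapping.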
../../configs/rcnn_enhance/README_en.md
\ No newline at end of file