Commit 3e57b4c3 authored by qingqing01, committed by GitHub

Rename object_detection to PaddleDetection. (#2601)

* Rename object_detection to PaddleDetection
* Small fix for doc
# Virtualenv
/.venv/
/venv/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
# C extensions
*.so
# json file
*.json
# Distribution / packaging
/bin/
/build/
/develop-eggs/
/dist/
/eggs/
/lib/
/lib64/
/output/
/parts/
/sdist/
/var/
/*.egg-info/
/.installed.cfg
/*.egg
/.eggs
# AUTHORS and ChangeLog will be generated while packaging
/AUTHORS
/ChangeLog
# BCloud / BuildSubmitter
/build_submitter.*
/logger_client_log
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
.tox/
.coverage
.cache
.pytest_cache
nosetests.xml
coverage.xml
# Translations
*.mo
# Sphinx documentation
/docs/_build/
[style]
based_on_style = pep8
column_limit = 80
# PaddleDetection
The goal of PaddleDetection is to provide easy access to a wide range of object
detection models in both industry and research settings. We design
PaddleDetection to be not only performant and production-ready, but also highly
flexible, catering to research needs.
<div align="center">
<img src="demo/output/000000570688.jpg" />
</div>
## Introduction
Design Principles:
- Production Ready:
Key operations are implemented in C++ and CUDA; together with PaddlePaddle's
highly efficient inference engine, this enables easy deployment in server environments.
- Highly Flexible:
Components are designed to be modular. Model architectures, as well as data
preprocessing pipelines, can be easily customized with simple configuration
changes (see the sketch after this list).
- Performance Optimized:
With the help of the underlying PaddlePaddle framework, faster training and
reduced GPU memory footprint are achieved. Notably, Yolo V3 training is
much faster than in other frameworks. As another example, with Mask-RCNN
(ResNet50) we managed to fit up to 5 images per GPU (V100, 16GB) during
training.
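
As a minimal sketch of what such a configuration change looks like (the keys
below come from the configs bundled in this commit; the fragment itself is
hypothetical), deepening the backbone of a Faster R-CNN model only touches the
corresponding entries:

```yaml
# Hypothetical fragment: swap in a deeper backbone for an existing model.
# Only the affected entries are shown; all other components are resolved
# by name and stay unchanged.
FasterRCNN:
  backbone: ResNet

ResNet:
  depth: 101          # was 50
  norm_type: affine_channel
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
```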
Supported Architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt | SENet | MobileNet | DarkNet |
|--------------------|:------:|:-----------------------------:|:-------:|:-----:|:---------:|:-------:|
| Faster R-CNN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Mask R-CNN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Cascade R-CNN | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Yolov3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
<a name="vd">[1]</a> ResNet-vd models offer much improved accuracy with negligible performance cost.
Advanced Features:
- [x] **Synchronized Batch Norm**: currently used by Yolo V3.
- [x] **Group Norm**: pretrained models to be released.
- [x] **Modulated Deformable Convolution**: pretrained models to be released.
- [x] **Deformable PSRoI Pooling**: pretrained models to be released.
## Model zoo
Pretrained models are available in the PaddlePaddle [detection model zoo](docs/MODEL_ZOO.md).
## Installation
Please follow the [installation guide](docs/INSTALL.md).
## Get Started
For inference, simply run the following command and the visualized result will
be saved in `output/`.
```bash
export PYTHONPATH=`pwd`:$PYTHONPATH
python tools/infer.py -c configs/mask_rcnn_r50_1x.yml \
    -o weights=https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_1x.tar \
    --infer_img=demo/000000570688.jpg
```
For detailed training and evaluation workflow, please refer to [GETTING_STARTED.md](docs/GETTING_STARTED.md).
We also recommend users take a look at the [IPython Notebook demo](demo/mask_rcnn_demo.ipynb).
Further information can be found in these documents:
- [Introduction to the configuration workflow.](docs/CONFIG.md)
- [Guide to custom dataset and preprocess pipeline.](docs/DATA.md)
## Todo List
Please note that this is a work in progress; substantial changes may come in
the near future.
Some of the planned features include:
- [ ] Mixed precision training.
- [ ] Distributed training.
- [ ] Inference in 8-bit mode.
- [ ] User defined operations.
- [ ] Larger model zoo.
## Updates
#### Initial release (7/3/2019)
- Initial release of PaddleDetection and detection model zoo
- Models included: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask
R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, Yolo v3, and SSD.
## Contributing
Contributions are highly welcomed and we would really appreciate your feedback!
architecture: CascadeRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
weights: output/cascade_rcnn_r50_fpn_1x/model_final
metric: COCO
CascadeRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: CascadeBBoxHead
bbox_assigner: CascadeBBoxAssigner
ResNet:
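# affine_channel: batch-norm statistics are frozen and folded into a fixed
# per-channel scale and bias, a common setting when fine-tuning detection
# backbones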
norm_type: affine_channel
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
variant: b
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
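# each spatial_scale entry is 1/stride of an input feature map:
# 1/32, 1/16, 1/8 and 1/4 (strides 32, 16, 8 and 4)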
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
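# i.e. sample 256 anchors per image, at most half of them foreground;
# anchors with IoU >= 0.7 against ground truth count as positive,
# those with IoU <= 0.3 as negative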
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
CascadeBBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [10, 20, 30]
bg_thresh_lo: [0.0, 0.0, 0.0]
bg_thresh_hi: [0.5, 0.6, 0.7]
fg_thresh: [0.5, 0.6, 0.7]
fg_fraction: 0.25
num_classes: 81
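# the list-valued fields hold one entry per cascade stage: the fg/bg IoU
# cutoffs tighten from 0.5 to 0.7 and bbox_reg_weights scale up in step;
# num_classes is 80 COCO categories plus background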
CascadeBBoxHead:
head: FC6FC7Head
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
FC6FC7Head:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
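# resulting schedule: the learning rate ramps linearly from
# base_lr * 1/3 (about 0.00667) to 0.02 over the first 500 iterations,
# then is multiplied by gamma = 0.1 at iterations 60000 and 80000
# (0.02 -> 0.002 -> 0.0002)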
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
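# pad every image in a batch to a common size divisible by 32, matching
# the coarsest FPN stride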
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 180000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r101_1x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 101
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
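# single-level RoIAlign over the C4 feature map (stride 16, hence
# spatial_scale = 1/16)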
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
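# in this C4 variant the res5 stage of the backbone doubles as the box head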
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
weights: output/faster_rcnn_r101_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 360000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
weights: output/faster_rcnn_r101_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar
weights: output/faster_rcnn_r101_vd_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 360000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar
weights: output/faster_rcnn_r101_vd_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 180000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_1x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 360000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_2x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
use_gpu: true
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/fpn/faster_rcnn_r50_fpn_1x/model_final
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_lo: 0.0
bg_thresh_hi: 0.5
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
use_gpu: true
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_fpn_2x/model_final
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_lo: 0.0
bg_thresh_hi: 0.5
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 180000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_vd_1x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: 4
freeze_at: 2
variant: d
ResNetC5:
norm_type: affine_channel
variant: d
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar
weights: output/faster_rcnn_r50_vd_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 2
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/faster_rcnn_se154_1x/model_final
metric: COCO
FasterRCNN:
backbone: SENet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: 4
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
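# groups: 64 with group_width: 4 is the 64x4d cardinality setting used by
# SENet-154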
SENetC5:
depth: 152
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 12000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 6000
RoIAlign:
resolution: 7
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: SENetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/faster_rcnn_se154_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 260000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/faster_rcnn_se154_fpn_s1x/model_final
metric: COCO
FasterRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [200000, 240000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_64x4d_pretrained.tar
weights: output/faster_rcnn_x101_64x4d_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: ResNeXt
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNeXt:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_64x4d_pretrained.tar
weights: output/faster_rcnn_x101_64x4d_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNeXt
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNeXt:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r101_fpn_1x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
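# the box branch pools 7x7 RoI features and the mask branch 14x14; the
# mask head below upsamples the latter to its 28x28 output resolution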
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r101_fpn_2x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_1x/model_final
MaskRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_assigner: BBoxAssigner
bbox_head: BBoxHead
mask_assigner: MaskAssigner
mask_head: MaskHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
spatial_scale: 0.0625
sampling_ratio: 0
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
normalized: false
score_threshold: 0.05
num_classes: 81
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
resolution: 14
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
num_classes: 81
resolution: 14
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_2x/model_final/
MaskRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_assigner: BBoxAssigner
bbox_head: BBoxHead
mask_assigner: MaskAssigner
mask_head: MaskHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
spatial_scale: 0.0625
sampling_ratio: 0
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
normalized: false
score_threshold: 0.05
num_classes: 81
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
resolution: 14
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
num_classes: 81
resolution: 14
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
# start the warm-up from base_lr * start_factor
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_fpn_1x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_fpn_2x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_vd_fpn_2x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
max_iters: 260000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/mask_rcnn_se154_vd_fpn_s1x/model_final/
metric: COCO
MaskRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [200000, 240000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: RetinaNet
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
use_gpu: true
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
weights: output/retinanet_r101_fpn_1x/model_final
log_smooth_window: 20
snapshot_iter: 10000
metric: COCO
save_dir: output
RetinaNet:
backbone: ResNet
fpn: FPN
retina_head: RetinaHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 101
feature_maps: [3, 4, 5]
freeze_at: 2
FPN:
max_level: 7
min_level: 3
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125]
has_extra_convs: true
RetinaHead:
num_convs_per_octave: 4
num_chan: 256
max_level: 7
min_level: 3
prior_prob: 0.01
base_scale: 4
num_scales_per_octave: 3
num_classes: 81
anchor_generator:
aspect_ratios: [1.0, 2.0, 0.5]
variance: [1.0, 1.0, 1.0, 1.0]
target_assign:
positive_overlap: 0.5
negative_overlap: 0.4
gamma: 2.0
alpha: 0.25
sigma: 3.0151134457776365
output_decoder:
score_thresh: 0.05
nms_thresh: 0.5
pre_nms_top_n: 1000
detections_per_im: 100
nms_eta: 1.0
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
batch_transforms:
- !PadBatch
pad_to_stride: 128
dataset:
dataset_dir: data/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 2
batch_transforms:
- !PadBatch
pad_to_stride: 128
dataset:
dataset_dir: data/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
batch_transforms:
- !PadBatch
pad_to_stride: 128
dataset:
annotation: annotations/instances_val2017.json
num_workers: 2
architecture: RetinaNet
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
use_gpu: true
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
weights: output/retinanet_r50_fpn_1x/model_final
log_smooth_window: 20
snapshot_iter: 10000
metric: COCO
save_dir: output
RetinaNet:
backbone: ResNet
fpn: FPN
retina_head: RetinaHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: [3, 4, 5]
freeze_at: 2
FPN:
max_level: 7
min_level: 3
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125]
has_extra_convs: true
RetinaHead:
num_convs_per_octave: 4
num_chan: 256
max_level: 7
min_level: 3
prior_prob: 0.01
base_scale: 4
num_scales_per_octave: 3
num_classes: 81
anchor_generator:
aspect_ratios: [1.0, 2.0, 0.5]
variance: [1.0, 1.0, 1.0, 1.0]
target_assign:
positive_overlap: 0.5
negative_overlap: 0.4
gamma: 2.0
alpha: 0.25
sigma: 3.0151134457776365
output_decoder:
score_thresh: 0.05
nms_thresh: 0.5
pre_nms_top_n: 1000
detections_per_im: 100
nms_eta: 1.0
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 128
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 128
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 128
num_workers: 2
architecture: SSD
max_iters: 28000
train_feed: SSDTrainFeed
eval_feed: SSDEvalFeed
test_feed: SSDTestFeed
pretrain_weights: ./ssd3/
use_gpu: true
snapshot_iter: 2000
log_smooth_window: 1
metric: VOC
save_dir: output
weights: output/ssd_mobilenet_v1_voc/model_final/
SSD:
backbone: MobileNet
multi_box_head: MultiBoxHead
num_classes: 21
metric:
ap_version: 11point
evaluate_difficult: false
overlap_threshold: 0.5
output_decoder:
background_label: 0
keep_top_k: 200
nms_eta: 1.0
nms_threshold: 0.45
nms_top_k: 400
score_threshold: 0.01
MobileNet:
norm_decay: 0.
conv_group_scale: 1
extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]]
with_extra_blocks: true
MultiBoxHead:
aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]]
base_size: 300
flip: true
max_ratio: 90
max_sizes: [[], 150.0, 195.0, 240.0, 285.0, 300.0]
min_ratio: 20
min_sizes: [60.0, 105.0, 150.0, 195.0, 240.0, 285.0]
offset: 0.5
LearningRate:
schedulers:
- !PiecewiseDecay
milestones: [10000, 15000, 20000, 25000]
values: [0.001, 0.0005, 0.00025, 0.0001, 0.00001]
OptimizerBuilder:
optimizer:
momentum: 0.0
type: RMSPropOptimizer
regularizer:
factor: 0.00005
type: L2
SSDTrainFeed:
batch_size: 32
use_process: true
dataset:
dataset_dir: dataset/voc
annotation: VOCdevkit/VOC_all/ImageSets/Main/train.txt
image_dir: VOCdevkit/VOC_all/JPEGImages
use_default_label: true
SSDEvalFeed:
batch_size: 64
use_process: true
dataset:
dataset_dir: dataset/voc
annotation: VOCdevkit/VOC_all/ImageSets/Main/val.txt
image_dir: VOCdevkit/VOC_all/JPEGImages
use_default_label: true
drop_last: false
SSDTestFeed:
batch_size: 1
dataset:
use_default_label: true
drop_last: false
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 500200
log_smooth_window: 20
save_dir: output
snapshot_iter: 2000
metric: COCO
pretrain_weights: https://paddlemodels.bj.bcebos.com/yolo/darknet53.tar.gz
weights: https://paddlemodels.bj.bcebos.com/yolo/yolov3.tar.gz
YOLOv3:
backbone: DarkNet
yolo_head: YOLOv3Head
DarkNet:
norm_type: sync_bn
norm_decay: 0.
depth: 53
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: true
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
num_classes: 80
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 500200
log_smooth_window: 20
save_dir: output
snapshot_iter: 2000
metric: COCO
pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar
weights: https://paddlemodels.bj.bcebos.com/yolo/yolo_mobilenet1.0.tar.gz
YOLOv3:
backbone: MobileNet
yolo_head: YOLOv3Head
MobileNet:
norm_type: sync_bn
norm_decay: 0.
conv_group_scale: 1
with_extra_blocks: false
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: true
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
num_classes: 80
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 500200
log_smooth_window: 20
save_dir: output
snapshot_iter: 2000
metric: COCO
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar
weights: https://paddlemodels.bj.bcebos.com/yolo/yolo_resnet34.tar.gz
YOLOv3:
backbone: ResNet
yolo_head: YOLOv3Head
ResNet:
norm_type: sync_bn
freeze_at: 0
freeze_norm: false
norm_decay: 0.
depth: 34
feature_maps: [3, 4, 5]
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: true
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
num_classes: 80
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
# Extract the data.
echo "Extracting..."
unzip train2014.zip
unzip val2014.zip
unzip train2017.zip
unzip val2017.zip
unzip annotations_trainval2014.zip
unzip annotations_trainval2017.zip
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
# Extract the data.
echo "Extracting..."
tar -xf VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar
echo "Creating data lists..."
python -c 'from ppdet.utils.voc_utils import merge_and_create_list; merge_and_create_list("VOCdevkit", ["2007", "2012"], "VOCdevkit/VOC_all")'
# Introduction
PaddleDetection takes a rather principled approach to configuration management. We aim to automate the configuration workflow and to reduce configuration errors.
# Rationale
Presently, configuration in mainstream frameworks is usually dictionary-based: the global config is simply a giant, loosely defined Python dictionary.
This approach is error-prone, e.g., misspelled or misplaced keys may lead to serious errors during training, causing loss of time and wasted resources.
To avoid the common pitfalls, with automation and static analysis in mind, we propose a configuration design that is user friendly, easy to maintain and extensible.
# Design
The design utilizes some of Python's reflection mechanisms to extract configuration schematics from Python class definitions.
Specifically, it extracts information from class constructor arguments, including names, docstrings, default values, and data types (if type hints are available).
This approach advocates modular and testable design, leading to a unified and extensible code base.
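As a minimal sketch of what this extraction yields, consider the toy class below (purely illustrative); `extract_schema` lives in `ppdet.core.config.schema` and is reproduced later in this document:
```python
from ppdet.core.config.schema import extract_schema

class MyHead(object):
    """
    A toy module
    Args:
        num_chan (int): number of output channels
    """
    def __init__(self, num_chan=256):
        self.num_chan = num_chan

schema = extract_schema(MyHead)
# the class name, docstring summary and per-argument metadata
# are now available to the config tooling
assert schema.name == 'MyHead'
assert schema.schema['num_chan'].default == 256
```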
## API
Most of the functionality is exposed in `ppdet.core.workspace` module.
- `register`: This decorator registers a class as a configurable module; it understands several special annotations in the class definition.
  - `__category__`: For better organization, modules are classified into categories.
  - `__inject__`: A list of constructor arguments intended to take module instances as input; the instances are created at runtime and injected. The corresponding configuration value can be a class-name string, a serialized object, a config key pointing to a serialized object, or a dict (in which case the constructor needs to handle it; see the example below).
  - `__op__`: A shortcut for wrapping PaddlePaddle operators into callable objects; together with `__append_doc__` (which extracts the docstring from the target PaddlePaddle operator automatically), this can be a real time saver.
- `serializable`: This decorator makes a class directly serializable in a YAML config file, by taking advantage of [pyyaml](https://pyyaml.org/wiki/PyYAMLDocumentation)'s serialization mechanism.
- `create`: Constructs a module instance according to the global configuration.
- `load_config` and `merge_config`: Load a YAML file and merge config settings from the command line.
## Example
Take the `RPNHead` module for example: it is composed of several PaddlePaddle operators. We first wrap those operators into classes, then pass in instances of these classes when instantiating the `RPNHead` module.
```python
# excerpt from `ppdet/modeling/ops.py`
import paddle.fluid as fluid
from ppdet.core.workspace import register, serializable
# ... more operators
@register
@serializable
class GenerateProposals(object):
# NOTE this class simply wraps a PaddlePaddle operator
__op__ = fluid.layers.generate_proposals
# NOTE docstring for args are extracted from PaddlePaddle OP
__append_doc__ = True
def __init__(self,
pre_nms_top_n=6000,
post_nms_top_n=1000,
nms_thresh=.5,
min_size=.1,
eta=1.):
super(GenerateProposals, self).__init__()
self.pre_nms_top_n = pre_nms_top_n
self.post_nms_top_n = post_nms_top_n
self.nms_thresh = nms_thresh
self.min_size = min_size
self.eta = eta
# ... more operators
# excerpt from `ppdet/modeling/anchor_heads/rpn_head.py`
from ppdet.core.workspace import register
from ppdet.modeling.ops import AnchorGenerator, RPNTargetAssign, GenerateProposals
@register
class RPNHead(object):
"""
RPN Head
Args:
anchor_generator (object): `AnchorGenerator` instance
rpn_target_assign (object): `RPNTargetAssign` instance
train_proposal (object): `GenerateProposals` instance for training
test_proposal (object): `GenerateProposals` instance for testing
"""
__inject__ = [
'anchor_generator', 'rpn_target_assign', 'train_proposal',
'test_proposal'
]
def __init__(self,
anchor_generator=AnchorGenerator().__dict__,
rpn_target_assign=RPNTargetAssign().__dict__,
train_proposal=GenerateProposals(12000, 2000).__dict__,
test_proposal=GenerateProposals().__dict__):
super(RPNHead, self).__init__()
self.anchor_generator = anchor_generator
self.rpn_target_assign = rpn_target_assign
self.train_proposal = train_proposal
self.test_proposal = test_proposal
if isinstance(anchor_generator, dict):
self.anchor_generator = AnchorGenerator(**anchor_generator)
if isinstance(rpn_target_assign, dict):
self.rpn_target_assign = RPNTargetAssign(**rpn_target_assign)
if isinstance(train_proposal, dict):
self.train_proposal = GenerateProposals(**train_proposal)
if isinstance(test_proposal, dict):
self.test_proposal = GenerateProposals(**test_proposal)
```
The corresponding (generated) YAML snippet is as follows. Note that this is the configuration in **FULL**; all default values can be omitted. In the case of the above example, all arguments have default values, meaning nothing is required in the config file.
```yaml
RPNHead:
  test_proposal:
eta: 1.0
min_size: 0.1
nms_thresh: 0.5
post_nms_top_n: 1000
pre_nms_top_n: 6000
  train_proposal:
eta: 1.0
min_size: 0.1
nms_thresh: 0.5
post_nms_top_n: 2000
pre_nms_top_n: 12000
anchor_generator:
# ...
rpn_target_assign:
# ...
```
An example snippet that makes use of the `RPNHead` module:
```python
from ppdet.core.workspace import load_config, merge_config, create
load_config('some_config_file.yml')
merge_config(more_config_options_from_command_line)
rpn_head = create('RPNHead')
# ... code that uses the created module!
```
Configuration files can also contain serialized objects, denoted with `!`, for example:
```yaml
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
```
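Under the hood, these `!` tags are plain PyYAML constructors and representers registered by the `serializable` decorator (see `ppdet/core/config/yaml_helpers.py` later in this document). A minimal sketch of the round trip, using a toy stand-in for the real scheduler class:
```python
import yaml
from ppdet.core.workspace import serializable

@serializable
class PiecewiseDecay(object):  # toy stand-in; the real one lives in ppdet.optimizer
    def __init__(self, gamma=0.1, milestones=[60000, 80000]):
        self.gamma = gamma
        self.milestones = milestones

obj = yaml.load("!PiecewiseDecay {gamma: 0.1, milestones: [60000, 80000]}",
                Loader=yaml.Loader)
assert isinstance(obj, PiecewiseDecay) and obj.gamma == 0.1
```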
# Requirements
Two Python packages are used; both are optional.
- [typeguard](https://github.com/agronholm/typeguard) is used for type checking in Python 3.
- [docstring\_parser](https://github.com/rr-/docstring_parser) is needed for docstring parsing.
To install them, simply run:
```shell
pip install typeguard http://github.com/willthefrog/docstring_parser/tarball/master
```
# Tooling
A small utility (`tools/configure.py`) is included to simplify the configuration process. It provides four commands to walk users through it:
1. `list`: List currently registered modules by category; one can also specify which category to list with the `--category` flag.
2. `help`: Get help information for a module, including its description, options, configuration template, and example command-line flags.
3. `analyze`: Check a configuration file for missing/extraneous options, options with mismatched types (if type hints are given), and missing dependencies; it also highlights user-provided values (overridden defaults).
4. `generate`: Generate a configuration template for a given list of modules. By default it generates a complete configuration file, which can be quite verbose; if the `--minimal` flag is given, it generates a template that only contains non-optional settings. For example, to generate a configuration for the Faster R-CNN architecture with a `ResNet` backbone and `FPN`, run:
```shell
python tools/configure.py generate FasterRCNN ResNet RPNHead RoIAlign BBoxAssigner BBoxHead FasterRCNNTrainFeed FasterRCNNTestFeed LearningRate OptimizerBuilder
```
For a minimal version, run:
```shell
python tools/configure.py --minimal generate FasterRCNN BBoxHead
```
## Introduction
The data pipeline is responsible for loading and converting data. Each
resulting data sample is a tuple of np.ndarrays.
For example, Faster R-CNN training uses samples of this format: `[(im,
im_info, im_id, gt_bbox, gt_class, is_crowd), (...)]`.
### Implementation
The data pipeline consists of four sub-systems: data parsing, image
pre-processing, data conversion and data feeding APIs.
Data samples are collected to form `dataset.Dataset`s; usually three sets are
needed, for training, validation, and testing respectively.
First, `dataset.source` loads the data files into memory, then
`dataset.transform` processes them, and lastly, the batched samples
are fetched by `dataset.Reader`.
Sub-systems details:
1. Data parsing
Parses various data sources and creates `dataset.Dataset` instances. Currently, the
following data sources are supported:
- COCO data source
Loads `COCO`-style datasets with a directory structure like this:
```
data/coco/
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
| ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
| ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
| ...
```
- Pascal VOC data source
Loads `Pascal VOC`-like datasets with a directory structure like this:
```
data/pascalvoc/
├──Annotations
│ ├── i000050.jpg
│ ├── 003876.xml
| ...
├── ImageSets
│ ├──Main
└── train.txt
└── val.txt
└── test.txt
└── dog_train.txt
└── dog_trainval.txt
└── dog_val.txt
└── dog_test.txt
└── ...
│ ├──Layout
└──...
│ ├── Segmentation
└──...
├── JPEGImages
│ ├── 000050.jpg
│ ├── 003876.jpg
| ...
```
- Roidb data source
A generalized data source serialized as pickle files, which have the following
structure:
```python
(records, cname2id)
# `cname2id` is a `dict` which maps category name to class IDs
# and `records` is a list of dict of this structure:
{
'im_file': im_fname, # image file name
'im_id': im_id, # image ID
'h': im_h, # height of image
'w': im_w, # width of image
'is_crowd': is_crowd, # crowd marker
'gt_class': gt_class, # ground truth class
'gt_bbox': gt_bbox, # ground truth bounding box
'gt_poly': gt_poly, # ground truth segmentation
}
```
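Such a file can be inspected directly; a minimal sketch, assuming a file produced by the conversion tool below (the file name and extension here are hypothetical):
```python
import pickle

# hypothetical output file of tools/generate_data_for_training.py
with open('./roidb/instances_val2017.roidb', 'rb') as f:
    records, cname2id = pickle.load(f)
print('{} samples, {} classes'.format(len(records), len(cname2id)))
```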
We provide a tool to generate roidb data sources. To convert a `COCO`- or `VOC`-like
dataset, run this command:
```sh
# --type: the type of the original data (xml or json)
# --annotation: the path of a file that lists the annotation files
# --save-dir: the save path
# --samples: the number of samples (default is -1, which means all samples in the dataset)
python ./tools/generate_data_for_training.py
--type=json \
--annotation=./annotations/instances_val2017.json \
--save-dir=./roidb \
--samples=-1
```
2. Image preprocessing
The `dataset.transform.operator` module provides operations such as image
decoding, expanding, cropping, etc. Multiple operators are combined to form
larger processing pipelines.
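A hedged sketch of such a pipeline follows; the operator names come from `transform/operators.py` (listed below), but the constructor arguments and the plain-loop driver are illustrative assumptions rather than the exact API:
```python
from ppdet.data.transform.operators import (DecodeImage, RandomFlipImage,
                                            NormalizeImage)

# illustrative arguments; consult the operator docstrings for real signatures
sample_transforms = [
    DecodeImage(to_rgb=True),       # image bytes -> RGB ndarray
    RandomFlipImage(prob=0.5),      # horizontal flip half of the time
    NormalizeImage(is_scale=True),  # scale pixel values
]

def preprocess(sample):
    # each operator maps a sample dict to a (possibly modified) sample dict
    for op in sample_transforms:
        sample = op(sample)
    return sample
```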
3. Data transformer
Transforms a `dataset.Dataset` to achieve various desired effects. Notably, the
`dataset.transform.parallel_map` transformer accelerates image processing with
multiple threads or processes. More transformers can be found in
`dataset.transform.transformer`.
4. Data feeding APIs
To facilitate data pipeline building, we combine multiple `dataset.Dataset`s to
form a `dataset.Reader`, which can provide data for training, validation, and
testing respectively. Users can simply call `Reader.[train|eval|infer]` to get
the corresponding data stream. Many aspects of the `Reader`, such as storage
location, preprocessing pipeline, and acceleration mode, can be configured with
YAML files.
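A minimal usage sketch, mirroring the example given in the Chinese version of this document (imports are omitted there as well; `load_cfg` and `Reader` come from this data module):
```python
ccfg = load_cfg('./config.yml')
coco = Reader(ccfg.DATA, ccfg.TRANSFORM, maxiter=-1)
# Reader.[train|eval|infer] return the corresponding data streams
for batch in coco.train():
    pass  # feed `batch` to the training program
```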
The main APIs are as follows:
1. Data parsing
- `source/coco_loader.py`: COCO dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/coco_loader.py)
- `source/voc_loader.py`: Pascal VOC dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/voc_loader.py)
[Note] To use a non-default label list for VOC datasets, a `label_list.txt`
file is needed; one can use the provided label list
(`data/pascalvoc/ImageSets/Main/label_list.txt`) or generate a custom one (with
`tools/generate_data_for_training.py`). Also, the `use_default_label` option
should be set to `false` in the configuration file.
- `source/loader.py`: Roidb dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/loader.py)
2. Operator
`transform/operators.py`: Contains a variety of data augmentation methods, including:
- `DecodeImage`: Read images in RGB format.
- `RandomFlipImage`: Horizontal flip.
- `RandomDistort`: Distort brightness, contrast, saturation, and hue.
- `ResizeImage`: Resize image with interpolation.
- `RandomInterpImage`: Use a random interpolation method to resize the image.
- `CropImage`: Crop image with respect to different scale, aspect ratio, and overlap.
- `ExpandImage`: Pad image to a larger size, padding filled with mean image value.
- `NormalizeImage`: Normalize image pixel values.
- `NormalizeBox`: Normalize the bounding box.
- `Permute`: Arrange the channels of the image and optionally convert image to BGR format.
- `MixupImage`: Mixup two images with a given fraction<sup>[1](#mix)</sup>.
<a name="mix">[1]</a> Please refer to [this paper](https://arxiv.org/pdf/1710.09412.pdf)
`transform/arrange_sample.py`: Assemble the data samples needed by different models.
3. Transformer
`transform/post_map.py`: Transformations that operate on whole batches, mainly for:
- Padding the whole batch to given stride values
- Resizing images to multiple scales
- Randomly adjusting the image size of the batch data
`transform/transformer.py`: Data filtering and batching.
`transform/parallel_map.py`: Accelerates data processing with multiple threads/processes.
4. Reader
`reader.py`: Combines sources and transforms, returning batch data according to `max_iter`.
`data_feed.py`: Configures default parameters for `reader.py`.
### Usage
#### Canned Datasets
Presets for common datasets, e.g., `MS-COCO` and `Pascal VOC`, are included. In
most cases, users can simply use these canned datasets as is. Moreover, the
whole data pipeline is fully customizable through the YAML configuration files.
#### Custom Datasets
- Option 1: Convert the dataset to COCO or VOC format.
```sh
# a small utility (`tools/labelme2coco.py`) is provided to convert
# Labelme-annotated dataset to COCO format.
python ./tools/labelme2coco.py --json_input_dir ./labelme_annos/
--image_input_dir ./labelme_imgs/
--output_dir ./cocome/
--train_proportion 0.8
--val_proportion 0.2
--test_proportion 0.0
# --json_input_dir: the path of the json files annotated by Labelme
# --image_input_dir: the path of the images
# --output_dir: the path of the converted COCO dataset
# --train_proportion: the proportion of annotated data used for training
# --val_proportion: the proportion of annotated data used for validation
# --test_proportion: the proportion of annotated data used for inference
```
- Option 2:
1. Add `source/XX_loader.py` and implement the `load` function, following the
example of `source/coco_loader.py` and `source/voc_loader.py`.
2. Modify the `load` function in `source/loader.py` to make use of the newly
added data loader.
3. Modify `/source/__init__.py` accordingly.
```python
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource']:
source_type = 'RoiDbSource'
# Replace the above code with the following code:
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource', 'XXSource']:
source_type = 'RoiDbSource'
```
4. In the configuration file, set the `type` of `dataset` to `XXSource`.
#### How to add data pre-processing?
- To add a pre-processing operation for a single image, refer to the classes in
`transform/operators.py`, and implement the desired transformation with a new
class; a hedged sketch follows this list.
- To add pre-processing for a batch, one needs to modify the `build_post_map`
function in `transform/post_map.py`.
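A hedged sketch of a new single-image operator; `BaseOperator` and the sample-dict layout follow the implementation notes at the end of this document, while the class itself and its argument are hypothetical:
```python
import numpy as np
from ppdet.data.transform.operators import BaseOperator

class RandomGrayscale(BaseOperator):  # hypothetical new operator
    def __init__(self, prob=0.1):
        super(RandomGrayscale, self).__init__()
        self.prob = prob

    def __call__(self, sample, context=None):
        # assumption: the decoded image is stored under sample['image']
        if np.random.uniform() < self.prob:
            gray = sample['image'].mean(axis=2, keepdims=True)
            sample['image'] = np.repeat(gray, 3, axis=2).astype(
                sample['image'].dtype)
        return sample
```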
## Introduction
This Python module loads data and converts it into the format required for detection model training, validation, and testing: a list of tuples made up of multiple np.ndarrays. For example, the training data format for the Faster R-CNN model is `[(im, im_info, im_id, gt_bbox, gt_class, is_crowd), (...)]`.
### Implementation
Internally, the module is divided into four sub-functions: data parsing, image preprocessing, data conversion, and data feeding APIs.
We use `dataset.Dataset` to represent a dataset. For instance, the `COCO` data comprises three datasets, used for training, validation, and testing respectively. The raw data is stored in files; it is loaded into memory via `dataset.source`, processed and converted with `dataset.transform`, and finally the batch data for training, validation, and testing is obtained through the `dataset.Reader` interface.
Sub-function details:
1. Data parsing
Data parsing produces a `dataset.Dataset`; the logic lives in `dataset.source`, which can parse datasets of different formats. Supported data sources include:
- COCO data source
This dataset comes in COCO2014 and COCO2017 releases, consisting mainly of json files and image files, organized as follows:
```
data/coco/
├── annotations
│ ├── instances_train2014.json
│ ├── instances_train2017.json
│ ├── instances_val2014.json
│ ├── instances_val2017.json
| ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
| ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
| ...
```
- Pascal VOC data source
This dataset comes in VOC2007 and VOC2012 releases, consisting mainly of xml files and image files, organized as follows:
```
data/pascalvoc/
├──Annotations
│ ├── i000050.jpg
│ ├── 003876.xml
| ...
├── ImageSets
│ ├──Main
└── train.txt
└── val.txt
└── test.txt
└── dog_train.txt
└── dog_trainval.txt
└── dog_val.txt
└── dog_test.txt
└── ...
│ ├──Layout
└──...
│ ├── Segmentation
└──...
├── JPEGImages
│ ├── 000050.jpg
│ ├── 003876.jpg
| ...
```
- Roidb data source
This data source consists of pickle files, mostly converted from the COCO and Pascal VOC datasets. Each file stores a list named `records` (and possibly a dict named `cname2cid` mapping category names to class IDs), with the following content:
```python
(records, catname2clsid)
# `records` is a list whose entries have the following structure:
{
    'im_file': im_fname,   # image file name
    'im_id': im_id,        # image ID
    'h': im_h,             # image height
    'w': im_w,             # image width
    'is_crowd': is_crowd,  # crowd flag
    'gt_class': gt_class,  # ground-truth class
    'gt_bbox': gt_bbox,    # ground-truth bounding box
    'gt_poly': gt_poly,    # ground-truth polygon (segmentation)
}
# `cname2id` is a dict mapping category names to class IDs
```
We provide a script in `./tools/` to generate roidb datasets; this can be done with the following command:
```sh
# --type: the type of the original dataset (only xml or json)
# --annotation: the path of a file that lists the required annotation files
# --save-dir: the save path
# --samples: the number of samples (default is -1, meaning all samples)
python ./tools/generate_data_for_training.py
--type=json \
--annotation=./annotations/instances_val2017.json \
--save-dir=./roidb \
--samples=-1
```
2. Image preprocessing
Image preprocessing includes operations such as image decoding, resizing, and cropping. These are implemented uniformly as `dataset.transform.operator` operators, which makes them easy to extend. Multiple operators can also be combined into complex processing pipelines, which are then used by the transformers in `dataset.transformer`, e.g., to run a complex preprocessing flow across multiple threads.
3. Data transformer
A data transformer converts one `dataset.Dataset` into a new `dataset.Dataset`. The various `dataset.transform.transformer`s are implemented with the decorator pattern, e.g., the `dataset.transform.parallel_map` transformer for multi-process preprocessing.
4. Data feeding APIs
To make data retrieval convenient during training, multiple `dataset.Dataset`s are combined into a `dataset.Reader` that serves data to the user; calling `Reader.[train|eval|infer]` yields the corresponding data stream. The `Reader` supports configuring data locations, the preprocessing flow, acceleration modes, etc. via YAML files.
The main APIs are as follows:
1. Data parsing
- `source/coco_loader.py`: parses COCO datasets. [See code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/coco_loader.py)
- `source/voc_loader.py`: parses Pascal VOC datasets. [See code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/voc_loader.py)
[Note] When using a VOC dataset without the default label list, first generate a `label_list.txt` with `tools/generate_data_for_training.py` (used in the same way as for the roidb dataset in the data parsing section), or place a provided `label_list.txt` under `data/pascalvoc/ImageSets/Main`; also set the `use_default_label` option to `false` in the configuration file.
- `source/loader.py`: parses roidb datasets. [See code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/loader.py)
2. Operators
`transform/operators.py`: contains a variety of data augmentation methods, mainly including:
- `RandomFlipImage`: horizontal flip.
- `RandomDistort`: randomly perturb the image's brightness, contrast, saturation, and hue.
- `ResizeImage`: resize the image with a specific interpolation method.
- `RandomInterpImage`: resize the image with a randomly chosen interpolation method.
- `CropImage`: generate candidate crops from scale and aspect-ratio parameters, then select the ones whose IoU with the annotated boxes meets the requirement.
- `ExpandImage`: place the original image into a larger canvas filled with the pixel mean (subtracted again later in the mean-subtraction step), then crop, resize, and flip the result.
- `DecodeImage`: read images in RGB format.
- `Permute`: rearrange the image channels and convert to BGR format.
- `NormalizeImage`: normalize image pixel values.
- `NormalizeBox`: normalize the bounding boxes.
- `MixupImage`: overlay two images with a given ratio.
[Note]: For the Mixup operation, please refer to this [paper](https://arxiv.org/pdf/1710.09412.pdf).
`transform/arrange_sample.py`: arranges the data fed into the network.
3. Transformers
`transform/post_map.py`: batch-level preprocessing, mainly including:
- randomly adjusting the image size within a batch
- multi-scale image resizing
- padding
`transform/transformer.py`: filters out useless data and returns batch data.
`transform/parallel_map.py`: implements acceleration.
4. Reader
`reader.py`: combines source and transformer operations and returns batch data according to `max_iter`.
`data_feed.py`: configures the default parameters needed by `reader.py`.
### Usage
#### Typical usage
The module's functionality is driven by the settings in the YAML configuration file; see the configuration-file section for details on YAML usage.
- Load data for training:
``` python
ccfg = load_cfg('./config.yml')
coco = Reader(ccfg.DATA, ccfg.TRANSFORM, maxiter=-1)
```
#### How to use a custom dataset?
- Option 1: convert the dataset to VOC format or COCO format.
```sh
# labelme2coco.py in ./tools/ converts Labelme-annotated datasets to the COCO format
python ./tools/labelme2coco.py --json_input_dir ./labelme_annos/
--image_input_dir ./labelme_imgs/
--output_dir ./cocome/
--train_proportion 0.8
--val_proportion 0.2
--test_proportion 0.0
# --json_input_dir: the directory of the json files annotated by Labelme
# --image_input_dir: the directory of the images
# --output_dir: where to store the converted COCO-format dataset
# --train_proportion: the proportion of annotated data used for training
# --val_proportion: the proportion of annotated data used for validation
# --test_proportion: the proportion of annotated data used for inference
```
- Option 2:
1. Following the examples of `./source/coco_loader.py` and `./source/voc_loader.py`, add a `./source/XX_loader.py` and implement its `load` function.
2. Add an entry point for `./source/XX_loader.py` to the `load` function in `./source/loader.py`.
3. Modify `./source/__init__.py`:
```python
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource']:
source_type = 'RoiDbSource'
# Replace the above code with the following:
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource', 'XXSource']:
source_type = 'RoiDbSource'
```
4. In the configuration file, set the `type` under `dataset` to `XXSource`.
#### How to add data preprocessing?
- To add augmentation preprocessing for a single image, refer to the classes in `transform/operators.py` and create a new class implementing the new augmentation; also register this preprocessing step in the configuration file.
- To add preprocessing for a batch of images, refer to the inner functions of `build_post_map` in `transform/post_map.py` and create a new inner function implementing the new batch preprocessing; also register this preprocessing step in the configuration file.
# Getting Started
For setting up the test environment, please refer to [installation
instructions](INSTALL.md).
## Training
#### Single-GPU Training
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
#### Multi-GPU Training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
- Datasets are stored in `dataset/coco` by default (configurable).
- Pretrained models are downloaded automatically and cached in `~/.cache/paddle/weights`.
- Model checkpoints are saved in `output` by default (configurable).
- To check the hyperparameters used, please refer to the config file.
Alternating between training epochs and evaluation runs is possible: simply pass
in `--eval=True` to do so (tested with the `SSD` detector on Pascal VOC, not
recommended for two-stage models or training sessions on the COCO dataset)
## Evaluation
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
```
- Checkpoints are loaded from `output` by default (configurable)
- Multi-GPU evaluation for R-CNN and SSD models is not supported at the
moment, but it is a planned feature
## Inference
- Run inference on a single image:
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
- Batch inference:
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```
The visualization files are saved in `output` by default; to specify a different
path, simply add a `--save_file=` flag.
## FAQ
Q: Why do I get `NaN` loss values during single-GPU training?
A: The default learning rate is tuned for multi-GPU training (8 GPUs); it must
be scaled down accordingly for single-GPU training (e.g., divide `base_lr` by 8,
so 0.01 becomes 0.00125).
# Installation
---
## Table of Contents
- [Introduction](#introduction)
- [PaddlePaddle](#paddlepaddle)
- [Other Dependencies](#other-dependencies)
- [PaddleDetection](#paddledetection)
- [Datasets](#datasets)
## Introduction
This document covers how to install PaddleDetection, its dependencies
(including PaddlePaddle), together with the COCO and PASCAL VOC datasets.
For general information about PaddleDetection, please see [README.md](../README.md).
## PaddlePaddle
Running PaddleDetection requires PaddlePaddle Fluid v1.5 or later. Please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/en/1.4/beginners_guide/install/index_en.html).
Please make sure your PaddlePaddle installation was successful and that the installed
version is not lower than required. Verify with the following commands.
```
# To check if PaddlePaddle installation was successful
python -c "import paddle.fluid as fluid; fluid.install_check.run_check()"
# To check PaddlePaddle version
python -c "import paddle; print(paddle.__version__)"
```
### Requirements:
- Python2 or Python3
- CUDA >= 8.0
- cuDNN >= 5.0
- nccl >= 2.1.2
## Other Dependencies
[COCO-API](https://github.com/cocodataset/cocoapi):
COCO-API is needed for training. Installation is as follows:
```
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python setup.py install --user
```
## PaddleDetection
**Clone the Paddle models repository:**
You can clone the Paddle models repository and switch the working directory to
PaddleDetection with the following commands:
```
cd <path/to/clone/models>
git clone https://github.com/PaddlePaddle/models
cd models/PaddleCV/object_detection
```
**Install Python dependencies:**
Required Python packages are specified in [requirements.txt](./requirements.txt) and can be installed with:
```
pip install -r requirements.txt
```
**Make sure the tests pass:**
```
export PYTHONPATH=`pwd`:$PYTHONPATH
python ppdet/modeling/tests/test_architectures.py
```
## Datasets
PaddleDetection includes support for [MSCOCO](http://cocodataset.org) and [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) by default; please follow these instructions to set up the datasets.
**Create symlinks for local datasets:**
Default dataset paths in the config files are `data/coco` and `data/voc`; if the
datasets are already available on disk, you can simply create symlinks to
their directories:
```
ln -sf <path/to/coco> <path/to/paddle_detection>/data/coco
ln -sf <path/to/voc> <path/to/paddle_detection>/data/voc
```
**Download datasets manually:**
Alternatively, to download the datasets, run the following commands:
- MS-COCO
```
cd dataset/coco
./download.sh
```
- PASCAL VOC
```
cd dataset/voc
./download.sh
```
**Download datasets automatically:**
If a training session is started but the dataset is not set up properly (e.g.,
not found in `data/coco` or `data/voc`), PaddleDetection can automatically
download them from [MSCOCO-2017](http://images.cocodataset.org) and
[VOC2012](http://host.robots.ox.ac.uk/pascal/VOC); the decompressed datasets
will be cached in `~/.cache/paddle/dataset/` and can be discovered automatically
subsequently.
**NOTE:** For further information on the datasets, please see [DATA.md](DATA.md)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import ppdet.modeling
import ppdet.optimizer
import ppdet.data
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import inspect
import importlib
import re
try:
from docstring_parser import parse as doc_parse
except Exception:
def doc_parse(*args):
if not doc_parse.__warning_sent__:
from ppdet.utils.cli import ColorTTY
color_tty = ColorTTY()
message = "docstring_parser is not installed, " \
+ "argument description is not available"
print(color_tty.yellow(message))
doc_parse.__warning_sent__ = True
doc_parse.__warning_sent__ = False
try:
from typeguard import check_type
except Exception:
def check_type(*args):
if not check_type.__warning_sent__:
from ppdet.utils.cli import ColorTTY
color_tty = ColorTTY()
message = "typeguard is not installed, type checking is not available"
print(color_tty.yellow(message))
check_type.__warning_sent__ = True
check_type.__warning_sent__ = False
__all__ = ['SchemaValue', 'SchemaDict', 'extract_schema']
class SchemaValue(object):
def __init__(self, name, doc='', type=None):
super(SchemaValue, self).__init__()
self.name = name
self.doc = doc
self.type = type
def set_default(self, value):
self.default = value
def has_default(self):
return hasattr(self, 'default')
class SchemaDict(dict):
def __init__(self, **kwargs):
super(SchemaDict, self).__init__()
self.schema = {}
self.strict = False
self.doc = ""
self.update(kwargs)
def __setitem__(self, key, value):
# XXX also update regular dict to SchemaDict??
if isinstance(value, dict) and key in self and isinstance(self[key],
SchemaDict):
self[key].update(value)
else:
super(SchemaDict, self).__setitem__(key, value)
def __missing__(self, key):
if self.has_default(key):
return self.schema[key].default
elif key in self.schema:
return self.schema[key]
else:
raise KeyError(key)
def copy(self):
newone = SchemaDict()
newone.__dict__.update(self.__dict__)
newone.update(self)
return newone
def set_schema(self, key, value):
assert isinstance(value, SchemaValue)
self.schema[key] = value
def set_strict(self, strict):
self.strict = strict
def has_default(self, key):
return key in self.schema and self.schema[key].has_default()
def is_default(self, key):
if not self.has_default(key):
return False
if hasattr(self[key], '__dict__'):
return True
else:
return key not in self or self[key] == self.schema[key].default
def find_default_keys(self):
return [
k for k in list(self.keys()) + list(self.schema.keys())
if self.is_default(k)
]
def mandatory(self):
return any([k for k in self.schema.keys() if not self.has_default(k)])
def find_missing_keys(self):
missing = [
k for k in self.schema.keys()
if k not in self and not self.has_default(k)
]
placeholders = [k for k in self if self[k] in ('<missing>', '<value>')]
return missing + placeholders
def find_extra_keys(self):
return list(set(self.keys()) - set(self.schema.keys()))
def find_mismatch_keys(self):
mismatch_keys = []
for arg in self.schema.values():
if arg.type is not None:
try:
check_type("{}.{}".format(self.name, arg.name),
self[arg.name], arg.type)
except Exception:
mismatch_keys.append(arg.name)
return mismatch_keys
def validate(self):
missing_keys = self.find_missing_keys()
if missing_keys:
raise ValueError("Missing param for class<{}>: {}".format(
self.name, ", ".join(missing_keys)))
extra_keys = self.find_extra_keys()
if extra_keys and self.strict:
raise ValueError("Extraneous param for class<{}>: {}".format(
self.name, ", ".join(extra_keys)))
mismatch_keys = self.find_mismatch_keys()
if mismatch_keys:
raise TypeError("Wrong param type for class<{}>: {}".format(
self.name, ", ".join(mismatch_keys)))
def extract_schema(cls):
"""
Extract schema from a given class
Args:
cls (type): Class from which to extract.
Returns:
schema (SchemaDict): Extracted schema.
"""
ctor = cls.__init__
# python 2 compatibility
if hasattr(inspect, 'getfullargspec'):
argspec = inspect.getfullargspec(ctor)
annotations = argspec.annotations
has_kwargs = argspec.varkw is not None
else:
argspec = inspect.getargspec(ctor)
# python 2 type hinting workaround, see pep-3107
# however, since `typeguard` does not support python 2, type checking
# is still python 3 only for now
annotations = getattr(ctor, '__annotations__', {})
has_kwargs = argspec.keywords is not None
names = [arg for arg in argspec.args if arg != 'self']
defaults = argspec.defaults
num_defaults = argspec.defaults is not None and len(argspec.defaults) or 0
num_required = len(names) - num_defaults
docs = cls.__doc__
if docs is None and getattr(cls, '__category__', None) == 'op':
docs = cls.__call__.__doc__
docstring = doc_parse(docs)
if docstring is None:
comments = {}
else:
comments = {}
for p in docstring.params:
match_obj = re.match('^([a-zA-Z_]+[a-zA-Z_0-9]*).*', p.arg_name)
if match_obj is not None:
comments[match_obj.group(1)] = p.description
schema = SchemaDict()
schema.name = cls.__name__
schema.doc = ""
if docs is not None:
start_pos = docs[0] == '\n' and 1 or 0
schema.doc = docs[start_pos:].split("\n")[0].strip()
# XXX handle paddle's weird doc convention
if '**' == schema.doc[:2] and '**' == schema.doc[-2:]:
schema.doc = schema.doc[2:-2].strip()
schema.category = hasattr(cls, '__category__') and getattr(
cls, '__category__') or 'module'
schema.strict = not has_kwargs
schema.pymodule = importlib.import_module(cls.__module__)
schema.inject = getattr(cls, '__inject__', [])
for idx, name in enumerate(names):
comment = name in comments and comments[name] or name
if name in schema.inject:
type_ = None
else:
type_ = name in annotations and annotations[name] or None
value_schema = SchemaValue(name, comment, type_)
if idx >= num_required:
value_schema.set_default(defaults[idx - num_required])
schema.set_schema(name, value_schema)
return schema
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import importlib
import inspect
import yaml
__all__ = ['serializable', 'Callable']
def _make_python_constructor(cls):
def python_constructor(loader, node):
if isinstance(node, yaml.SequenceNode):
args = loader.construct_sequence(node, deep=True)
return cls(*args)
else:
kwargs = loader.construct_mapping(node, deep=True)
try:
return cls(**kwargs)
except Exception as ex:
print("Error when construct {} instance from yaml config".
format(cls.__name__))
raise ex
return python_constructor
def _make_python_representer(cls):
# python 2 compatibility
if hasattr(inspect, 'getfullargspec'):
argspec = inspect.getfullargspec(cls)
else:
argspec = inspect.getargspec(cls.__init__)
argnames = [arg for arg in argspec.args if arg != 'self']
def python_representer(dumper, obj):
if argnames:
data = {name: getattr(obj, name) for name in argnames}
else:
data = obj.__dict__
if '_id' in data:
del data['_id']
return dumper.represent_mapping(u'!{}'.format(cls.__name__), data)
return python_representer
def serializable(cls):
"""
Add loader and dumper for given class, which must be "trivially serializable"
Args:
cls: class to be serialized
Returns: cls
"""
yaml.add_constructor(u'!{}'.format(cls.__name__),
_make_python_constructor(cls))
yaml.add_representer(cls, _make_python_representer(cls))
return cls
@serializable
class Callable(object):
"""
Helper to be used in Yaml for creating arbitrary class objects
Args:
full_type (str): the full module path to target function
"""
def __init__(self, full_type, args=[], kwargs={}):
super(Callable, self).__init__()
self.full_type = full_type
self.args = args
self.kwargs = kwargs
def __call__(self):
if '.' in self.full_type:
idx = self.full_type.rfind('.')
module = importlib.import_module(self.full_type[:idx])
func_name = self.full_type[idx + 1:]
else:
try:
module = importlib.import_module('builtins')
except Exception:
module = importlib.import_module('__builtin__')
func_name = self.full_type
func = getattr(module, func_name)
return func(*self.args, **self.kwargs)
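# Usage sketch (illustrative note, not part of the original file): `Callable`
# resolves the dotted path at call time, so
#     Callable('numpy.zeros', args=[[2, 2]])()
# imports `numpy` and returns the same result as numpy.zeros([2, 2]).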
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import importlib
import os
import sys
import yaml
from .config.schema import SchemaDict, extract_schema
from .config.yaml_helpers import serializable
__all__ = [
'global_config', 'load_config', 'merge_config', 'get_registered_modules',
'create', 'register', 'serializable'
]
class AttrDict(dict):
"""Single level attribute dict, NOT recursive"""
def __init__(self, **kwargs):
super(AttrDict, self).__init__()
super(AttrDict, self).update(kwargs)
def __getattr__(self, key):
if key in self:
return self[key]
raise AttributeError("object has no attribute '{}'".format(key))
global_config = AttrDict()
def load_config(file_path):
"""
Load config from file.
Args:
file_path (str): Path of the config file to be loaded.
Returns: global config
"""
_, ext = os.path.splitext(file_path)
assert ext in ['.yml', '.yaml'], "only support yaml files for now"
merge_config(yaml.load(open(file_path), Loader=yaml.Loader))
return global_config
def merge_config(config):
"""
Merge config into global config.
Args:
config (dict): Config to be merged.
Returns: global config
"""
for key, value in config.items():
if isinstance(value, dict) and key in global_config:
global_config[key].update(value)
else:
global_config[key] = value
def get_registered_modules():
return {k: v for k, v in global_config.items() if isinstance(v, SchemaDict)}
def make_partial(cls):
op_module = importlib.import_module(cls.__op__.__module__)
op = getattr(op_module, cls.__op__.__name__)
cls.__category__ = getattr(cls, '__category__', None) or 'op'
def partial_apply(self, *args, **kwargs):
kwargs_ = self.__dict__.copy()
kwargs_.update(kwargs)
return op(*args, **kwargs_)
if getattr(cls, '__append_doc__', True): # XXX should default to True?
if sys.version_info[0] > 2:
cls.__doc__ = "Wrapper for `{}` OP".format(op.__name__)
cls.__init__.__doc__ = op.__doc__
cls.__call__ = partial_apply
cls.__call__.__doc__ = op.__doc__
else:
# XXX work around for python 2
partial_apply.__doc__ = op.__doc__
cls.__call__ = partial_apply
return cls
def register(cls):
"""
Register a given module class.
Args:
cls (type): Module class to be registered.
Returns: cls
"""
if cls.__name__ in global_config:
raise ValueError("Module class already registered: {}".format(
cls.__name__))
if hasattr(cls, '__op__'):
cls = make_partial(cls)
global_config[cls.__name__] = extract_schema(cls)
return cls
def create(cls_or_name, **kwargs):
"""
Create an instance of given module class.
Args:
cls_or_name (type or str): Class of which to create instance.
Returns: instance of type `cls_or_name`
"""
    assert type(cls_or_name) in [type, str], \
        "should be a class or name of a class"
    name = cls_or_name if isinstance(cls_or_name, str) else cls_or_name.__name__
assert name in global_config and isinstance(global_config[name], SchemaDict), \
"the module {} is not registered".format(name)
config = global_config[name]
config.update(kwargs)
config.validate()
cls = getattr(config.pymodule, name)
kwargs = {}
kwargs.update(global_config[name])
if getattr(config, 'inject', None):
for k in config.inject:
target_key = global_config[name][k]
# optional dependency
if target_key is None:
continue
# also accept dictionaries and serialized objects
if isinstance(target_key, dict) or hasattr(target_key, '__dict__'):
continue
elif isinstance(target_key, str):
if target_key not in global_config:
raise ValueError("Missing injection config:", target_key)
target = global_config[target_key]
if isinstance(target, SchemaDict):
kwargs[k] = create(target_key)
elif hasattr(target, '__dict__'): # serialized object
kwargs[k] = target
else:
raise ValueError("Unsupported injection type:", target_key)
return cls(**kwargs)
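# ---------------------------------------------------------------------------
# Merge semantics sketch (illustration only, with made-up config keys):
# nested dicts whose key already exists in `global_config` are updated in
# place, while new keys are inserted wholesale.
#
#     merge_config({'LearningRate': {'base_lr': 0.01}})
#     merge_config({'LearningRate': {'base_lr': 0.02}, 'max_iters': 90000})
#     assert global_config['LearningRate'] == {'base_lr': 0.02}
#     assert global_config.max_iters == 90000
#
# The typical flow for registered modules is: `load_config` a YAML file,
# optionally `merge_config` command line overrides, then `create` by name
# (assuming the named module class was decorated with @register on import):
#
#     load_config('configs/mask_rcnn_r50_1x.yml')
#     merge_config({'max_iters': 90000})
#     model = create('MaskRCNN')
# ---------------------------------------------------------------------------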
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
# module to prepare data for detection model training
#
# implementation notes:
# - Dateset
# basic interface to accessing data samples in stream mode
#
# - xxxSource (RoiDbSource)
# * subclass of 'Dataset'
# * load data from local files and other source data
#
# - xxxOperator (DecodeImage)
# * subclass of 'BaseOperator'
# * each op can transform a sample, eg: decode/resize/crop image
# * each op must obey basic rules defined in transform.operator.base
#
# - transformer
# * subclass of 'Dataset'
# * 'MappedDataset' accept a 'xxxSource' and a list of 'xxxOperator'
# to build a transformed 'Dataset'
from __future__ import absolute_import
from .dataset import Dataset
from .reader import Reader
from .data_feed import create_reader
__all__ = ['Dataset', 'Reader', 'create_reader']
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import os
import inspect
from ppdet.core.workspace import register, serializable
from ppdet.utils.download import get_dataset_path
from ppdet.data.reader import Reader
# XXX these are for triggering the decorator
from ppdet.data.transform.operators import (
DecodeImage, MixupImage, NormalizeBox, NormalizeImage, RandomDistort,
RandomFlipImage, RandomInterpImage, ResizeImage, ExpandImage, CropImage,
Permute)
from ppdet.data.transform.arrange_sample import (ArrangeRCNN, ArrangeTestRCNN,
ArrangeSSD, ArrangeTestSSD,
ArrangeYOLO, ArrangeTestYOLO)
__all__ = [
'PadBatch', 'MultiScale', 'RandomShape', 'DataSet', 'CocoDataSet',
'DataFeed', 'TrainFeed', 'EvalFeed', 'FasterRCNNTrainFeed',
'MaskRCNNTrainFeed', 'FasterRCNNTestFeed', 'MaskRCNNTestFeed',
'SSDTrainFeed', 'SSDEvalFeed', 'SSDTestFeed', 'YoloTrainFeed',
'YoloEvalFeed', 'YoloTestFeed', 'create_reader'
]
def create_reader(feed, max_iter=0):
"""
Return iterable data reader.
Args:
max_iter (int): number of iterations.
"""
    # if `DATASET_DIR` does not exist, search ~/.paddle/dataset for a directory
    # named `DATASET_DIR` (e.g., coco, pascal), if not present either, download
if feed.dataset.dataset_dir:
dataset_dir = get_dataset_path(feed.dataset.dataset_dir)
feed.dataset.annotation = os.path.join(dataset_dir,
feed.dataset.annotation)
feed.dataset.image_dir = os.path.join(dataset_dir,
feed.dataset.image_dir)
mixup_epoch = -1
if getattr(feed, 'mixup_epoch', None) is not None:
mixup_epoch = feed.mixup_epoch
bufsize = 10
use_process = False
if getattr(feed, 'bufsize', None) is not None:
bufsize = feed.bufsize
if getattr(feed, 'use_process', None) is not None:
use_process = feed.use_process
mode = feed.mode
data_config = {
mode: {
'ANNO_FILE': feed.dataset.annotation,
'IMAGE_DIR': feed.dataset.image_dir,
'USE_DEFAULT_LABEL': feed.dataset.use_default_label,
'IS_SHUFFLE': feed.shuffle,
'SAMPLES': feed.samples,
'WITH_BACKGROUND': feed.with_background,
'MIXUP_EPOCH': mixup_epoch,
'TYPE': type(feed.dataset).__source__
}
}
if len(getattr(feed.dataset, 'images', [])) > 0:
data_config[mode]['IMAGES'] = feed.dataset.images
transform_config = {
'WORKER_CONF': {
'bufsize': bufsize,
'worker_num': feed.num_workers,
'use_process': use_process
},
'BATCH_SIZE': feed.batch_size,
'DROP_LAST': feed.drop_last,
'USE_PADDED_IM_INFO': feed.use_padded_im_info,
}
batch_transforms = feed.batch_transforms
pad = [t for t in batch_transforms if isinstance(t, PadBatch)]
rand_shape = [t for t in batch_transforms if isinstance(t, RandomShape)]
multi_scale = [t for t in batch_transforms if isinstance(t, MultiScale)]
if any(pad):
transform_config['IS_PADDING'] = True
if pad[0].pad_to_stride != 0:
transform_config['COARSEST_STRIDE'] = pad[0].pad_to_stride
if any(rand_shape):
transform_config['RANDOM_SHAPES'] = rand_shape[0].sizes
if any(multi_scale):
transform_config['MULTI_SCALES'] = multi_scale[0].scales
if hasattr(inspect, 'getfullargspec'):
argspec = inspect.getfullargspec
else:
argspec = inspect.getargspec
ops = []
for op in feed.sample_transforms:
op_dict = op.__dict__.copy()
argnames = [
arg for arg in argspec(type(op).__init__).args if arg != 'self'
]
op_dict = {k: v for k, v in op_dict.items() if k in argnames}
op_dict['op'] = op.__class__.__name__
ops.append(op_dict)
transform_config['OPS'] = ops
reader = Reader(data_config, {mode: transform_config}, max_iter)
return reader._make_reader(mode)
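# ---------------------------------------------------------------------------
# Call-flow sketch (illustration only): a feed preset defined below is
# instantiated, its sample transforms are serialized back into the legacy
# dict configs consumed by `Reader`, and the result is a plain generator:
#
#     feed = FasterRCNNTrainFeed()            # preset defined below
#     train_reader = create_reader(feed, max_iter=100)
#     for batch in train_reader():            # yields at most 100 batches
#         ...
# ---------------------------------------------------------------------------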
# XXX batch transforms are only stubs for now, actually handled by `post_map`
@serializable
class PadBatch(object):
"""
Pad a batch of samples to same dimensions
Args:
pad_to_stride (int): pad to multiple of strides, e.g., 32
"""
def __init__(self, pad_to_stride=0):
super(PadBatch, self).__init__()
self.pad_to_stride = pad_to_stride
@serializable
class MultiScale(object):
"""
Randomly resize image by scale
Args:
scales (list): list of int, randomly resize to one of these scales
"""
def __init__(self, scales=[]):
super(MultiScale, self).__init__()
self.scales = scales
@serializable
class RandomShape(object):
"""
Randomly reshape a batch
Args:
sizes (list): list of int, random choose a size from these
"""
def __init__(self, sizes=[]):
super(RandomShape, self).__init__()
self.sizes = sizes
@serializable
class DataSet(object):
"""
Dataset, e.g., coco, pascal voc
Args:
annotation (str): annotation file path
image_dir (str): directory where image files are stored
num_classes (int): number of classes
shuffle (bool): shuffle samples
"""
__source__ = 'RoiDbSource'
def __init__(self,
annotation,
image_dir,
dataset_dir=None,
use_default_label=None):
super(DataSet, self).__init__()
self.dataset_dir = dataset_dir
self.annotation = annotation
self.image_dir = image_dir
self.use_default_label = use_default_label
COCO_DATASET_DIR = 'coco'
COCO_TRAIN_ANNOTATION = 'annotations/instances_train2017.json'
COCO_TRAIN_IMAGE_DIR = 'train2017'
COCO_VAL_ANNOTATION = 'annotations/instances_val2017.json'
COCO_VAL_IMAGE_DIR = 'val2017'
@serializable
class CocoDataSet(DataSet):
def __init__(self,
dataset_dir=COCO_DATASET_DIR,
annotation=COCO_TRAIN_ANNOTATION,
image_dir=COCO_TRAIN_IMAGE_DIR):
super(CocoDataSet, self).__init__(
dataset_dir=dataset_dir, annotation=annotation, image_dir=image_dir)
VOC_DATASET_DIR = 'pascalvoc'
VOC_TRAIN_ANNOTATION = 'VOCdevkit/VOC_all/ImageSets/Main/train.txt'
VOC_VAL_ANNOTATION = 'VOCdevkit/VOC_all/ImageSets/Main/val.txt'
VOC_TEST_ANNOTATION = 'VOCdevkit/VOC_all/ImageSets/Main/test.txt'
VOC_IMAGE_DIR = 'VOCdevkit/VOC_all/JPEGImages'
VOC_USE_DEFAULT_LABEL = None
@serializable
class VocDataSet(DataSet):
__source__ = 'VOCSource'
def __init__(self,
dataset_dir=VOC_DATASET_DIR,
annotation=VOC_TRAIN_ANNOTATION,
image_dir=VOC_IMAGE_DIR,
use_default_label=VOC_USE_DEFAULT_LABEL):
super(VocDataSet, self).__init__(
dataset_dir=dataset_dir,
annotation=annotation,
image_dir=image_dir,
use_default_label=use_default_label)
@serializable
class SimpleDataSet(DataSet):
__source__ = 'SimpleSource'
def __init__(self,
dataset_dir=None,
annotation=None,
image_dir=None,
use_default_label=None):
super(SimpleDataSet, self).__init__(
dataset_dir=dataset_dir, annotation=annotation, image_dir=image_dir)
self.images = []
def add_images(self, images):
self.images.extend(images)
@serializable
class DataFeed(object):
"""
DataFeed encompasses all data loading related settings
Args:
dataset (object): a `Dataset` instance
fields (list): list of data fields needed
image_shape (list): list of image dims (C, MAX_DIM, MIN_DIM)
sample_transforms (list): list of sample transformations to use
batch_transforms (list): list of batch transformations to use
batch_size (int): number of images per device
shuffle (bool): if samples should be shuffled
drop_last (bool): drop last batch if size is uneven
        num_workers (int): number of worker processes (or threads)
"""
__category__ = 'data'
def __init__(self,
dataset,
fields,
image_shape,
sample_transforms=None,
batch_transforms=None,
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
with_background=True,
num_workers=2,
bufsize=10,
use_process=False,
use_padded_im_info=False):
super(DataFeed, self).__init__()
self.fields = fields
self.image_shape = image_shape
self.sample_transforms = sample_transforms
self.batch_transforms = batch_transforms
self.batch_size = batch_size
self.shuffle = shuffle
self.samples = samples
self.drop_last = drop_last
self.with_background = with_background
self.num_workers = num_workers
self.bufsize = bufsize
self.use_process = use_process
self.dataset = dataset
self.use_padded_im_info = use_padded_im_info
if isinstance(dataset, dict):
self.dataset = DataSet(**dataset)
# for custom (i.e., Non-preset) datasets
@register
class TrainFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset,
fields,
image_shape,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=True,
samples=-1,
drop_last=False,
with_background=True,
num_workers=2,
bufsize=10,
use_process=True):
super(TrainFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers,
bufsize=bufsize,
use_process=use_process, )
@register
class EvalFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset,
fields,
image_shape,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
with_background=True,
num_workers=2):
super(EvalFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers)
@register
class TestFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset,
fields,
image_shape,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=False,
drop_last=False,
with_background=True,
num_workers=2):
super(TestFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers)
@register
class FasterRCNNTrainFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet().__dict__,
fields=[
'image', 'im_info', 'im_id', 'gt_box', 'gt_label',
'is_crowd'
],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
RandomFlipImage(prob=0.5),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
ResizeImage(target_size=800, max_size=1333, interp=1),
Permute(to_bgr=False)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=True,
samples=-1,
drop_last=False,
num_workers=2,
use_process=False):
# XXX this should be handled by the data loader, since `fields` is
# given, just collect them
sample_transforms.append(ArrangeRCNN())
super(FasterRCNNTrainFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process)
# XXX these modes should be unified
self.mode = 'TRAIN'
@register
class FasterRCNNEvalFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_info', 'im_id', 'im_shape'],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
ResizeImage(target_size=800, max_size=1333, interp=1),
Permute(to_bgr=False)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
num_workers=2,
use_padded_im_info=True):
sample_transforms.append(ArrangeTestRCNN())
super(FasterRCNNEvalFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_padded_im_info=use_padded_im_info)
self.mode = 'VAL'
@register
class FasterRCNNTestFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=SimpleDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_info', 'im_id', 'im_shape'],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
Permute(to_bgr=False)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
num_workers=2,
use_padded_im_info=True):
sample_transforms.append(ArrangeTestRCNN())
if isinstance(dataset, dict):
dataset = SimpleDataSet(**dataset)
super(FasterRCNNTestFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_padded_im_info=use_padded_im_info)
self.mode = 'TEST'
# XXX two presets are currently used; in the future, these should be combined
# into a single `RCNNTrainFeed`. Mask (and keypoint) should be processed
# automatically if `gt_mask` (or `gt_keypoints`) is in the required fields
@register
class MaskRCNNTrainFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet().__dict__,
fields=[
'image', 'im_info', 'im_id', 'gt_box', 'gt_label',
'is_crowd', 'gt_mask'
],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
RandomFlipImage(prob=0.5, is_mask_flip=True),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
ResizeImage(target_size=800,
max_size=1333,
interp=1,
use_cv2=True),
Permute(to_bgr=False, channel_first=True)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=True,
samples=-1,
drop_last=False,
num_workers=2,
use_process=False,
use_padded_im_info=False):
sample_transforms.append(ArrangeRCNN(is_mask=True))
super(MaskRCNNTrainFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process)
self.mode = 'TRAIN'
@register
class MaskRCNNEvalFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_info', 'im_id', 'im_shape'],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
ResizeImage(target_size=800,
max_size=1333,
interp=1,
use_cv2=True),
Permute(to_bgr=False, channel_first=True)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
num_workers=2,
use_process=False,
use_padded_im_info=True):
sample_transforms.append(ArrangeTestRCNN())
super(MaskRCNNEvalFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process,
use_padded_im_info=use_padded_im_info)
self.mode = 'VAL'
@register
class MaskRCNNTestFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=SimpleDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_info', 'im_id', 'im_shape'],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
NormalizeImage(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
Permute(to_bgr=False, channel_first=True)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
num_workers=2,
use_process=False,
use_padded_im_info=True):
sample_transforms.append(ArrangeTestRCNN())
if isinstance(dataset, dict):
dataset = SimpleDataSet(**dataset)
super(MaskRCNNTestFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process,
use_padded_im_info=use_padded_im_info)
self.mode = 'TEST'
@register
class SSDTrainFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=VocDataSet().__dict__,
fields=['image', 'gt_box', 'gt_label', 'is_difficult'],
image_shape=[3, 300, 300],
sample_transforms=[
DecodeImage(to_rgb=True, with_mixup=False),
NormalizeBox(),
RandomDistort(brightness_lower=0.875,
brightness_upper=1.125,
is_order=True),
ExpandImage(max_ratio=4, prob=0.5),
CropImage(batch_sampler=[[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]],
satisfy_all=False, avoid_no_bbox=False),
ResizeImage(target_size=300, use_cv2=False, interp=1),
RandomFlipImage(is_normalized=True),
Permute(),
NormalizeImage(mean=[127.5, 127.5, 127.5],
std=[127.502231, 127.502231, 127.502231],
is_scale=False)
],
batch_transforms=[],
batch_size=32,
shuffle=True,
samples=-1,
drop_last=True,
num_workers=8,
bufsize=10,
use_process=True):
sample_transforms.append(ArrangeSSD())
if isinstance(dataset, dict):
dataset = VocDataSet(**dataset)
super(SSDTrainFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process)
self.mode = 'TRAIN'
@register
class SSDEvalFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(
self,
dataset=VocDataSet(VOC_VAL_ANNOTATION).__dict__,
fields=['image', 'gt_box', 'gt_label', 'is_difficult'],
image_shape=[3, 300, 300],
sample_transforms=[
DecodeImage(to_rgb=True, with_mixup=False),
NormalizeBox(),
ResizeImage(target_size=300, use_cv2=False, interp=1),
RandomFlipImage(is_normalized=True),
Permute(),
NormalizeImage(
mean=[127.5, 127.5, 127.5],
std=[127.502231, 127.502231, 127.502231],
is_scale=False)
],
batch_transforms=[],
batch_size=64,
shuffle=False,
samples=-1,
drop_last=True,
num_workers=8,
bufsize=10,
use_process=False):
sample_transforms.append(ArrangeSSD())
if isinstance(dataset, dict):
dataset = VocDataSet(**dataset)
super(SSDEvalFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process)
self.mode = 'VAL'
@register
class SSDTestFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=SimpleDataSet(VOC_TEST_ANNOTATION).__dict__,
fields=['image', 'im_id'],
image_shape=[3, 300, 300],
sample_transforms=[
DecodeImage(to_rgb=True),
ResizeImage(target_size=300, use_cv2=False, interp=1),
Permute(),
NormalizeImage(
mean=[127.5, 127.5, 127.5],
std=[127.502231, 127.502231, 127.502231],
is_scale=False)
],
batch_transforms=[],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
num_workers=8,
bufsize=10,
use_process=False):
sample_transforms.append(ArrangeTestSSD())
if isinstance(dataset, dict):
dataset = SimpleDataSet(**dataset)
super(SSDTestFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers)
self.mode = 'TEST'
@register
class YoloTrainFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet().__dict__,
fields=['image', 'gt_box', 'gt_label', 'gt_score'],
image_shape=[3, 608, 608],
sample_transforms=[
DecodeImage(to_rgb=True, with_mixup=True),
MixupImage(alpha=1.5, beta=1.5),
NormalizeBox(),
RandomDistort(),
ExpandImage(max_ratio=4., prob=.5,
mean=[123.675, 116.28, 103.53]),
CropImage([[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]),
RandomInterpImage(target_size=608),
RandomFlipImage(is_normalized=True),
NormalizeImage(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
Permute(to_bgr=False),
],
batch_transforms=[
RandomShape(sizes=[
320, 352, 384, 416, 448, 480, 512, 544, 576, 608
])
],
batch_size=8,
shuffle=True,
samples=-1,
drop_last=True,
with_background=False,
num_workers=8,
bufsize=128,
use_process=True,
num_max_boxes=50,
mixup_epoch=250):
sample_transforms.append(ArrangeYOLO())
super(YoloTrainFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers,
bufsize=bufsize,
use_process=use_process)
self.num_max_boxes = num_max_boxes
self.mixup_epoch = mixup_epoch
self.mode = 'TRAIN'
@register
class YoloEvalFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_shape', 'im_id'],
image_shape=[3, 608, 608],
sample_transforms=[
DecodeImage(to_rgb=True),
ResizeImage(target_size=608, interp=2),
NormalizeImage(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
Permute(to_bgr=False),
],
batch_transforms=[],
batch_size=8,
shuffle=False,
samples=-1,
drop_last=False,
with_background=False,
num_workers=8,
num_max_boxes=50,
use_process=False):
sample_transforms.append(ArrangeTestYOLO())
super(YoloEvalFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers,
use_process=use_process)
self.num_max_boxes = num_max_boxes
self.mode = 'VAL'
self.bufsize = 128
@register
class YoloTestFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=SimpleDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_shape', 'im_id'],
image_shape=[3, 608, 608],
sample_transforms=[
DecodeImage(to_rgb=True),
ResizeImage(target_size=608, interp=2),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
Permute(to_bgr=False),
],
batch_transforms=[],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
with_background=False,
num_workers=8,
num_max_boxes=50,
use_process=False):
sample_transforms.append(ArrangeTestYOLO())
if isinstance(dataset, dict):
dataset = SimpleDataSet(**dataset)
super(YoloTestFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers,
use_process=use_process)
self.num_max_boxes = num_max_boxes
self.mode = 'TEST'
self.bufsize = 128
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
# interface for accessing data samples in stream
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
class Dataset(object):
"""interface to access a stream of data samples"""
def __init__(self):
self._epoch = -1
def __next__(self):
return self.next()
def __iter__(self):
return self
def __str__(self):
return "{}(fname:{}, epoch:{:d}, size:{:d}, pos:{:d})".format(
type(self).__name__, self._fname, self._epoch,
self.size(), self._pos)
def next(self):
"""get next sample"""
raise NotImplementedError('%s.next not available' %
(self.__class__.__name__))
def reset(self):
"""reset to initial status and begins a new epoch"""
raise NotImplementedError('%s.reset not available' %
(self.__class__.__name__))
def size(self):
"""get number of samples in this dataset"""
raise NotImplementedError('%s.size not available' %
(self.__class__.__name__))
def drained(self):
"""whether all sampled has been readed out for this epoch"""
raise NotImplementedError('%s.drained not available' %
(self.__class__.__name__))
def epoch_id(self):
"""return epoch id for latest sample"""
raise NotImplementedError('%s.epoch_id not available' %
(self.__class__.__name__))
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
# Interface to build readers for detection data like COCO or VOC
#
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from numbers import Integral
import logging
from .source import build_source
from .transform import build_mapper, map, batch, batch_map
logger = logging.getLogger(__name__)
class Reader(object):
"""Interface to make readers for training or evaluation"""
def __init__(self, data_cf, trans_conf, maxiter=-1):
self._data_cf = data_cf
self._trans_conf = trans_conf
self._maxiter = maxiter
self._cname2cid = None
assert isinstance(self._maxiter, Integral), "maxiter should be int"
def _make_reader(self, mode):
"""Build reader for training or validation"""
file_conf = self._data_cf[mode]
# 1, Build data source
sc_conf = {'data_cf': file_conf, 'cname2cid': self._cname2cid}
sc = build_source(sc_conf)
        # 2, Build a transformed dataset
ops = self._trans_conf[mode]['OPS']
batchsize = self._trans_conf[mode]['BATCH_SIZE']
        drop_last = self._trans_conf[mode].get('DROP_LAST', False)
mapper = build_mapper(ops, {'is_train': mode == 'TRAIN'})
worker_args = None
if 'WORKER_CONF' in self._trans_conf[mode]:
worker_args = self._trans_conf[mode]['WORKER_CONF']
worker_args = {k.lower(): v for k, v in worker_args.items()}
mapped_ds = map(sc, mapper, worker_args)
batched_ds = batch(mapped_ds, batchsize, drop_last)
trans_conf = {k.lower(): v for k, v in self._trans_conf[mode].items()}
need_keys = {
'is_padding',
'coarsest_stride',
'random_shapes',
'multi_scales',
'use_padded_im_info',
}
bm_config = {
key: value
for key, value in trans_conf.items() if key in need_keys
}
batched_ds = batch_map(batched_ds, bm_config)
batched_ds.reset()
if mode.lower() == 'train':
if self._cname2cid is not None:
                logger.warning('cname2cid already set, it will be overridden')
self._cname2cid = sc.cname2cid
# 3, Build a reader
maxit = -1 if self._maxiter <= 0 else self._maxiter
def _reader():
n = 0
while True:
for _batch in batched_ds:
yield _batch
n += 1
if maxit > 0 and n == maxit:
return
batched_ds.reset()
if maxit <= 0:
return
if hasattr(sc, 'get_imid2path'):
_reader.imid2path = sc.get_imid2path()
return _reader
def train(self):
"""Build reader for training"""
return self._make_reader('TRAIN')
def val(self):
"""Build reader for validation"""
return self._make_reader('VAL')
def test(self):
"""Build reader for inference"""
return self._make_reader('TEST')
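# ---------------------------------------------------------------------------
# Usage sketch (illustration only; paths and ops are placeholders, and the
# keys mirror the dicts built by `create_reader` elsewhere in this repo):
#
#     data_cf = {'TRAIN': {'ANNO_FILE': 'train.roidb',
#                          'IMAGE_DIR': 'images',
#                          'IS_SHUFFLE': True}}
#     trans_cf = {'TRAIN': {'OPS': [{'op': 'DecodeImage', 'to_rgb': True}],
#                           'BATCH_SIZE': 2}}
#     train_reader = Reader(data_cf, trans_cf, maxiter=10).train()
#     for batch in train_reader():   # generator yielding at most 10 batches
#         ...
# ---------------------------------------------------------------------------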
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
from .roidb_source import RoiDbSource
from .simple_source import SimpleSource
def build_source(config):
"""
Build dataset from source data, default source type is 'RoiDbSource'
Args:
config (dict): should have following structure:
{
data_cf (dict):
anno_file (str): label file or image list file path
image_dir (str): root directory for images
samples (int): number of samples to load, -1 means all
is_shuffle (bool): should samples be shuffled
load_img (bool): should images be loaded
mixup_epoch (int): parse mixup in first n epoch
with_background (bool): whether load background as a class
cname2cid (dict): the label name to id dictionary
}
"""
if 'data_cf' in config:
data_cf = {k.lower(): v for k, v in config['data_cf'].items()}
data_cf['cname2cid'] = config['cname2cid']
else:
data_cf = config
args = copy.deepcopy(data_cf)
    # default type is 'RoiDbSource'
source_type = 'RoiDbSource'
if 'type' in data_cf:
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource']:
source_type = 'RoiDbSource'
else:
source_type = data_cf['type']
del args['type']
if source_type == 'RoiDbSource':
return RoiDbSource(**args)
elif source_type == 'SimpleSource':
return SimpleSource(**args)
else:
raise ValueError('source type not supported: ' + source_type)
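# ---------------------------------------------------------------------------
# Usage sketch (illustration only; paths are placeholders). Without a 'type'
# key the default 'RoiDbSource' is used; 'VOCSource' and 'COCOSource' are
# treated as aliases for it, 'SimpleSource' selects SimpleSource, and any
# other type raises ValueError:
#
#     source = build_source({
#         'anno_file': 'annotations/instances_val2017.json',
#         'image_dir': 'val2017',
#         'samples': -1,
#         'is_shuffle': False,
#     })
# ---------------------------------------------------------------------------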
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
from pycocotools.coco import COCO
import logging
logger = logging.getLogger(__name__)
def load(anno_path, sample_num=-1, with_background=True):
"""
Load COCO records with annotations in json file 'anno_path'
Args:
anno_path (str): json file path
sample_num (int): number of samples to load, -1 means all
with_background (bool): whether load background as a class.
if True, total class number will
be 81. default True
Returns:
(records, cname2cid)
'records' is list of dict whose structure is:
{
'im_file': im_fname, # image file name
'im_id': img_id, # image id
'h': im_h, # height of image
'w': im_w, # width
'is_crowd': is_crowd,
'gt_score': gt_score,
'gt_class': gt_class,
'gt_bbox': gt_bbox,
'gt_poly': gt_poly,
}
'cname2cid' is a dict used to map category name to class id
"""
assert anno_path.endswith('.json'), 'invalid coco annotation file: ' \
+ anno_path
coco = COCO(anno_path)
img_ids = coco.getImgIds()
cat_ids = coco.getCatIds()
records = []
ct = 0
# when with_background = True, mapping category to classid, like:
# background:0, first_class:1, second_class:2, ...
catid2clsid = dict(
{catid: i + int(with_background)
for i, catid in enumerate(cat_ids)})
cname2cid = dict({
coco.loadCats(catid)[0]['name']: clsid
for catid, clsid in catid2clsid.items()
})
for img_id in img_ids:
img_anno = coco.loadImgs(img_id)[0]
im_fname = img_anno['file_name']
im_w = img_anno['width']
im_h = img_anno['height']
ins_anno_ids = coco.getAnnIds(imgIds=img_id, iscrowd=False)
instances = coco.loadAnns(ins_anno_ids)
bboxes = []
for inst in instances:
x, y, box_w, box_h = inst['bbox']
x1 = max(0, x)
y1 = max(0, y)
x2 = min(im_w - 1, x1 + max(0, box_w - 1))
y2 = min(im_h - 1, y1 + max(0, box_h - 1))
if inst['area'] > 0 and x2 >= x1 and y2 >= y1:
inst['clean_bbox'] = [x1, y1, x2, y2]
bboxes.append(inst)
num_bbox = len(bboxes)
gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32)
gt_class = np.zeros((num_bbox, 1), dtype=np.int32)
gt_score = np.ones((num_bbox, 1), dtype=np.float32)
is_crowd = np.zeros((num_bbox, 1), dtype=np.int32)
difficult = np.zeros((num_bbox, 1), dtype=np.int32)
gt_poly = [None] * num_bbox
for i, box in enumerate(bboxes):
catid = box['category_id']
gt_class[i][0] = catid2clsid[catid]
gt_bbox[i, :] = box['clean_bbox']
is_crowd[i][0] = box['iscrowd']
gt_poly[i] = box['segmentation']
coco_rec = {
'im_file': im_fname,
'im_id': np.array([img_id]),
'h': im_h,
'w': im_w,
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_bbox': gt_bbox,
'gt_score': gt_score,
'gt_poly': gt_poly,
'difficult': difficult
}
logger.debug('Load file: {}, im_id: {}, h: {}, w: {}.'.format(
im_fname, img_id, im_h, im_w))
records.append(coco_rec)
ct += 1
if sample_num > 0 and ct >= sample_num:
break
assert len(records) > 0, 'not found any coco record in %s' % (anno_path)
logger.info('{} samples in file {}'.format(ct, anno_path))
return records, cname2cid
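# ---------------------------------------------------------------------------
# Usage sketch (illustration only; requires a local COCO annotation file):
#
#     records, cname2cid = load('annotations/instances_val2017.json',
#                               sample_num=10, with_background=True)
#     records[0]['gt_bbox']   # (num_bbox, 4) float32, [x1, y1, x2, y2] rows
#     cname2cid['person']     # -> 1, since class id 0 is the background
# ---------------------------------------------------------------------------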
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
#   load data records from local files (maybe in COCO or VOC data formats)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import numpy as np
import logging
import pickle as pkl
logger = logging.getLogger(__name__)
def check_records(records):
""" check the fields of 'records' must contains some keys
"""
needed_fields = [
'im_file', 'im_id', 'h', 'w', 'is_crowd', 'gt_class', 'gt_bbox',
'gt_poly'
]
for i, rec in enumerate(records):
for k in needed_fields:
assert k in rec, 'not found field[%s] in record[%d]' % (k, i)
def load_roidb(anno_file, sample_num=-1):
""" load normalized data records from file
'anno_file' which is a pickled file.
And the records should has a structure:
{
'im_file': str, # image file name
'im_id': int, # image id
'h': int, # height of image
'w': int, # width of image
'is_crowd': bool,
'gt_class': list of np.ndarray, # classids info
'gt_bbox': list of np.ndarray, # bounding box info
'gt_poly': list of int, # poly info
}
Args:
        anno_file (str): file path of the pickled records
sample_num (int): number of samples to load
Returns:
list of records for detection model training
"""
assert anno_file.endswith('.roidb'), 'invalid roidb file[%s]' % (anno_file)
with open(anno_file, 'rb') as f:
roidb = f.read()
    # support both python3 and python2 pickles; python2's pickle.loads does
    # not accept the 'encoding' keyword
    try:
        records, cname2cid = pkl.loads(roidb, encoding='bytes')
    except TypeError:
        records, cname2cid = pkl.loads(roidb)
assert type(records) is list, 'invalid data type from roidb'
if sample_num > 0 and sample_num < len(records):
records = records[:sample_num]
return records, cname2cid
def load(fname,
samples=-1,
with_background=True,
with_cat2id=False,
use_default_label=None,
cname2cid=None):
""" Load data records from 'fnames'
Args:
fnames (str): file name for data record, eg:
instances_val2017.json or COCO17_val2017.roidb
samples (int): number of samples to load, default to all
with_background (bool): whether load background as a class.
default True.
with_cat2id (bool): whether return cname2cid info out
use_default_label (bool): whether use the default mapping of label to id
cname2cid (dict): the mapping of category name to id
Returns:
list of loaded records whose structure is:
{
'im_file': str, # image file name
'im_id': int, # image id
'h': int, # height of image
'w': int, # width of image
'is_crowd': bool,
'gt_class': list of np.ndarray, # classids info
'gt_bbox': list of np.ndarray, # bounding box info
'gt_poly': list of int, # poly info
}
"""
if fname.endswith('.roidb'):
records, cname2cid = load_roidb(fname, samples)
elif fname.endswith('.json'):
from . import coco_loader
records, cname2cid = coco_loader.load(fname, samples, with_background)
elif os.path.isfile(fname):
from . import voc_loader
if use_default_label is None or cname2cid is not None:
records, cname2cid = voc_loader.get_roidb(fname, samples, cname2cid,
with_background=with_background)
else:
records, cname2cid = voc_loader.load(fname, samples,
use_default_label,
with_background=with_background)
else:
raise ValueError('invalid file type when load data from file[%s]' %
(fname))
check_records(records)
if with_cat2id:
return records, cname2cid
else:
return records
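# ---------------------------------------------------------------------------
# Dispatch summary (illustration only): `load` picks a backend from the file
# name.
#
#     load('COCO17_val2017.roidb')        # pickled records -> load_roidb
#     load('instances_val2017.json')      # COCO json       -> coco_loader
#     load('ImageSets/Main/train.txt')    # VOC image list  -> voc_loader
# ---------------------------------------------------------------------------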
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#function:
# interface to load data from local files and parse it for samples,
# eg: roidb data in pickled files
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import random
import copy
import pickle as pkl
from ..dataset import Dataset
class RoiDbSource(Dataset):
""" interface to load roidb data from files
"""
def __init__(self,
anno_file,
image_dir=None,
samples=-1,
is_shuffle=True,
load_img=False,
cname2cid=None,
use_default_label=None,
mixup_epoch=-1,
with_background=True):
""" Init
Args:
fname (str): label file path
image_dir (str): root dir for images
samples (int): samples to load, -1 means all
is_shuffle (bool): whether to shuffle samples
load_img (bool): whether load data in this class
cname2cid (dict): the label name to id dictionary
use_default_label (bool):whether use the default mapping of label to id
mixup_epoch (int): parse mixup in first n epoch
with_background (bool): whether load background
as a class
"""
super(RoiDbSource, self).__init__()
self._epoch = -1
assert os.path.isfile(anno_file) or os.path.isdir(
anno_file), 'invalid file[%s] for RoiDbSource' % (anno_file)
self._fname = anno_file
self._image_dir = image_dir
if image_dir is not None:
assert os.path.isdir(image_dir), 'invalid image directory[%s]' % (
image_dir)
self._roidb = None
self._pos = -1
self._drained = False
self._samples = samples
self._is_shuffle = is_shuffle
self._load_img = load_img
self.use_default_label = use_default_label
self._mixup_epoch = mixup_epoch
self._with_background = with_background
self.cname2cid = cname2cid
def __str__(self):
return 'RoiDbSource(fname:%s,epoch:%d,size:%d,pos:%d)' \
% (self._fname, self._epoch, self.size(), self._pos)
def next(self):
""" load next sample
"""
if self._epoch < 0:
self.reset()
if self._pos >= self._samples:
self._drained = True
raise StopIteration('%s no more data' % (str(self)))
sample = copy.deepcopy(self._roidb[self._pos])
if self._load_img:
sample['image'] = self._load_image(sample['im_file'])
else:
sample['im_file'] = os.path.join(self._image_dir, sample['im_file'])
if self._epoch < self._mixup_epoch:
mix_idx = random.randint(1, self._samples - 1)
mix_pos = (mix_idx + self._pos) % self._samples
sample['mixup'] = copy.deepcopy(self._roidb[mix_pos])
            if self._load_img:
sample['mixup']['image'] = \
self._load_image(sample['mixup']['im_file'])
else:
sample['mixup']['im_file'] = \
os.path.join(self._image_dir, sample['mixup']['im_file'])
self._pos += 1
return sample
def _load(self):
""" load data from file
"""
from . import loader
records, cname2cid = loader.load(self._fname, self._samples,
self._with_background, True,
self.use_default_label, self.cname2cid)
self.cname2cid = cname2cid
return records
def _load_image(self, where):
fn = os.path.join(self._image_dir, where)
with open(fn, 'rb') as f:
return f.read()
def reset(self):
""" implementation of Dataset.reset
"""
if self._roidb is None:
self._roidb = self._load()
self._samples = len(self._roidb)
if self._is_shuffle:
random.shuffle(self._roidb)
if self._epoch < 0:
self._epoch = 0
else:
self._epoch += 1
self._pos = 0
self._drained = False
def size(self):
""" implementation of Dataset.size
"""
return len(self._roidb)
def drained(self):
""" implementation of Dataset.drained
"""
        assert self._epoch >= 0, 'The first epoch has not begun!'
return self._pos >= self.size()
def epoch_id(self):
""" return epoch id for latest sample
"""
return self._epoch
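# ---------------------------------------------------------------------------
# Usage sketch (illustration only; paths are placeholders):
#
#     source = RoiDbSource('data/coco/instances_val2017.json',
#                          image_dir='data/coco/val2017',
#                          is_shuffle=False)
#     for sample in source:   # Dataset protocol, one pass over the epoch
#         print(sample['im_file'], sample['h'], sample['w'])
# ---------------------------------------------------------------------------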
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
# interface to load data from txt file.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import numpy as np
import copy
from ..dataset import Dataset
class SimpleSource(Dataset):
"""
Load image files for testing purpose
Args:
images (list): list of path of images
samples (int): number of samples to load, -1 means all
load_img (bool): should images be loaded
"""
def __init__(self,
images=[],
samples=-1,
load_img=True,
**kwargs):
super(SimpleSource, self).__init__()
self._epoch = -1
for image in images:
assert image != '' and os.path.isfile(image), \
"Image {} not found".format(image)
self._images = images
self._fname = None
self._simple = None
self._pos = -1
self._drained = False
self._samples = samples
self._load_img = load_img
self._imid2path = {}
def next(self):
if self._epoch < 0:
self.reset()
if self._pos >= self.size():
self._drained = True
raise StopIteration("no more data in " + str(self))
else:
sample = copy.deepcopy(self._simple[self._pos])
if self._load_img:
sample['image'] = self._load_image(sample['im_file'])
self._pos += 1
return sample
def _load(self):
ct = 0
records = []
for image in self._images:
if self._samples > 0 and ct >= self._samples:
break
rec = {'im_id': np.array([ct]), 'im_file': image}
self._imid2path[ct] = image
ct += 1
records.append(rec)
assert len(records) > 0, "no image file found"
return records
def _load_image(self, where):
with open(where, 'rb') as f:
return f.read()
def reset(self):
if self._simple is None:
self._simple = self._load()
if self._epoch < 0:
self._epoch = 0
else:
self._epoch += 1
self._pos = 0
self._drained = False
def size(self):
return len(self._simple)
def drained(self):
assert self._epoch >= 0, "the first epoch has not started yet"
return self._pos >= self.size()
def epoch_id(self):
return self._epoch
def get_imid2path(self):
"""return image id to image path map"""
return self._imid2path
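# ---------------------------------------------------------------------------
# Usage sketch (illustration only): `SimpleSource` serves a plain image list,
# e.g. for inference; with load_img=True, `sample['image']` holds the raw
# file bytes.
#
#     source = SimpleSource(images=['demo/000000570688.jpg'])
#     sample = next(iter(source))
#     sample['im_id']            # -> np.array([0])
#     source.get_imid2path()     # -> {0: 'demo/000000570688.jpg'}
# ---------------------------------------------------------------------------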
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import numpy as np
import xml.etree.ElementTree as ET
def get_roidb(anno_path,
sample_num=-1,
cname2cid=None,
with_background=True):
"""
Load VOC records with annotations in xml directory 'anno_path'
Notes:
        ${anno_path}/ImageSets/Main/train.txt must contain xml file names for annotations
${anno_path}/Annotations/xxx.xml must contain annotation info for one record
Args:
anno_path (str): root directory for voc annotation data
sample_num (int): number of samples to load, -1 means all
cname2cid (dict): the label name to id dictionary
        with_background (bool): whether to load background as a class.
                                if True, class id 0 is reserved for
                                background. default True
Returns:
        (records, cname2cid)
'records' is list of dict whose structure is:
{
'im_file': im_fname, # image file name
'im_id': im_id, # image id
'h': im_h, # height of image
'w': im_w, # width
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_bbox': gt_bbox,
'gt_poly': gt_poly,
}
        'cname2cid' is a dict to map category name to class id
"""
txt_file = anno_path
part = txt_file.split('ImageSets')
xml_path = os.path.join(part[0], 'Annotations')
assert os.path.isfile(txt_file) and \
os.path.isdir(xml_path), 'invalid xml path'
records = []
ct = 0
    existence = cname2cid is not None
if cname2cid is None:
cname2cid = {}
# mapping category name to class id
# background:0, first_class:1, second_class:2, ...
with open(txt_file, 'r') as fr:
while True:
line = fr.readline()
if not line:
break
fname = line.strip() + '.xml'
xml_file = os.path.join(xml_path, fname)
if not os.path.isfile(xml_file):
continue
tree = ET.parse(xml_file)
im_fname = tree.find('filename').text
if tree.find('id') is None:
im_id = np.array([ct])
else:
im_id = np.array([int(tree.find('id').text)])
objs = tree.findall('object')
im_w = float(tree.find('size').find('width').text)
im_h = float(tree.find('size').find('height').text)
gt_bbox = np.zeros((len(objs), 4), dtype=np.float32)
gt_class = np.zeros((len(objs), 1), dtype=np.int32)
gt_score = np.ones((len(objs), 1), dtype=np.float32)
is_crowd = np.zeros((len(objs), 1), dtype=np.int32)
difficult = np.zeros((len(objs), 1), dtype=np.int32)
for i, obj in enumerate(objs):
cname = obj.find('name').text
if not existence and cname not in cname2cid:
                # offset by 1 when with_background is True, since class id 0
                # is reserved for background
cname2cid[cname] = len(cname2cid) + int(with_background)
elif existence and cname not in cname2cid:
raise KeyError(
'Not found cname[%s] in cname2cid when map it to cid.' %
(cname))
gt_class[i][0] = cname2cid[cname]
_difficult = int(obj.find('difficult').text)
x1 = float(obj.find('bndbox').find('xmin').text)
y1 = float(obj.find('bndbox').find('ymin').text)
x2 = float(obj.find('bndbox').find('xmax').text)
y2 = float(obj.find('bndbox').find('ymax').text)
x1 = max(0, x1)
y1 = max(0, y1)
x2 = min(im_w - 1, x2)
y2 = min(im_h - 1, y2)
gt_bbox[i] = [x1, y1, x2, y2]
is_crowd[i][0] = 0
difficult[i][0] = _difficult
voc_rec = {
'im_file': im_fname,
'im_id': im_id,
'h': im_h,
'w': im_w,
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_score': gt_score,
'gt_bbox': gt_bbox,
'gt_poly': [],
'difficult': difficult
}
if len(objs) != 0:
records.append(voc_rec)
ct += 1
if sample_num > 0 and ct >= sample_num:
break
assert len(records) > 0, 'not found any voc record in %s' % (anno_path)
return [records, cname2cid]
def load(anno_path,
sample_num=-1,
use_default_label=True,
with_background=True):
"""
Load VOC records with annotations in
xml directory 'anno_path'
Notes:
        ${anno_path}/ImageSets/Main/train.txt must contain xml file names for annotations
${anno_path}/Annotations/xxx.xml must contain annotation info for one record
Args:
        anno_path (str): root directory for voc annotation data
        sample_num (int): number of samples to load, -1 means all
        use_default_label (bool): whether to use the default mapping of label to id
        with_background (bool): whether to load background as a class.
                            if True, class id 0 is reserved for
                            background. default True
Returns:
        (records, cname2cid)
'records' is list of dict whose structure is:
{
'im_file': im_fname, # image file name
'im_id': im_id, # image id
'h': im_h, # height of image
'w': im_w, # width
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_bbox': gt_bbox,
'gt_poly': gt_poly,
}
        'cname2cid' is a dict to map category name to class id
"""
txt_file = anno_path
part = txt_file.split('ImageSets')
xml_path = os.path.join(part[0], 'Annotations')
assert os.path.isfile(txt_file) and \
os.path.isdir(xml_path), 'invalid xml path'
# mapping category name to class id
# if with_background is True:
# background:0, first_class:1, second_class:2, ...
# if with_background is False:
# first_class:0, second_class:1, ...
records = []
ct = 0
cname2cid = {}
if not use_default_label:
label_path = os.path.join(part[0], 'ImageSets/Main/label_list.txt')
with open(label_path, 'r') as fr:
label_id = int(with_background)
for line in fr.readlines():
cname2cid[line.strip()] = label_id
label_id += 1
else:
cname2cid = pascalvoc_label(with_background)
with open(txt_file, 'r') as fr:
while True:
line = fr.readline()
if not line:
break
fname = line.strip() + '.xml'
xml_file = os.path.join(xml_path, fname)
if not os.path.isfile(xml_file):
continue
tree = ET.parse(xml_file)
im_fname = tree.find('filename').text
if tree.find('id') is None:
im_id = np.array([ct])
else:
im_id = np.array([int(tree.find('id').text)])
objs = tree.findall('object')
im_w = float(tree.find('size').find('width').text)
im_h = float(tree.find('size').find('height').text)
gt_bbox = np.zeros((len(objs), 4), dtype=np.float32)
gt_class = np.zeros((len(objs), 1), dtype=np.int32)
gt_score = np.ones((len(objs), 1), dtype=np.float32)
is_crowd = np.zeros((len(objs), 1), dtype=np.int32)
difficult = np.zeros((len(objs), 1), dtype=np.int32)
for i, obj in enumerate(objs):
cname = obj.find('name').text
gt_class[i][0] = cname2cid[cname]
_difficult = int(obj.find('difficult').text)
x1 = float(obj.find('bndbox').find('xmin').text)
y1 = float(obj.find('bndbox').find('ymin').text)
x2 = float(obj.find('bndbox').find('xmax').text)
y2 = float(obj.find('bndbox').find('ymax').text)
x1 = max(0, x1)
y1 = max(0, y1)
x2 = min(im_w - 1, x2)
y2 = min(im_h - 1, y2)
gt_bbox[i] = [x1, y1, x2, y2]
is_crowd[i][0] = 0
difficult[i][0] = _difficult
voc_rec = {
'im_file': im_fname,
'im_id': im_id,
'h': im_h,
'w': im_w,
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_score': gt_score,
'gt_bbox': gt_bbox,
'gt_poly': [],
'difficult': difficult
}
if len(objs) != 0:
records.append(voc_rec)
ct += 1
if sample_num > 0 and ct >= sample_num:
break
assert len(records) > 0, 'not found any voc record in %s' % (anno_path)
return [records, cname2cid]
def pascalvoc_label(with_background=True):
labels_map = {
'aeroplane': 1,
'bicycle': 2,
'bird': 3,
'boat': 4,
'bottle': 5,
'bus': 6,
'car': 7,
'cat': 8,
'chair': 9,
'cow': 10,
'diningtable': 11,
'dog': 12,
'horse': 13,
'motorbike': 14,
'person': 15,
'pottedplant': 16,
'sheep': 17,
'sofa': 18,
'train': 19,
'tvmonitor': 20
}
if not with_background:
labels_map = {k: v - 1 for k, v in labels_map.items()}
return labels_map
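# ---------------------------------------------------------------------------
# Usage sketch (illustration only; the path follows the VOC layout assumed
# above):
#
#     records, cname2cid = load('VOCdevkit/VOC_all/ImageSets/Main/train.txt',
#                               use_default_label=True, with_background=True)
#     cname2cid['aeroplane']     # -> 1, id 0 is reserved for background
# ---------------------------------------------------------------------------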
DATA:
TRAIN:
ANNO_FILE: data/coco.test/train2017.roidb
IMAGE_DIR: data/coco.test/train2017
SAMPLES: 10
TYPE: RoiDbSource
VAL:
ANNO_FILE: data/coco.test/val2017.roidb
IMAGE_DIR: data/coco.test/val2017
SAMPLES: 10
TYPE: RoiDbSource
TRANSFORM:
TRAIN:
OPS:
- OP: DecodeImage
TO_RGB: False
- OP: RandomFlipImage
PROB: 0.5
- OP: NormalizeImage
MEAN: [102.9801, 115.9465, 122.7717]
IS_SCALE: False
IS_CHANNEL_FIRST: False
- OP: ResizeImage
TARGET_SIZE: 800
MAX_SIZE: 1333
- OP: Rgb2Bgr
TO_BGR: False
- OP: ArrangeRCNN
BATCH_SIZE: 1
IS_PADDING: True
DROP_LAST: False
VAL:
OPS:
- OP: DecodeImage
TO_RGB: True
- OP: ResizeImage
TARGET_SIZE: 224
- OP: ArrangeSSD
BATCH_SIZE: 1
WORKER_CONF:
BUFSIZE: 200
WORKER_NUM: 8
USE_PROCESS: False
#!/bin/bash
#function:
# prepare coco data for testing
root=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")
cwd=`pwd`
if [[ $cwd != $root ]];then
pushd $root 2>&1 1>/dev/null
fi
test_coco_python2_url="http://filecenter.matrix.baidu.com/api/v1/file/wanglong03/coco.test.python2.zip/20190603095315/download"
test_coco_python3_url="http://filecenter.matrix.baidu.com/api/v1/file/wanglong03/coco.test.python3.zip/20190603095447/download"
if [[ $1 = "python2" ]];then
test_coco_data_url=${test_coco_python2_url}
coco_zip_file="coco.test.python2.zip"
else
test_coco_data_url=${test_coco_python3_url}
coco_zip_file="coco.test.python3.zip"
fi
echo "download testing coco from url[${test_coco_data_url}]"
coco_root_dir=${coco_zip_file/.zip/}
# clear already exist file or directory
rm -rf ${coco_root_dir} ${coco_zip_file}
wget ${test_coco_data_url} -O ${coco_zip_file}
if [ -e $coco_zip_file ];then
echo "succeed to download ${coco_zip_file}, so unzip it"
unzip ${coco_zip_file} >/dev/null 2>&1
fi
if [ -e ${coco_root_dir} ];then
rm -rf coco.test
ln -s ${coco_root_dir} coco.test
echo "succeed to generate coco data in[${coco_root_dir}] for testing"
exit 0
else
echo "failed to generate coco data"
exit 1
fi
DATA:
TRAIN:
ANNO_FILE: data/coco.test/train2017.roidb
IMAGE_DIR: data/coco.test/train2017
SAMPLES: 10
IS_SHUFFLE: True
TYPE: RoiDbSource
TRANSFORM:
TRAIN:
OPS:
- OP: DecodeImage
TO_RGB: False
- OP: RandomFlipImage
PROB: 0.5
- OP: NormalizeImage
MEAN: [102.9801, 115.9465, 122.7717]
IS_SCALE: False
IS_CHANNEL_FIRST: False
- OP: ResizeImage
TARGET_SIZE: 800
MAX_SIZE: 1333
- OP: Rgb2Bgr
TO_BGR: False
- OP: ArrangeRCNN
BATCH_SIZE: 1
IS_PADDING: True
DROP_LAST: False
WORKER_CONF:
BUFSIZE: 10
WORKER_NUM: 2
#!/usr/bin/python
#-*-coding:utf-8-*-
"""Run all tests
"""
import unittest
import test_loader
import test_operator
import test_roidb_source
import test_transformer
import test_reader
if __name__ == '__main__':
alltests = unittest.TestSuite([
unittest.TestLoader().loadTestsFromTestCase(t) \
for t in [
test_loader.TestLoader,
test_operator.TestBase,
test_roidb_source.TestRoiDbSource,
test_transformer.TestTransformer,
test_reader.TestReader,
]
])
was_succ = unittest\
.TextTestRunner(verbosity=2)\
.run(alltests)\
.wasSuccessful()
exit(0 if was_succ else 1)
import sys
import os
import six
import logging
path = os.path.join(os.path.dirname(os.path.abspath(__file__)), '../../')
if path not in sys.path:
sys.path.insert(0, path)
prefix = os.path.dirname(os.path.abspath(__file__))
#coco data for testing
if six.PY3:
version = 'python3'
else:
version = 'python2'
data_root = os.path.join(prefix, 'data/coco.test.%s' % (version))
# coco data for testing
coco_data = {
'TRAIN': {
'ANNO_FILE': os.path.join(data_root, 'train2017.roidb'),
'IMAGE_DIR': os.path.join(data_root, 'train2017')
},
'VAL': {
'ANNO_FILE': os.path.join(data_root, 'val2017.roidb'),
'IMAGE_DIR': os.path.join(data_root, 'val2017')
}
}
script = os.path.join(os.path.dirname(__file__), 'data/prepare_data.sh')
if not os.path.exists(data_root):
ret = os.system('bash %s %s' % (script, version))
if ret != 0:
        logging.error('file [%s] not found; you should prepare your data '
                      'manually using "data/prepare_data.sh"' % (data_root))
sys.exit(1)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import time
import unittest
import sys
import logging
import numpy as np
import set_env
class TestLoader(unittest.TestCase):
"""Test cases for dataset.source.loader
"""
@classmethod
def setUpClass(cls):
""" setup
"""
cls.prefix = os.path.dirname(os.path.abspath(__file__))
# json data
cls.anno_path = os.path.join(cls.prefix,
'data/coco/instances_val2017.json')
cls.image_dir = os.path.join(cls.prefix, 'data/coco/val2017')
cls.anno_path1 = os.path.join(cls.prefix,
"data/voc/ImageSets/Main/train.txt")
cls.image_dir1 = os.path.join(cls.prefix, "data/voc/JPEGImages")
@classmethod
def tearDownClass(cls):
""" tearDownClass """
pass
def test_load_coco_in_json(self):
""" test loading COCO data in json file
"""
from data.source.coco_loader import load
if not os.path.exists(self.anno_path):
logging.warn('not found %s, so skip this test' % (self.anno_path))
return
samples = 10
records, cname2id = load(self.anno_path, samples)
self.assertEqual(len(records), samples)
self.assertGreater(len(cname2id), 0)
def test_load_coco_in_roidb(self):
""" test loading COCO data in pickled records
"""
anno_path = os.path.join(self.prefix,
'data/roidbs/instances_val2017.roidb')
if not os.path.exists(anno_path):
logging.warn('not found %s, so skip this test' % (anno_path))
return
samples = 10
from data.source.loader import load_roidb
records, cname2cid = load_roidb(anno_path, samples)
self.assertEqual(len(records), samples)
self.assertGreater(len(cname2cid), 0)
def test_load_voc_in_xml(self):
""" test loading VOC data in xml files
"""
from data.source.voc_loader import load
if not os.path.exists(self.anno_path1):
logging.warn('not found %s, so skip this test' % (self.anno_path1))
return
samples = 3
records, cname2cid = load(self.anno_path1, samples)
self.assertEqual(len(records), samples)
self.assertGreater(len(cname2cid), 0)
def test_load_voc_in_roidb(self):
""" test loading VOC data in pickled records
"""
anno_path = os.path.join(self.prefix, 'data/roidbs/train.roidb')
if not os.path.exists(anno_path):
logging.warn('not found %s, so skip this test' % (anno_path))
return
samples = 3
        from data.source.loader import load_roidb
records, cname2cid = load_roidb(anno_path, samples)
self.assertEqual(len(records), samples)
self.assertGreater(len(cname2cid), 0)
if __name__ == '__main__':
unittest.main()
import os
import unittest
import logging
import numpy as np
import set_env
from data import transform as tf
logging.basicConfig(level=logging.INFO)
class TestBase(unittest.TestCase):
"""Test cases for dataset.transform.operator
"""
@classmethod
def setUpClass(cls, with_mixup=False):
""" setup
"""
roidb_fname = set_env.coco_data['TRAIN']['ANNO_FILE']
image_dir = set_env.coco_data['TRAIN']['IMAGE_DIR']
import pickle as pkl
with open(roidb_fname, 'rb') as f:
roidb = f.read()
roidb = pkl.loads(roidb)
fn = os.path.join(image_dir, roidb[0][0]['im_file'])
with open(fn, 'rb') as f:
roidb[0][0]['image'] = f.read()
if with_mixup:
mixup_fn = os.path.join(image_dir, roidb[0][1]['im_file'])
roidb[0][0]['mixup'] = roidb[0][1]
            with open(mixup_fn, 'rb') as f:
roidb[0][0]['mixup']['image'] = f.read()
cls.sample = roidb[0][0]
@classmethod
def tearDownClass(cls):
""" tearDownClass """
pass
def test_ops_all(self):
""" test operators
"""
# ResizeImage
ops_conf = [{
'op': 'DecodeImage'
}, {
'op': 'ResizeImage',
'target_size': 300,
'max_size': 1333
}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
data = self.sample.copy()
result0 = mapper(data)
self.assertIsNotNone(result0['image'])
self.assertEqual(len(result0['image'].shape), 3)
# RandFlipImage
ops_conf = [{'op': 'RandomFlipImage'}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
result1 = mapper(result0)
self.assertEqual(result1['image'].shape, result0['image'].shape)
self.assertEqual(result1['gt_bbox'].shape, result0['gt_bbox'].shape)
# NormalizeImage
ops_conf = [{'op': 'NormalizeImage', 'is_channel_first': False}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
result2 = mapper(result1)
im1 = result1['image']
        count = len(np.where(im1 <= 1)[0])
        if im1.dtype == 'float32':
            # after scaling and mean subtraction every value should be <= 1
            self.assertEqual(count,
                             im1.shape[0] * im1.shape[1] * im1.shape[2])
# ArrangeSample
ops_conf = [{'op': 'ArrangeRCNN'}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
result3 = mapper(result2)
self.assertEqual(type(result3), tuple)
def test_ops_part1(self):
"""test Crop and Resize
"""
ops_conf = [{
'op': 'DecodeImage'
}, {
'op': 'NormalizeBox'
}, {
'op': 'CropImage',
'batch_sampler': [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
data = self.sample.copy()
result = mapper(data)
self.assertEqual(len(result['image'].shape), 3)
def test_ops_part2(self):
"""test Expand and RandomDistort
"""
ops_conf = [{
'op': 'DecodeImage'
}, {
'op': 'NormalizeBox'
}, {
'op': 'ExpandImage',
'max_ratio': 1.5,
'prob': 1
}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
data = self.sample.copy()
result = mapper(data)
self.assertEqual(len(result['image'].shape), 3)
self.assertGreater(result['gt_bbox'].shape[0], 0)
def test_ops_part3(self):
"""test Mixup and RandomInterp
"""
ops_conf = [{
'op': 'DecodeImage',
'with_mixup': True,
}, {
'op': 'MixupImage',
}, {
'op': 'RandomInterpImage',
'target_size': 608
}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
data = self.sample.copy()
result = mapper(data)
self.assertEqual(len(result['image'].shape), 3)
self.assertGreater(result['gt_bbox'].shape[0], 0)
#self.assertGreater(result['gt_score'].shape[0], 0)
if __name__ == '__main__':
unittest.main()
import os
import time
import unittest
import sys
import logging
import numpy as np
import yaml
import set_env
from data import Reader
class TestReader(unittest.TestCase):
"""Test cases for dataset.reader
"""
@classmethod
def setUpClass(cls):
""" setup
"""
prefix = os.path.dirname(os.path.abspath(__file__))
coco_yml = os.path.join(prefix, 'coco.yml')
with open(coco_yml, 'rb') as f:
cls.coco_conf = yaml.load(f.read())
cls.coco_conf['DATA']['TRAIN'] = set_env.coco_data['TRAIN']
cls.coco_conf['DATA']['VAL'] = set_env.coco_data['VAL']
rcnn_yml = os.path.join(prefix, 'rcnn_dataset.yml')
with open(rcnn_yml, 'rb') as f:
cls.rcnn_conf = yaml.load(f.read())
cls.rcnn_conf['DATA']['TRAIN'] = set_env.coco_data['TRAIN']
cls.rcnn_conf['DATA']['VAL'] = set_env.coco_data['VAL']
@classmethod
def tearDownClass(cls):
""" tearDownClass """
pass
def test_train(self):
""" Test reader for training
"""
coco = Reader(
self.coco_conf['DATA'], self.coco_conf['TRANSFORM'], maxiter=1000)
train_rd = coco.train()
self.assertTrue(train_rd is not None)
ct = 0
total = 0
bytes = 0
prev_ts = None
for sample in train_rd():
if prev_ts is None:
start_ts = time.time()
prev_ts = start_ts
ct += 1
bytes += 4 * sample[0][0].size * len(sample[0])
self.assertTrue(sample is not None)
cost = time.time() - prev_ts
if cost >= 1.0:
total += ct
qps = total / (time.time() - start_ts)
bps = bytes / (time.time() - start_ts)
logging.info('got %d/%d samples in %.3fsec with qps:%d bps:%d' %
(ct, total, cost, qps, bps))
bytes = 0
ct = 0
prev_ts = time.time()
total += ct
self.assertEqual(total, coco._maxiter)
def test_val(self):
""" Test reader for validation
"""
coco = Reader(self.coco_conf['DATA'], self.coco_conf['TRANSFORM'], 10)
val_rd = coco.val()
self.assertTrue(val_rd is not None)
        # test 3 epochs
for _ in range(3):
ct = 0
for sample in val_rd():
ct += 1
self.assertTrue(sample is not None)
self.assertGreaterEqual(ct, coco._maxiter)
def test_rcnn_train(self):
""" Test reader for training
"""
anno = self.rcnn_conf['DATA']['TRAIN']['ANNO_FILE']
if not os.path.exists(anno):
            logging.error('skip test_rcnn_train: file [%s] not found' % (anno))
return
rcnn = Reader(self.rcnn_conf['DATA'], self.rcnn_conf['TRANSFORM'], 10)
rcnn_rd = rcnn.train()
self.assertTrue(rcnn_rd is not None)
ct = 0
out = None
for sample in rcnn_rd():
out = sample
ct += 1
self.assertTrue(sample is not None)
self.assertEqual(out[0][0].shape[0], 3)
self.assertEqual(out[0][1].shape[0], 3)
self.assertEqual(out[0][3].shape[1], 4)
self.assertEqual(out[0][4].shape[1], 1)
self.assertEqual(out[0][5].shape[1], 1)
self.assertGreaterEqual(ct, rcnn._maxiter)
if __name__ == '__main__':
unittest.main()
import os
import time
import unittest
import sys
import logging
import set_env
from data import build_source
class TestRoiDbSource(unittest.TestCase):
"""Test cases for dataset.source.roidb_source
"""
@classmethod
def setUpClass(cls):
""" setup
"""
anno_path = set_env.coco_data['TRAIN']['ANNO_FILE']
image_dir = set_env.coco_data['TRAIN']['IMAGE_DIR']
cls.config = {
'data_cf': {
'anno_file': anno_path,
'image_dir': image_dir,
'samples': 100,
'load_img': True
},
'cname2cid': None
}
@classmethod
def tearDownClass(cls):
""" tearDownClass """
pass
def test_basic(self):
""" test basic apis 'next/size/drained'
"""
roi_source = build_source(self.config)
for i, sample in enumerate(roi_source):
self.assertTrue('image' in sample)
self.assertGreater(len(sample['image']), 0)
self.assertTrue(roi_source.drained())
self.assertEqual(i + 1, roi_source.size())
def test_reset(self):
""" test functions 'reset/epoch_id'
"""
roi_source = build_source(self.config)
self.assertTrue(roi_source.next() is not None)
self.assertEqual(roi_source.epoch_id(), 0)
roi_source.reset()
self.assertEqual(roi_source.epoch_id(), 1)
self.assertTrue(roi_source.next() is not None)
if __name__ == '__main__':
unittest.main()
import os
import time
import unittest
import sys
import logging
import numpy as np
import set_env
from data import build_source
from data import transform as tf
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
class TestTransformer(unittest.TestCase):
"""Test cases for dataset.transform.transformer
"""
@classmethod
def setUpClass(cls):
""" setup
"""
prefix = os.path.dirname(os.path.abspath(__file__))
# json data
anno_path = set_env.coco_data['TRAIN']['ANNO_FILE']
image_dir = set_env.coco_data['TRAIN']['IMAGE_DIR']
cls.sc_config = {
'anno_file': anno_path,
'image_dir': image_dir,
'samples': 200
}
cls.ops = [{
'op': 'DecodeImage',
'to_rgb': True
}, {
'op': 'ResizeImage',
'target_size': 800,
'max_size': 1333
}, {
'op': 'ArrangeRCNN',
'is_mask': False
}]
@classmethod
def tearDownClass(cls):
""" tearDownClass """
pass
def test_map(self):
""" test transformer.map
"""
mapper = tf.build(self.ops)
ds = build_source(self.sc_config)
mapped_ds = tf.map(ds, mapper)
ct = 0
for sample in mapped_ds:
self.assertTrue(type(sample[0]) is np.ndarray)
ct += 1
self.assertEqual(ct, mapped_ds.size())
def test_parallel_map(self):
""" test transformer.map with concurrent workers
"""
mapper = tf.build(self.ops)
ds = build_source(self.sc_config)
worker_conf = {'WORKER_NUM': 2, 'use_process': True}
mapped_ds = tf.map(ds, mapper, worker_conf)
ct = 0
for sample in mapped_ds:
self.assertTrue(type(sample[0]) is np.ndarray)
ct += 1
self.assertTrue(mapped_ds.drained())
self.assertEqual(ct, mapped_ds.size())
mapped_ds.reset()
ct = 0
for sample in mapped_ds:
self.assertTrue(type(sample[0]) is np.ndarray)
ct += 1
self.assertEqual(ct, mapped_ds.size())
def test_batch(self):
""" test batched dataset
"""
batchsize = 2
mapper = tf.build(self.ops)
ds = build_source(self.sc_config)
mapped_ds = tf.map(ds, mapper)
batched_ds = tf.batch(mapped_ds, batchsize, True)
for sample in batched_ds:
out = sample
self.assertEqual(len(out), batchsize)
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
#   a tool to convert COCO or VOC data to a pickled file in which
#   every sample follows the same schema.
#
# notes:
#   the original COCO or VOC data format can also be used directly
#   by 'PPdetection' for training.
#   this tool just converts the data to a unified schema,
#   which is useful when debugging with a small dataset.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import os
import sys
import logging
import pickle as pkl
path = os.path.join(os.path.dirname(os.path.abspath(__file__)), '../../')
if path not in sys.path:
sys.path.insert(0, path)
from data.source import loader
def parse_args():
""" parse arguments
"""
parser = argparse.ArgumentParser(
description='Generate Standard Dataset for PPdetection')
parser.add_argument(
'--type',
type=str,
default='json',
help='file format of label file, eg: json for COCO and xml for VOC')
parser.add_argument(
'--annotation',
type=str,
help='label file name for COCO or VOC dataset, '
'eg: instances_val2017.json or train.txt')
parser.add_argument(
'--save-dir',
type=str,
default='roidb',
help='directory to save roidb file which contains pickled samples')
parser.add_argument(
'--samples',
type=int,
default=-1,
help='number of samples to dump, default to all')
args = parser.parse_args()
return args
def dump_coco_as_pickle(args):
""" Load COCO data, and then save it as pickled file.
Notes:
label file of COCO contains a json which consists
of label info for each sample
"""
samples = args.samples
save_dir = args.save_dir
if not os.path.exists(save_dir):
os.makedirs(save_dir)
anno_path = args.annotation
roidb, cat2id = loader.load(anno_path, samples, with_cat2id=True)
samples = len(roidb)
    dsname = os.path.splitext(os.path.basename(anno_path))[0]
roidb_fname = save_dir + "/%s.roidb" % (dsname)
with open(roidb_fname, "wb") as fout:
pkl.dump((roidb, cat2id), fout)
#for rec in roidb:
# sys.stderr.write('%s\n' % (rec['im_file']))
logging.info('dumped %d samples to file[%s]' % (samples, roidb_fname))
def dump_voc_as_pickle(args):
""" Load VOC data, and then save it as pickled file.
    Notes:
        we assume the label file of VOC contains lines,
        each of which corresponds to an xml file
        that holds the label info of one sample
"""
samples = args.samples
save_dir = args.save_dir
if not os.path.exists(save_dir):
os.makedirs(save_dir)
save_dir = args.save_dir
anno_path = os.path.expanduser(args.annotation)
roidb, cat2id = loader.load(
anno_path, samples, with_cat2id=True, use_default_label=None)
samples = len(roidb)
part = anno_path.split('/')
dsname = part[-4]
roidb_fname = save_dir + "/%s.roidb" % (dsname)
with open(roidb_fname, "wb") as fout:
pkl.dump((roidb, cat2id), fout)
anno_path = os.path.join(anno_path.split('/train.txt')[0], 'label_list.txt')
with open(anno_path, 'w') as fw:
for key in cat2id.keys():
fw.write(key + '\n')
logging.info('dumped %d samples to file[%s]' % (samples, roidb_fname))
if __name__ == "__main__":
""" Make sure you have already downloaded original COCO or VOC data,
then you can convert it using this tool.
Usage:
python generate_data_for_training.py --type=json
--annotation=./annotations/instances_val2017.json
--save-dir=./roidb --samples=100
"""
args = parse_args()
# VOC data are organized in xml files
if args.type == 'xml':
dump_voc_as_pickle(args)
# COCO data are organized in json file
elif args.type == 'json':
dump_coco_as_pickle(args)
    else:
        raise TypeError('Can\'t deal with {} type. '
                        'Only xml or json file formats are supported'.format(
                            args.type))
#!/usr/bin/env python
# coding: utf-8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import glob
import json
import os
import os.path as osp
import sys
import shutil
import numpy as np
import PIL.Image
import PIL.ImageDraw
class MyEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, np.integer):
return int(obj)
elif isinstance(obj, np.floating):
return float(obj)
elif isinstance(obj, np.ndarray):
return obj.tolist()
else:
return super(MyEncoder, self).default(obj)
def getbbox(height, width, points):
    # legacy helper kept for compatibility; delegates to get_bbox below,
    # which rasterizes the polygon and returns [x, y, w, h]
    return get_bbox(height, width, points)
def images(data, num):
image = {}
image['height'] = data['imageHeight']
image['width'] = data['imageWidth']
image['id'] = num + 1
image['file_name'] = data['imagePath'].split('/')[-1]
return image
def categories(label, labels_list):
category = {}
category['supercategory'] = 'component'
category['id'] = len(labels_list) + 1
category['name'] = label
return category
def annotations_rectangle(points, label, num, label_to_num):
annotation = {}
seg_points = np.asarray(points).copy()
seg_points[1, :] = np.asarray(points)[2, :]
seg_points[2, :] = np.asarray(points)[1, :]
annotation['segmentation'] = [list(seg_points.flatten())]
annotation['iscrowd'] = 0
annotation['image_id'] = num + 1
annotation['bbox'] = list(
map(float, [
points[0][0], points[0][1], points[1][0] - points[0][0], points[1][
1] - points[0][1]
]))
annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
annotation['category_id'] = label_to_num[label]
annotation['id'] = num + 1
return annotation
def annotations_polygon(height, width, points, label, num, label_to_num):
annotation = {}
annotation['segmentation'] = [list(np.asarray(points).flatten())]
annotation['iscrowd'] = 0
annotation['image_id'] = num + 1
annotation['bbox'] = list(map(float, get_bbox(height, width, points)))
annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
annotation['category_id'] = label_to_num[label]
annotation['id'] = num + 1
return annotation
def get_bbox(height, width, points):
polygons = points
mask = np.zeros([height, width], dtype=np.uint8)
mask = PIL.Image.fromarray(mask)
xy = list(map(tuple, polygons))
PIL.ImageDraw.Draw(mask).polygon(xy=xy, outline=1, fill=1)
mask = np.array(mask, dtype=bool)
index = np.argwhere(mask == 1)
    rows = index[:, 0]
    cols = index[:, 1]
    left_top_r = np.min(rows)
    left_top_c = np.min(cols)
    right_bottom_r = np.max(rows)
    right_bottom_c = np.max(cols)
return [
left_top_c, left_top_r, right_bottom_c - left_top_c,
right_bottom_r - left_top_r
]
def deal_json(img_path, json_path):
data_coco = {}
label_to_num = {}
images_list = []
categories_list = []
annotations_list = []
labels_list = []
num = -1
for img_file in os.listdir(img_path):
img_label = img_file.split('.')[0]
label_file = osp.join(json_path, img_label + '.json')
print('Generating dataset from:', label_file)
num = num + 1
with open(label_file) as f:
data = json.load(f)
images_list.append(images(data, num))
for shapes in data['shapes']:
label = shapes['label']
if label not in labels_list:
categories_list.append(categories(label, labels_list))
labels_list.append(label)
label_to_num[label] = len(labels_list)
points = shapes['points']
p_type = shapes['shape_type']
if p_type == 'polygon':
annotations_list.append(
annotations_polygon(data['imageHeight'], data[
'imageWidth'], points, label, num, label_to_num))
if p_type == 'rectangle':
points.append([points[0][0], points[1][1]])
points.append([points[1][0], points[0][1]])
annotations_list.append(
annotations_rectangle(points, label, num, label_to_num))
data_coco['images'] = images_list
data_coco['categories'] = categories_list
data_coco['annotations'] = annotations_list
return data_coco
def main():
parser = argparse.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--json_input_dir', help='input annotated directory')
parser.add_argument('--image_input_dir', help='image directory')
parser.add_argument(
'--output_dir', help='output dataset directory', default='../../../')
parser.add_argument(
'--train_proportion',
help='the proportion of train dataset',
type=float,
default=1.0)
parser.add_argument(
'--val_proportion',
help='the proportion of validation dataset',
type=float,
default=0.0)
parser.add_argument(
'--test_proportion',
help='the proportion of test dataset',
type=float,
default=0.0)
args = parser.parse_args()
    if not os.path.exists(args.json_input_dir):
        print('The json folder does not exist!')
        sys.exit(1)
    if not os.path.exists(args.image_input_dir):
        print('The image folder does not exist!')
        sys.exit(1)
    if args.train_proportion + args.val_proportion \
            + args.test_proportion != 1.0:
        print('The proportions of the training, validation and test '
              'datasets must sum to 1!')
        sys.exit(1)
# Allocate the dataset.
total_num = len(glob.glob(osp.join(args.json_input_dir, '*.json')))
if args.train_proportion != 0:
train_num = int(total_num * args.train_proportion)
os.makedirs(args.output_dir + '/train')
else:
train_num = 0
if args.val_proportion == 0.0:
val_num = 0
test_num = total_num - train_num
if args.test_proportion != 0.0:
os.makedirs(args.output_dir + '/test')
else:
val_num = int(total_num * args.val_proportion)
test_num = total_num - train_num - val_num
os.makedirs(args.output_dir + '/val')
if args.test_proportion != 0.0:
os.makedirs(args.output_dir + '/test')
count = 1
for img_name in os.listdir(args.image_input_dir):
if count <= train_num:
shutil.copyfile(
osp.join(args.image_input_dir, img_name),
osp.join(args.output_dir + '/train/', img_name))
else:
if count <= train_num + val_num:
shutil.copyfile(
osp.join(args.image_input_dir, img_name),
osp.join(args.output_dir + '/val/', img_name))
else:
shutil.copyfile(
osp.join(args.image_input_dir, img_name),
osp.join(args.output_dir + '/test/', img_name))
count = count + 1
# Deal with the json files.
if not os.path.exists(args.output_dir + '/annotations'):
os.makedirs(args.output_dir + '/annotations')
if args.train_proportion != 0:
train_data_coco = deal_json(args.output_dir + '/train',
args.json_input_dir)
train_json_path = osp.join(args.output_dir + '/annotations',
'instance_train.json')
json.dump(
train_data_coco,
open(train_json_path, 'w'),
indent=4,
cls=MyEncoder)
if args.val_proportion != 0:
val_data_coco = deal_json(args.output_dir + '/val', args.json_input_dir)
val_json_path = osp.join(args.output_dir + '/annotations',
'instance_val.json')
json.dump(
val_data_coco, open(val_json_path, 'w'), indent=4, cls=MyEncoder)
if args.test_proportion != 0:
test_data_coco = deal_json(args.output_dir + '/test',
args.json_input_dir)
test_json_path = osp.join(args.output_dir + '/annotations',
'instance_test.json')
json.dump(
test_data_coco, open(test_json_path, 'w'), indent=4, cls=MyEncoder)
if __name__ == '__main__':
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
import copy
import logging
from .transformer import MappedDataset, BatchedDataset
from .post_map import build_post_map
from .parallel_map import ParallelMappedDataset
from .operators import BaseOperator, registered_ops
__all__ = ['build_mapper', 'map', 'batch', 'batch_map']
logger = logging.getLogger(__name__)
def build_mapper(ops, context=None):
"""
Build a mapper for operators in 'ops'
Args:
        ops (list of operator.BaseOperator or list of op dict):
            configs for operators, eg:
            [{'name': 'DecodeImage', 'params': {'to_rgb': True}}, {xxx}]
context (dict): a context object for mapper
Returns:
a mapper function which accept one argument 'sample' and
return the processed result
"""
new_ops = []
for _dict in ops:
new_dict = {}
for i, j in _dict.items():
new_dict[i.lower()] = j
new_ops.append(new_dict)
ops = new_ops
op_funcs = []
op_repr = []
for op in ops:
if type(op) is dict and 'op' in op:
op_func = getattr(BaseOperator, op['op'])
params = copy.deepcopy(op)
del params['op']
o = op_func(**params)
elif not isinstance(op, BaseOperator):
op_func = getattr(BaseOperator, op['name'])
params = {} if 'params' not in op else op['params']
o = op_func(**params)
else:
assert isinstance(op, BaseOperator), \
"invalid operator when build ops"
o = op
op_funcs.append(o)
        op_repr.append('{{{}}}'.format(str(o)))
op_repr = '[{}]'.format(','.join(op_repr))
    def _mapper(sample):
        ctx = {} if context is None else copy.deepcopy(context)
        for f in op_funcs:
            try:
                sample = f(sample, ctx)
            except Exception as e:
                logger.warn("fail to map op [{}] with error: {}".format(f, e))
                # re-raise so a failing op is not silently swallowed
                raise e
        return sample
_mapper.ops = op_repr
return _mapper
def map(ds, mapper, worker_args=None):
"""
Apply 'mapper' to 'ds'
Args:
ds (instance of Dataset): dataset to be mapped
mapper (function): action to be executed for every data sample
worker_args (dict): configs for concurrent mapper
Returns:
a mapped dataset
"""
if worker_args is not None:
return ParallelMappedDataset(ds, mapper, worker_args)
else:
return MappedDataset(ds, mapper)
def batch(ds, batchsize, drop_last=False):
"""
Batch data samples to batches
Args:
batchsize (int): number of samples for a batch
drop_last (bool): drop last few samples if not enough for a batch
Returns:
a batched dataset
"""
return BatchedDataset(ds, batchsize, drop_last=drop_last)
def batch_map(ds, config):
"""
Post process the batches.
Args:
ds (instance of Dataset): dataset to be mapped
mapper (function): action to be executed for every batch
Returns:
a batched dataset which is processed
"""
mapper = build_post_map(**config)
return MappedDataset(ds, mapper)
for nm in registered_ops:
op = getattr(BaseOperator, nm)
locals()[nm] = op
__all__ += registered_ops
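# A minimal usage sketch for the helpers above, assuming `ds` is a dataset
# instance (e.g. built by data.build_source) whose samples are dicts with an
# 'im_file' field; the sample values here are illustrative only:
#
#   ops = [{'op': 'DecodeImage', 'to_rgb': True},
#          {'op': 'ResizeImage', 'target_size': 800, 'max_size': 1333}]
#   mapper = build_mapper(ops)
#   mapped = map(ds, mapper)                 # serial mapping
#   # mapped = map(ds, mapper, worker_args)  # parallel, given a worker config
#   batched = batch(mapped, 2, drop_last=True)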
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
#   operators to process samples,
#   e.g. decode/resize/crop image
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import numpy as np
from .operators import BaseOperator, register_op
logger = logging.getLogger(__name__)
@register_op
class ArrangeRCNN(BaseOperator):
"""
Transform dict to tuple format needed for training.
Args:
        is_mask (bool): whether to include mask data
"""
def __init__(self, is_mask=False):
super(ArrangeRCNN, self).__init__()
self.is_mask = is_mask
assert isinstance(self.is_mask, bool), "wrong type for is_mask"
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
Returns:
sample: a tuple containing following items
(image, im_info, im_id, gt_bbox, gt_class, is_crowd, gt_masks)
"""
im = sample['image']
gt_bbox = sample['gt_bbox']
gt_class = sample['gt_class']
keys = list(sample.keys())
if 'is_crowd' in keys:
is_crowd = sample['is_crowd']
else:
raise KeyError("The dataset doesn't have 'is_crowd' key.")
if 'im_info' in keys:
im_info = sample['im_info']
else:
raise KeyError("The dataset doesn't have 'im_info' key.")
im_id = sample['im_id']
outs = (im, im_info, im_id, gt_bbox, gt_class, is_crowd)
gt_masks = []
if self.is_mask and len(sample['gt_poly']) != 0 \
and 'is_crowd' in keys:
valid = True
segms = sample['gt_poly']
assert len(segms) == is_crowd.shape[0]
for i in range(len(sample['gt_poly'])):
segm, iscrowd = segms[i], is_crowd[i]
gt_segm = []
if iscrowd:
gt_segm.append([[0, 0]])
else:
for poly in segm:
if len(poly) == 0:
valid = False
break
gt_segm.append(np.array(poly).reshape(-1, 2))
if (not valid) or len(gt_segm) == 0:
break
gt_masks.append(gt_segm)
outs = outs + (gt_masks, )
return outs
@register_op
class ArrangeTestRCNN(BaseOperator):
"""
    Transform dict to the tuple format needed for evaluation and inference.
"""
def __init__(self):
super(ArrangeTestRCNN, self).__init__()
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
Returns:
            sample: a tuple containing the following items:
                (image, im_info, im_id, im_shape)
"""
im = sample['image']
keys = list(sample.keys())
if 'im_info' in keys:
im_info = sample['im_info']
else:
raise KeyError("The dataset doesn't have 'im_info' key.")
im_id = sample['im_id']
h = sample['h']
w = sample['w']
# For rcnn models in eval and infer stage, original image size
# is needed to clip the bounding boxes. And box clip op in
# bbox prediction needs im_info as input in format of [N, 3],
# so im_shape is appended by 1 to match dimension.
im_shape = np.array((h, w, 1), dtype=np.float32)
outs = (im, im_info, im_id, im_shape)
return outs
@register_op
class ArrangeSSD(BaseOperator):
"""
Transform dict to tuple format needed for training.
Args:
        is_mask (bool): whether to include mask data
"""
def __init__(self, is_mask=False):
super(ArrangeSSD, self).__init__()
self.is_mask = is_mask
assert isinstance(self.is_mask, bool), "wrong type for is_mask"
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
Returns:
sample: a tuple containing the following items:
(image, gt_bbox, gt_class, difficult)
"""
im = sample['image']
gt_bbox = sample['gt_bbox']
gt_class = sample['gt_class']
difficult = sample['difficult']
outs = (im, gt_bbox, gt_class, difficult)
return outs
@register_op
class ArrangeTestSSD(BaseOperator):
"""
    Transform dict to the tuple format needed for inference.
    Args:
        is_mask (bool): whether to include mask data
"""
def __init__(self, is_mask=False):
super(ArrangeTestSSD, self).__init__()
self.is_mask = is_mask
assert isinstance(self.is_mask, bool), "wrong type for is_mask"
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
Returns:
            sample: a tuple containing the following items: (image, im_id)
"""
im = sample['image']
im_id = sample['im_id']
outs = (im, im_id)
return outs
@register_op
class ArrangeYOLO(BaseOperator):
"""
Transform dict to the tuple format needed for training.
"""
def __init__(self):
super(ArrangeYOLO, self).__init__()
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
Returns:
            sample: a tuple containing the following items:
                (image, gt_bbox, gt_class, gt_score)
"""
im = sample['image']
if len(sample['gt_bbox']) != len(sample['gt_class']):
raise ValueError("gt num mismatch: bbox and class.")
if len(sample['gt_bbox']) != len(sample['gt_score']):
raise ValueError("gt num mismatch: bbox and score.")
gt_bbox = np.zeros((50, 4), dtype=im.dtype)
gt_class = np.zeros((50, ), dtype=np.int32)
gt_score = np.zeros((50, ), dtype=im.dtype)
gt_num = min(50, len(sample['gt_bbox']))
if gt_num > 0:
gt_bbox[:gt_num, :] = sample['gt_bbox'][:gt_num, :]
gt_class[:gt_num] = sample['gt_class'][:gt_num, 0]
gt_score[:gt_num] = sample['gt_score'][:gt_num, 0]
        # convert [x1, y1, x2, y2] to [center_x, center_y, w, h]
gt_bbox[:, 2:4] = gt_bbox[:, 2:4] - gt_bbox[:, :2]
gt_bbox[:, :2] = gt_bbox[:, :2] + gt_bbox[:, 2:4] / 2.
outs = (im, gt_bbox, gt_class, gt_score)
return outs
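# Worked example for the box conversion above (illustrative numbers): a
# corner box [x1, y1, x2, y2] = [10, 20, 50, 100] first becomes w = 40,
# h = 80, then the corner is shifted to the center: cx = 10 + 40 / 2 = 30,
# cy = 20 + 80 / 2 = 60, giving [cx, cy, w, h] = [30, 60, 40, 80].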
@register_op
class ArrangeTestYOLO(BaseOperator):
"""
Transform dict to the tuple format needed for training.
"""
def __init__(self):
super(ArrangeTestYOLO, self).__init__()
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
        Returns:
            sample: a tuple containing the following items:
                (image, im_shape, im_id)
"""
im = sample['image']
im_id = sample['im_id']
h = sample['h']
w = sample['w']
im_shape = np.array((h, w))
outs = (im, im_shape, im_id)
return outs
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# this file contains helper methods for BBOX processing
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def meet_emit_constraint(src_bbox, sample_bbox):
center_x = (src_bbox[2] + src_bbox[0]) / 2
center_y = (src_bbox[3] + src_bbox[1]) / 2
if center_x >= sample_bbox[0] and \
center_x <= sample_bbox[2] and \
center_y >= sample_bbox[1] and \
center_y <= sample_bbox[3]:
return True
return False
def clip_bbox(src_bbox):
src_bbox[0] = max(min(src_bbox[0], 1.0), 0.0)
src_bbox[1] = max(min(src_bbox[1], 1.0), 0.0)
src_bbox[2] = max(min(src_bbox[2], 1.0), 0.0)
src_bbox[3] = max(min(src_bbox[3], 1.0), 0.0)
return src_bbox
def bbox_area(src_bbox):
width = src_bbox[2] - src_bbox[0]
height = src_bbox[3] - src_bbox[1]
return width * height
def filter_and_process(sample_bbox, bboxes, labels, scores=None):
new_bboxes = []
new_labels = []
new_scores = []
for i in range(len(labels)):
new_bbox = [0, 0, 0, 0]
obj_bbox = [bboxes[i][0], bboxes[i][1], bboxes[i][2], bboxes[i][3]]
if not meet_emit_constraint(obj_bbox, sample_bbox):
continue
sample_width = sample_bbox[2] - sample_bbox[0]
sample_height = sample_bbox[3] - sample_bbox[1]
new_bbox[0] = (obj_bbox[0] - sample_bbox[0]) / sample_width
new_bbox[1] = (obj_bbox[1] - sample_bbox[1]) / sample_height
new_bbox[2] = (obj_bbox[2] - sample_bbox[0]) / sample_width
new_bbox[3] = (obj_bbox[3] - sample_bbox[1]) / sample_height
new_bbox = clip_bbox(new_bbox)
if bbox_area(new_bbox) > 0:
new_bboxes.append(new_bbox)
new_labels.append([labels[i][0]])
if scores is not None:
new_scores.append([scores[i][0]])
bboxes = np.array(new_bboxes)
labels = np.array(new_labels)
scores = np.array(new_scores)
return bboxes, labels, scores
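# Worked example (illustrative numbers, normalized coordinates): with
# sample_bbox = [0.25, 0.25, 0.75, 0.75] and a gt box [0.3, 0.3, 0.5, 0.5],
# the box center (0.4, 0.4) lies inside the sample, so it is kept and
# re-expressed in crop coordinates, ((0.3 - 0.25) / 0.5, ...), which
# yields [0.1, 0.1, 0.5, 0.5] after clipping.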
def generate_sample_bbox(sampler):
scale = np.random.uniform(sampler[2], sampler[3])
aspect_ratio = np.random.uniform(sampler[4], sampler[5])
aspect_ratio = max(aspect_ratio, (scale**2.0))
aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
bbox_width = scale * (aspect_ratio**0.5)
bbox_height = scale / (aspect_ratio**0.5)
xmin_bound = 1 - bbox_width
ymin_bound = 1 - bbox_height
xmin = np.random.uniform(0, xmin_bound)
ymin = np.random.uniform(0, ymin_bound)
xmax = xmin + bbox_width
ymax = ymin + bbox_height
sampled_bbox = [xmin, ymin, xmax, ymax]
return sampled_bbox
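# Note on the clamping above: with aspect_ratio forced into
# [scale**2, 1 / scale**2], both bbox_width = scale * sqrt(aspect_ratio)
# and bbox_height = scale / sqrt(aspect_ratio) stay <= 1, so the sampled
# box always fits inside the normalized [0, 1] x [0, 1] image.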
def jaccard_overlap(sample_bbox, object_bbox):
if sample_bbox[0] >= object_bbox[2] or \
sample_bbox[2] <= object_bbox[0] or \
sample_bbox[1] >= object_bbox[3] or \
sample_bbox[3] <= object_bbox[1]:
return 0
intersect_xmin = max(sample_bbox[0], object_bbox[0])
intersect_ymin = max(sample_bbox[1], object_bbox[1])
intersect_xmax = min(sample_bbox[2], object_bbox[2])
intersect_ymax = min(sample_bbox[3], object_bbox[3])
intersect_size = (intersect_xmax - intersect_xmin) * (
intersect_ymax - intersect_ymin)
sample_bbox_size = bbox_area(sample_bbox)
object_bbox_size = bbox_area(object_bbox)
overlap = intersect_size / (
sample_bbox_size + object_bbox_size - intersect_size)
return overlap
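# Worked example (illustrative numbers): for boxes [0, 0, 0.5, 0.5] and
# [0.25, 0.25, 0.75, 0.75], the intersection is 0.25 * 0.25 = 0.0625 and
# each box has area 0.25, so the overlap is
# 0.0625 / (0.25 + 0.25 - 0.0625) = 1 / 7 ~= 0.143.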
def satisfy_sample_constraint(sampler,
sample_bbox,
gt_bboxes,
satisfy_all=False):
if sampler[6] == 0 and sampler[7] == 0:
return True
satisfied = []
for i in range(len(gt_bboxes)):
object_bbox = [
gt_bboxes[i][0], gt_bboxes[i][1], gt_bboxes[i][2], gt_bboxes[i][3]
]
overlap = jaccard_overlap(sample_bbox, object_bbox)
if sampler[6] != 0 and \
overlap < sampler[6]:
satisfied.append(False)
continue
if sampler[7] != 0 and \
overlap > sampler[7]:
satisfied.append(False)
continue
satisfied.append(True)
if not satisfy_all:
return True
if satisfy_all:
return np.all(satisfied)
else:
return False
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
#   operators to process samples,
#   e.g. decode/resize/crop image
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import uuid
import logging
import random
import math
import numpy as np
import cv2
from PIL import Image, ImageEnhance
from ppdet.core.workspace import serializable
from .op_helper import (satisfy_sample_constraint, filter_and_process,
generate_sample_bbox, clip_bbox)
logger = logging.getLogger(__name__)
registered_ops = []
def register_op(cls):
registered_ops.append(cls.__name__)
if not hasattr(BaseOperator, cls.__name__):
setattr(BaseOperator, cls.__name__, cls)
else:
raise KeyError("The {} class has been registered.".format(cls.__name__))
return serializable(cls)
class BboxError(ValueError):
pass
class ImageError(ValueError):
pass
class BaseOperator(object):
def __init__(self, name=None):
if name is None:
name = self.__class__.__name__
self._id = name + '_' + str(uuid.uuid4())[-6:]
def __call__(self, sample, context=None):
""" Process a sample.
Args:
sample (dict): a dict of sample, eg: {'image':xx, 'label': xxx}
context (dict): info about this sample processing
Returns:
result (dict): a processed sample
"""
return sample
def __str__(self):
return str(self._id)
@register_op
class DecodeImage(BaseOperator):
def __init__(self, to_rgb=True, with_mixup=False):
""" Transform the image data to numpy format.
Args:
to_rgb (bool): whether to convert BGR to RGB
"""
super(DecodeImage, self).__init__()
self.to_rgb = to_rgb
self.with_mixup = with_mixup
if not isinstance(self.to_rgb, bool):
raise TypeError("{}: input type is invalid.".format(self))
if not isinstance(self.with_mixup, bool):
raise TypeError("{}: input type is invalid.".format(self))
def __call__(self, sample, context=None):
""" load image if 'im_file' field is not empty but 'image' is"""
if 'image' not in sample:
with open(sample['im_file'], 'rb') as f:
sample['image'] = f.read()
im = sample['image']
data = np.frombuffer(im, dtype='uint8')
        im = cv2.imdecode(data, 1)  # cv2 decodes in BGR mode
if self.to_rgb:
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
sample['image'] = im
if 'h' not in sample:
sample['h'] = im.shape[0]
if 'w' not in sample:
sample['w'] = im.shape[1]
# make default im_info with [h, w, 1]
sample['im_info'] = np.array(
[im.shape[0], im.shape[1], 1.], dtype=np.float32)
# decode mixup image
if self.with_mixup and 'mixup' in sample:
self.__call__(sample['mixup'], context)
return sample
@register_op
class ResizeImage(BaseOperator):
def __init__(self,
target_size=0,
max_size=0,
interp=cv2.INTER_LINEAR,
use_cv2=True):
"""
Args:
            target_size (int): the target size of the image's short side
            max_size (int): the max size of the image
            interp (int): the interpolation method
            use_cv2 (bool): whether to use cv2 (rather than PIL) for interpolation
"""
super(ResizeImage, self).__init__()
self.target_size = int(target_size)
self.max_size = int(max_size)
self.interp = int(interp)
self.use_cv2 = use_cv2
if not (isinstance(self.target_size, int) and isinstance(
self.max_size, int) and isinstance(self.interp, int)):
raise TypeError("{}: input type is invalid.".format(self))
def __call__(self, sample, context=None):
""" Resise the image numpy.
"""
im = sample['image']
if not isinstance(im, np.ndarray):
raise TypeError("{}: image type is not numpy.".format(self))
if len(im.shape) != 3:
raise ImageError('{}: image is not 3-dimensional.'.format(self))
im_shape = im.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
if float(im_size_min) == 0:
raise ZeroDivisionError('{}: min size of image is 0'.format(self))
if self.max_size != 0:
im_scale = float(self.target_size) / float(im_size_min)
# Prevent the biggest axis from being more than max_size
if np.round(im_scale * im_size_max) > self.max_size:
im_scale = float(self.max_size) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
sample['im_info'] = np.array(
[
np.round(im_shape[0] * im_scale),
np.round(im_shape[1] * im_scale), im_scale
],
dtype=np.float32)
else:
im_scale_x = float(self.target_size) / float(im_shape[1])
im_scale_y = float(self.target_size) / float(im_shape[0])
if self.use_cv2:
im = cv2.resize(
im,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp)
else:
im = Image.fromarray(im)
im = im.resize((self.target_size, self.target_size), self.interp)
im = np.array(im)
sample['image'] = im
return sample
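# Worked example for the short-side scaling above (illustrative numbers):
# a 480 x 640 image with target_size=800, max_size=1333 gives
# im_scale = 800 / 480 ~= 1.667; the long side becomes 640 * 1.667 ~= 1067,
# which is below max_size, so the output is roughly 800 x 1067. Had the
# scaled long side exceeded 1333, im_scale would be reduced to 1333 / 640.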
@register_op
class RandomFlipImage(BaseOperator):
def __init__(self, prob=0.5, is_normalized=False, is_mask_flip=False):
"""
Args:
prob (float): the probability of flipping image
is_normalized (bool): whether the bbox scale to [0,1]
is_mask_flip (bool): whether flip the segmentation
"""
super(RandomFlipImage, self).__init__()
self.prob = prob
self.is_normalized = is_normalized
self.is_mask_flip = is_mask_flip
if not (isinstance(self.prob, float) and
isinstance(self.is_normalized, bool) and
isinstance(self.is_mask_flip, bool)):
raise TypeError("{}: input type is invalid.".format(self))
def flip_segms(self, segms, height, width):
def _flip_poly(poly, width):
flipped_poly = np.array(poly)
flipped_poly[0::2] = width - np.array(poly[0::2]) - 1
return flipped_poly.tolist()
        def _flip_rle(rle, height, width):
            # imported lazily so polygon-only datasets do not need pycocotools
            import pycocotools.mask as mask_util
            if 'counts' in rle and type(rle['counts']) == list:
                rle = mask_util.frPyObjects([rle], height, width)
            mask = mask_util.decode(rle)
            mask = mask[:, ::-1, :]
            rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8))
            return rle
def is_poly(segm):
assert isinstance(segm, (list, dict)), \
"Invalid segm type: {}".format(type(segm))
return isinstance(segm, list)
flipped_segms = []
for segm in segms:
if is_poly(segm):
# Polygon format
flipped_segms.append([_flip_poly(poly, width) for poly in segm])
            else:
                # RLE format
                flipped_segms.append(_flip_rle(segm, height, width))
return flipped_segms
def __call__(self, sample, context=None):
"""Filp the image and bounding box.
Operators:
1. Flip the image numpy.
2. Transform the bboxes' x coordinates.
(Must judge whether the coordinates are normalized!)
3. Transform the segmentations' x coordinates.
(Must judge whether the coordinates are normalized!)
Output:
sample: the image, bounding box and segmentation part
in sample are flipped.
"""
gt_bbox = sample['gt_bbox']
im = sample['image']
if not isinstance(im, np.ndarray):
raise TypeError("{}: image is not a numpy array.".format(self))
if len(im.shape) != 3:
raise ImageError("{}: image is not 3-dimensional.".format(self))
height, width, _ = im.shape
if np.random.uniform(0, 1) < self.prob:
im = im[:, ::-1, :]
if gt_bbox.shape[0] == 0:
return sample
oldx1 = gt_bbox[:, 0].copy()
oldx2 = gt_bbox[:, 2].copy()
if self.is_normalized:
gt_bbox[:, 0] = 1 - oldx2
gt_bbox[:, 2] = 1 - oldx1
else:
gt_bbox[:, 0] = width - oldx2 - 1
gt_bbox[:, 2] = width - oldx1 - 1
            if gt_bbox.shape[0] != 0 and (gt_bbox[:, 2] < gt_bbox[:, 0]).any():
m = "{}: invalid box, x2 should be greater than x1".format(self)
raise BboxError(m)
sample['gt_bbox'] = gt_bbox
if self.is_mask_flip and len(sample['gt_poly']) != 0:
sample['gt_poly'] = self.flip_segms(sample['gt_poly'], height,
width)
sample['flipped'] = True
sample['image'] = im
return sample
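# Worked example for the un-normalized branch above (illustrative numbers):
# in an image of width 100, a box with x1 = 10, x2 = 30 flips to
# x1 = 100 - 30 - 1 = 69 and x2 = 100 - 10 - 1 = 89, preserving x2 > x1.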
@register_op
class NormalizeImage(BaseOperator):
def __init__(self,
mean=[0.485, 0.456, 0.406],
std=[1, 1, 1],
is_scale=True,
is_channel_first=True):
"""
Args:
mean (list): the pixel mean
            std (list): the pixel standard deviation
"""
super(NormalizeImage, self).__init__()
self.mean = mean
self.std = std
self.is_scale = is_scale
self.is_channel_first = is_channel_first
if not (isinstance(self.mean, list) and isinstance(self.std, list) and
isinstance(self.is_scale, bool)):
raise TypeError("{}: input type is invalid.".format(self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def __call__(self, sample, context=None):
"""Normalize the image.
Operators:
1.(optional) Scale the image to [0,1]
2. Each pixel minus mean and is divided by std
"""
im = sample['image']
im = im.astype(np.float32, copy=False)
if self.is_channel_first:
mean = np.array(self.mean)[:, np.newaxis, np.newaxis]
std = np.array(self.std)[:, np.newaxis, np.newaxis]
else:
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
if self.is_scale:
im = im / 255.0
im -= mean
im /= std
sample['image'] = im
return sample
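# Worked example (illustrative numbers, default mean/std): with
# is_scale=True a pixel value 255 becomes 255 / 255.0 = 1.0, then the
# first-channel mean 0.485 is subtracted and the result is divided by
# std 1, giving 0.515.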
@register_op
class RandomDistort(BaseOperator):
def __init__(self,
brightness_lower=0.5,
brightness_upper=1.5,
contrast_lower=0.5,
contrast_upper=1.5,
saturation_lower=0.5,
saturation_upper=1.5,
hue_lower=-18,
hue_upper=18,
brightness_prob=0.5,
contrast_prob=0.5,
saturation_prob=0.5,
hue_prob=0.5,
count=4,
is_order=False):
"""
Args:
brightness_lower/ brightness_upper (float): the brightness
between brightness_lower and brightness_upper
            contrast_lower/ contrast_upper (float): the contrast between
                contrast_lower and contrast_upper
saturation_lower/ saturation_upper (float): the saturation
between saturation_lower and saturation_upper
hue_lower/ hue_upper (float): the hue between
hue_lower and hue_upper
brightness_prob (float): the probability of changing brightness
contrast_prob (float): the probability of changing contrast
saturation_prob (float): the probability of changing saturation
hue_prob (float): the probability of changing hue
            count (int): the number of distortion ops to apply
            is_order (bool): whether to apply the distortions in a fixed order
"""
super(RandomDistort, self).__init__()
self.brightness_lower = brightness_lower
self.brightness_upper = brightness_upper
self.contrast_lower = contrast_lower
self.contrast_upper = contrast_upper
self.saturation_lower = saturation_lower
self.saturation_upper = saturation_upper
self.hue_lower = hue_lower
self.hue_upper = hue_upper
self.brightness_prob = brightness_prob
self.contrast_prob = contrast_prob
self.saturation_prob = saturation_prob
self.hue_prob = hue_prob
self.count = count
self.is_order = is_order
def random_brightness(self, img):
brightness_delta = np.random.uniform(self.brightness_lower,
self.brightness_upper)
prob = np.random.uniform(0, 1)
if prob < self.brightness_prob:
img = ImageEnhance.Brightness(img).enhance(brightness_delta)
return img
def random_contrast(self, img):
contrast_delta = np.random.uniform(self.contrast_lower,
self.contrast_upper)
prob = np.random.uniform(0, 1)
if prob < self.contrast_prob:
img = ImageEnhance.Contrast(img).enhance(contrast_delta)
return img
def random_saturation(self, img):
saturation_delta = np.random.uniform(self.saturation_lower,
self.saturation_upper)
prob = np.random.uniform(0, 1)
if prob < self.saturation_prob:
img = ImageEnhance.Color(img).enhance(saturation_delta)
return img
def random_hue(self, img):
hue_delta = np.random.uniform(self.hue_lower, self.hue_upper)
prob = np.random.uniform(0, 1)
if prob < self.hue_prob:
img = np.array(img.convert('HSV'))
img[:, :, 0] = img[:, :, 0] + hue_delta
img = Image.fromarray(img, mode='HSV').convert('RGB')
return img
def __call__(self, sample, context):
"""random distort the image"""
ops = [
self.random_brightness, self.random_contrast,
self.random_saturation, self.random_hue
]
if self.is_order:
prob = np.random.uniform(0, 1)
if prob < 0.5:
ops = [
self.random_brightness,
self.random_saturation,
self.random_hue,
self.random_contrast,
]
else:
ops = random.sample(ops, self.count)
assert 'image' in sample, "image data not found"
im = sample['image']
im = Image.fromarray(im)
for id in range(self.count):
im = ops[id](im)
im = np.asarray(im)
sample['image'] = im
return sample
@register_op
class ExpandImage(BaseOperator):
def __init__(self, max_ratio, prob, mean=[127.5, 127.5, 127.5]):
"""
Args:
            max_ratio (float): the maximum ratio of expanding
            prob (float): the probability of expanding the image
mean (list): the pixel mean
"""
super(ExpandImage, self).__init__()
self.max_ratio = max_ratio
self.mean = mean
self.prob = prob
def __call__(self, sample, context):
"""
Expand the image and modify bounding box.
        Operators:
            1. Scale the image width and height.
            2. Construct a new image with the new height and width.
            3. Fill the new image with the mean.
            4. Paste the original image into the new image.
            5. Rescale the bounding boxes.
            6. Determine if the new bboxes are satisfied in the new image.
Returns:
sample: the image, bounding box are replaced.
"""
prob = np.random.uniform(0, 1)
assert 'image' in sample, 'not found image data'
im = sample['image']
gt_bbox = sample['gt_bbox']
gt_class = sample['gt_class']
im_width = sample['w']
im_height = sample['h']
if prob < self.prob:
if self.max_ratio - 1 >= 0.01:
expand_ratio = np.random.uniform(1, self.max_ratio)
height = int(im_height * expand_ratio)
width = int(im_width * expand_ratio)
h_off = math.floor(np.random.uniform(0, height - im_height))
w_off = math.floor(np.random.uniform(0, width - im_width))
expand_bbox = [
-w_off / im_width, -h_off / im_height,
(width - w_off) / im_width, (height - h_off) / im_height
]
expand_im = np.ones((height, width, 3))
expand_im = np.uint8(expand_im * np.squeeze(self.mean))
expand_im = Image.fromarray(expand_im)
im = Image.fromarray(im)
expand_im.paste(im, (int(w_off), int(h_off)))
expand_im = np.asarray(expand_im)
gt_bbox, gt_class, _ = filter_and_process(expand_bbox, gt_bbox,
gt_class)
sample['image'] = expand_im
sample['gt_bbox'] = gt_bbox
sample['gt_class'] = gt_class
sample['w'] = width
sample['h'] = height
return sample
@register_op
class CropImage(BaseOperator):
def __init__(self, batch_sampler, satisfy_all=False, avoid_no_bbox=True):
"""
Args:
batch_sampler (list): Multiple sets of different
parameters for cropping.
            satisfy_all (bool): whether all boxes must satisfy the constraints.
            avoid_no_bbox (bool): whether to avoid the
                situation where no box appears in the crop.
e.g.[[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
[max sample, max trial, min scale, max scale,
min aspect ratio, max aspect ratio,
min overlap, max overlap]
"""
super(CropImage, self).__init__()
self.batch_sampler = batch_sampler
self.satisfy_all = satisfy_all
self.avoid_no_bbox = avoid_no_bbox
def __call__(self, sample, context):
"""
Crop the image and modify bounding box.
        Operators:
            1. Scale the image width and height.
            2. Crop the image according to a random sample.
3. Rescale the bounding box.
4. Determine if the new bbox is satisfied in the new image.
Returns:
sample: the image, bounding box are replaced.
"""
assert 'image' in sample, "image data not found"
im = sample['image']
gt_bbox = sample['gt_bbox']
gt_class = sample['gt_class']
im_width = sample['w']
im_height = sample['h']
gt_score = sample['gt_score']
sampled_bbox = []
gt_bbox = gt_bbox.tolist()
for sampler in self.batch_sampler:
found = 0
for i in range(sampler[1]):
if found >= sampler[0]:
break
sample_bbox = generate_sample_bbox(sampler)
if satisfy_sample_constraint(sampler, sample_bbox, gt_bbox,
self.satisfy_all):
sampled_bbox.append(sample_bbox)
found = found + 1
im = np.array(im)
while sampled_bbox:
idx = int(np.random.uniform(0, len(sampled_bbox)))
sample_bbox = sampled_bbox.pop(idx)
sample_bbox = clip_bbox(sample_bbox)
crop_bbox, crop_class, crop_score = \
filter_and_process(sample_bbox, gt_bbox, gt_class, gt_score)
if self.avoid_no_bbox:
if len(crop_bbox) < 1:
continue
xmin = int(sample_bbox[0] * im_width)
xmax = int(sample_bbox[2] * im_width)
ymin = int(sample_bbox[1] * im_height)
ymax = int(sample_bbox[3] * im_height)
im = im[ymin:ymax, xmin:xmax]
sample['image'] = im
sample['gt_bbox'] = crop_bbox
sample['gt_class'] = crop_class
sample['gt_score'] = crop_score
return sample
return sample
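# Usage sketch (assumes boxes were normalized first, e.g. by NormalizeBox;
# sampler values are illustrative): a single sampler that keeps crops whose
# minimum Jaccard overlap with a gt box is at least 0.5:
#
#   crop = CropImage(batch_sampler=[[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 0.0]])
#   sample = crop(sample, context=None)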
@register_op
class NormalizeBox(BaseOperator):
"""Transform the bounding box's coornidates to [0,1]."""
def __init__(self):
super(NormalizeBox, self).__init__()
def __call__(self, sample, context):
gt_bbox = sample['gt_bbox']
width = sample['w']
height = sample['h']
for i in range(gt_bbox.shape[0]):
gt_bbox[i][0] = gt_bbox[i][0] / width
gt_bbox[i][1] = gt_bbox[i][1] / height
gt_bbox[i][2] = gt_bbox[i][2] / width
gt_bbox[i][3] = gt_bbox[i][3] / height
sample['gt_bbox'] = gt_bbox
return sample
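# The per-box loop above could equivalently be written as a vectorized
# numpy expression (a sketch; assumes gt_bbox is a float ndarray [N, 4]):
#     gt_bbox[:, 0::2] /= width   # xmin, xmax
#     gt_bbox[:, 1::2] /= height  # ymin, ymax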
@register_op
class Permute(BaseOperator):
def __init__(self, to_bgr=True, channel_first=True):
"""
Change the channel layout and order.
Args:
to_bgr (bool): whether to convert the image from RGB to BGR
channel_first (bool): whether to convert the layout from HWC to CHW
"""
super(Permute, self).__init__()
self.to_bgr = to_bgr
self.channel_first = channel_first
if not (isinstance(self.to_bgr, bool) and
isinstance(self.channel_first, bool)):
raise TypeError("{}: input type is invalid.".format(self))
def __call__(self, sample, context=None):
assert 'image' in sample, "image data not found"
im = sample['image']
if self.channel_first:
im = np.swapaxes(im, 1, 2)
im = np.swapaxes(im, 1, 0)
if self.to_bgr:
im = im[[2, 1, 0], :, :]
sample['image'] = im
return sample
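# A minimal sketch of what Permute does to an HWC image (numpy only;
# the shapes are hypothetical):
def _example_permute():
    im = np.zeros((4, 5, 3), dtype='uint8')  # H=4, W=5, C=3 (RGB)
    im = np.swapaxes(im, 1, 2)  # HWC -> HCW
    im = np.swapaxes(im, 1, 0)  # HCW -> CHW
    im = im[[2, 1, 0], :, :]    # reverse channels: RGB -> BGR
    return im.shape             # (3, 4, 5)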
@register_op
class MixupImage(BaseOperator):
def __init__(self, alpha=1.5, beta=1.5):
""" Mixup image and gt_bbbox/gt_score
Args:
alpha (float): alpha parameter of beta distribute
beta (float): beta parameter of beta distribute
"""
super(MixupImage, self).__init__()
self.alpha = alpha
self.beta = beta
if self.alpha <= 0.0:
raise ValueError("alpha shold be positive in {}".format(self))
if self.beta <= 0.0:
raise ValueError("beta shold be positive in {}".format(self))
def _mixup_img(self, img1, img2, factor):
h = max(img1.shape[0], img2.shape[0])
w = max(img1.shape[1], img2.shape[1])
img = np.zeros((h, w, img1.shape[2]), 'float32')
img[:img1.shape[0], :img1.shape[1], :] = \
img1.astype('float32') * factor
img[:img2.shape[0], :img2.shape[1], :] += \
img2.astype('float32') * (1.0 - factor)
return img.astype('uint8')
def __call__(self, sample, context=None):
if 'mixup' not in sample:
return sample
factor = np.random.beta(self.alpha, self.beta)
factor = max(0.0, min(1.0, factor))
if factor >= 1.0:
sample.pop('mixup')
return sample
if factor <= 0.0:
return sample['mixup']
im = self._mixup_img(sample['image'], sample['mixup']['image'], factor)
gt_bbox1 = sample['gt_bbox']
gt_bbox2 = sample['mixup']['gt_bbox']
gt_bbox = np.concatenate((gt_bbox1, gt_bbox2), axis=0)
gt_class1 = sample['gt_class']
gt_class2 = sample['mixup']['gt_class']
gt_class = np.concatenate((gt_class1, gt_class2), axis=0)
gt_score1 = sample['gt_score']
gt_score2 = sample['mixup']['gt_score']
gt_score = np.concatenate(
(gt_score1 * factor, gt_score2 * (1. - factor)), axis=0)
sample['image'] = im
sample['gt_bbox'] = gt_bbox
sample['gt_score'] = gt_score
sample['gt_class'] = gt_class
sample['h'] = im.shape[0]
sample['w'] = im.shape[1]
sample.pop('mixup')
return sample
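# A minimal numeric sketch of the mixup weighting (hypothetical factor):
# with factor = 0.6, pixels are blended as 0.6*img1 + 0.4*img2, gt boxes
# and classes are concatenated, and gt scores are scaled by 0.6 and 0.4
# so both sources keep proportional weight in the loss.
def _example_mixup_scores():
    factor = 0.6  # a hypothetical draw from Beta(1.5, 1.5)
    gt_score1 = np.ones((2, 1), dtype='float32')
    gt_score2 = np.ones((3, 1), dtype='float32')
    return np.concatenate(
        (gt_score1 * factor, gt_score2 * (1. - factor)), axis=0)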
@register_op
class RandomInterpImage(BaseOperator):
def __init__(self, target_size=0, max_size=0):
"""
Randomly resize the image with one of several interpolation methods.
Args:
target_size (int): the target size of the image's short side
max_size (int): the max size of the image
"""
super(RandomInterpImage, self).__init__()
self.target_size = target_size
self.max_size = max_size
if not (isinstance(self.target_size, int) and
isinstance(self.max_size, int)):
raise TypeError('{}: input type is invalid.'.format(self))
interps = [
cv2.INTER_NEAREST,
cv2.INTER_LINEAR,
cv2.INTER_AREA,
cv2.INTER_CUBIC,
cv2.INTER_LANCZOS4,
]
self.resizers = []
for interp in interps:
self.resizers.append(ResizeImage(target_size, max_size, interp))
def __call__(self, sample, context=None):
"""Resise the image numpy by random resizer."""
resizer = random.choice(self.resizers)
return resizer(sample, context)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
# transform samples in 'source' using 'mapper'
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import six
import uuid
import logging
import signal
import threading
from .transformer import ProxiedDataset
logger = logging.getLogger(__name__)
class EndSignal(object):
def __init__(self, errno=0, errmsg=''):
self.errno = errno
self.errmsg = errmsg
class ParallelMappedDataset(ProxiedDataset):
"""
Transform samples to mapped samples, similar to 'basic.MappedDataset',
but using multiple workers (threads or processes)
Notes:
this class is not thread-safe
"""
def __init__(self, source, mapper, worker_args):
super(ParallelMappedDataset, self).__init__(source)
worker_args = {k.lower(): v for k, v in worker_args.items()}
args = {'bufsize': 100, 'worker_num': 8}
args.update(worker_args)
self._worker_args = args
self._started = False
self._source = source
self._mapper = mapper
self._exit = False
self._setup()
def _setup(self):
"""setup input/output queues and workers """
use_process = False
if 'use_process' in self._worker_args:
use_process = self._worker_args['use_process']
bufsize = self._worker_args['bufsize']
if use_process:
from .shared_queue import SharedQueue as Queue
from multiprocessing import Process as Worker
from multiprocessing import Event
else:
if six.PY3:
from queue import Queue
else:
from Queue import Queue
from threading import Thread as Worker
from threading import Event
self._inq = Queue(bufsize)
self._outq = Queue(bufsize)
consumer_num = self._worker_args['worker_num']
id = str(uuid.uuid4())[-3:]
self._producer = threading.Thread(
target=self._produce,
args=('producer-' + id, self._source, self._inq))
self._producer.daemon = True
self._consumers = []
for i in range(consumer_num):
p = Worker(
target=self._consume,
args=('consumer-' + id + '_' + str(i), self._inq, self._outq,
self._mapper))
self._consumers.append(p)
p.daemon = True
self._epoch = -1
self._feeding_ev = Event()
self._produced = 0 # produced sample in self._produce
self._consumed = 0 # consumed sample in self.next
self._stopped_consumers = 0
def _produce(self, id, source, inq):
"""Fetch data from source and feed it to 'inq' queue"""
while True:
self._feeding_ev.wait()
if self._exit:
break
try:
inq.put(source.next())
self._produced += 1
except StopIteration:
self._feeding_ev.clear()
self._feeding_ev.wait()  # wait to be woken up for the next epoch
logger.debug("producer[{}] starts new epoch".format(id))
except Exception as e:
msg = "producer[{}] failed with error: {}".format(id, str(e))
inq.put(EndSignal(-1, msg))
break
logger.debug("producer[{}] exits".format(id))
def _consume(self, id, inq, outq, mapper):
"""Fetch data from 'inq', process it and put result to 'outq'"""
while True:
sample = inq.get()
if isinstance(sample, EndSignal):
sample.errmsg += "[consumer[{}] exits]".format(id)
outq.put(sample)
logger.debug("end signal received, " +
"consumer[{}] exits".format(id))
break
try:
result = mapper(sample)
outq.put(result)
except Exception as e:
msg = 'consumer[{}] failed to map sample, error: {}'.format(id, str(e))
outq.put(EndSignal(-1, msg))
break
def drained(self):
assert self._epoch >= 0, "first epoch has not started yet"
return self._source.drained() and self._produced == self._consumed
def stop(self):
""" notify to exit
"""
self._exit = True
self._feeding_ev.set()
for _ in range(len(self._consumers)):
self._inq.put(EndSignal(0, "notify consumers to exit"))
def next(self):
""" get next transformed sample
"""
if self._epoch < 0:
self.reset()
if self.drained():
raise StopIteration()
while True:
sample = self._outq.get()
if isinstance(sample, EndSignal):
self._stopped_consumers += 1
if sample.errno != 0:
logger.warn("consumer failed with error: {}".format(
sample.errmsg))
if self._stopped_consumers < len(self._consumers):
self._inq.put(sample)
else:
raise ValueError("all consumers exited, no more samples")
else:
self._consumed += 1
return sample
def reset(self):
""" reset for a new epoch of samples
"""
if self._epoch < 0:
self._epoch = 0
for p in self._consumers:
p.start()
self._producer.start()
else:
if not self.drained():
logger.warn("do not reset before epoch[%d] finishes".format(
self._epoch))
self._produced = self._produced - self._consumed
else:
self._produced = 0
self._epoch += 1
assert self._stopped_consumers == 0, "some consumers already exited," \
+ " cannot start another epoch"
self._source.reset()
self._consumed = 0
self._feeding_ev.set()
# FIXME(dengkaipeng): fix me if you have a better implementation
# handle terminate reader process, do not print stack frame
def _reader_exit(signum, frame):
logger.debug("Reader process exit.")
sys.exit()
signal.signal(signal.SIGTERM, _reader_exit)
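# A minimal usage sketch (assumes a concrete 'source' Dataset and a
# picklable per-sample 'mapper'; the worker_args values are hypothetical):
def _example_parallel_map(source, mapper):
    mapped = ParallelMappedDataset(
        source, mapper,
        worker_args={'worker_num': 4, 'bufsize': 32, 'use_process': True})
    mapped.reset()  # starts the producer thread and the consumer workers
    samples = []
    while True:
        try:
            samples.append(mapped.next())
        except StopIteration:
            break
    return samples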
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import cv2
import numpy as np
logger = logging.getLogger(__name__)
def build_post_map(coarsest_stride=1,
is_padding=False,
random_shapes=[],
multi_scales=[],
use_padded_im_info=False):
"""
Build a mapper for post-processing batches
Args:
coarsest_stride (int): stride of the coarsest FPN level
is_padding (bool): whether to pad images in a minibatch
random_shapes (list of int): resize the image to one of these
randomly chosen shapes, [] for no resize.
multi_scales (list of int): resize the image by one of these
randomly chosen scales, [] for no resize.
use_padded_im_info (bool): whether to write the padded shape back
into im_info
Returns:
a mapper function which accepts one argument 'batch' and
returns the processed result
"""
def padding_minibatch(batch_data):
if len(batch_data) == 1 and coarsest_stride == 1:
return batch_data
max_shape = np.array([data[0].shape for data in batch_data]).max(axis=0)
if coarsest_stride > 1:
max_shape[1] = int(
np.ceil(max_shape[1] / coarsest_stride) * coarsest_stride)
max_shape[2] = int(
np.ceil(max_shape[2] / coarsest_stride) * coarsest_stride)
padding_batch = []
for data in batch_data:
im_c, im_h, im_w = data[0].shape[:]
padding_im = np.zeros(
(im_c, max_shape[1], max_shape[2]), dtype=np.float32)
padding_im[:, :im_h, :im_w] = data[0]
if use_padded_im_info:
data[1][:2] = max_shape[1:3]
padding_batch.append((padding_im, ) + data[1:])
return padding_batch
def random_shape(batch_data):
# For YOLO: gt_bbox is normalized, so it is scale-invariant.
shape = np.random.choice(random_shapes)
scaled_batch = []
h, w = batch_data[0][0].shape[1:3]
scale_x = float(shape) / w
scale_y = float(shape) / h
for data in batch_data:
im = cv2.resize(
data[0].transpose((1, 2, 0)),
None,
None,
fx=scale_x,
fy=scale_y,
interpolation=cv2.INTER_NEAREST)
scaled_batch.append((im.transpose(2, 0, 1), ) + data[1:])
return scaled_batch
def multi_scale_resize(batch_data):
# For RCNN: the image shape is recorded in im_info.
scale = np.random.choice(multi_scales)
scaled_batch = []
for data in batch_data:
im = cv2.resize(
data[0].transpose((1, 2, 0)),
None,
None,
fx=scale,
fy=scale,
interpolation=cv2.INTER_NEAREST)
im_info = [im.shape[:2], scale]
scaled_batch.append((im.transpose(2, 0, 1), im_info) + data[2:])
return scaled_batch
def _mapper(batch_data):
try:
if is_padding:
batch_data = padding_minibatch(batch_data)
if len(random_shapes) > 0:
batch_data = random_shape(batch_data)
if len(multi_scales) > 0:
batch_data = multi_scale_resize(batch_data)
except Exception as e:
errmsg = "post-process failed with error: " + str(e)
logger.warn(errmsg)
raise e
return batch_data
return _mapper
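# A minimal usage sketch (hypothetical values): pad each minibatch so
# height and width are multiples of the coarsest FPN stride, writing
# the padded shape back into im_info.
def _example_post_map(batch):
    post_map = build_post_map(
        coarsest_stride=32, is_padding=True, use_padded_im_info=True)
    return post_map(batch)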
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
__all__ = ['SharedBuffer', 'SharedMemoryMgr', 'SharedQueue']
from .sharedmemory import SharedBuffer
from .sharedmemory import SharedMemoryMgr
from .sharedmemory import SharedMemoryError
from .queue import SharedQueue
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import sys
import six
if six.PY3:
import pickle
from io import BytesIO as StringIO
else:
import cPickle as pickle
from cStringIO import StringIO
import logging
import traceback
import multiprocessing as mp
from multiprocessing.queues import Queue
from .sharedmemory import SharedMemoryMgr
logger = logging.getLogger(__name__)
class SharedQueueError(ValueError):
""" SharedQueueError
"""
pass
class SharedQueue(Queue):
""" a Queue based on shared memory to communicate data between Process,
and it's interface is compatible with 'multiprocessing.queues.Queue'
"""
def __init__(self, maxsize=0, mem_mgr=None, memsize=None, pagesize=None):
""" init
"""
if six.PY3:
super(SharedQueue, self).__init__(maxsize, ctx=mp.get_context())
else:
super(SharedQueue, self).__init__(maxsize)
if mem_mgr is not None:
self._shared_mem = mem_mgr
else:
self._shared_mem = SharedMemoryMgr(
capacity=memsize, pagesize=pagesize)
def put(self, obj, **kwargs):
""" put an object to this queue
"""
obj = pickle.dumps(obj, -1)
buff = None
try:
buff = self._shared_mem.malloc(len(obj))
buff.put(obj)
super(SharedQueue, self).put(buff, **kwargs)
except Exception as e:
stack_info = traceback.format_exc()
err_msg = 'failed to put an element to SharedQueue '\
'with stack info[%s]' % (stack_info)
logger.warn(err_msg)
if buff is not None:
buff.free()
raise e
def get(self, **kwargs):
""" get an object from this queue
"""
buff = None
try:
buff = super(SharedQueue, self).get(**kwargs)
data = buff.get()
return pickle.load(StringIO(data))
except Exception as e:
stack_info = traceback.format_exc()
err_msg = 'failed to get an element from SharedQueue '\
'with stack info[%s]' % (stack_info)
logger.warn(err_msg)
raise e
finally:
if buff is not None:
buff.free()
def release(self):
self._shared_mem.release()
self._shared_mem = None
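# A minimal usage sketch (hypothetical sizes): objects are pickled into
# shared-memory buffers, so large samples cross process boundaries
# without being copied through a pipe.
def _example_shared_queue():
    q = SharedQueue(maxsize=8, memsize=64 * 1024 * 1024, pagesize=64 * 1024)
    q.put({'image': b'\x00' * 1024})  # pickled, then stored in shared memory
    data = q.get()                    # the buffer is freed after unpickling
    q.release()
    return data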
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# utils for memory management which is allocated on sharedmemory,
# note that these structures may not be thread-safe
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import time
import math
import struct
import sys
import six
if six.PY3:
import pickle
else:
import cPickle as pickle
import json
import uuid
import random
import numpy as np
import weakref
import logging
from multiprocessing import Lock
from multiprocessing import RawArray
logger = logging.getLogger(__name__)
class SharedMemoryError(ValueError):
""" SharedMemoryError
"""
pass
class SharedBufferError(SharedMemoryError):
""" SharedBufferError
"""
pass
class MemoryFullError(SharedMemoryError):
""" MemoryFullError
"""
def __init__(self, errmsg=''):
super(MemoryFullError, self).__init__()
self.errmsg = errmsg
def memcopy(dst, src, offset=0, length=None):
""" copy data from 'src' to 'dst' in bytes
"""
length = length if length is not None else len(src)
assert type(dst) == np.ndarray, 'invalid type for "dst" in memcopy'
if type(src) is not np.ndarray:
if type(src) is str and six.PY3:
src = src.encode()
src = np.frombuffer(src, dtype='uint8', count=len(src))
dst[:] = src[offset:offset + length]
class SharedBuffer(object):
""" Buffer allocated from SharedMemoryMgr, and it stores data on shared memory
note that:
every instance of this should be freed explicitely by calling 'self.free'
"""
def __init__(self, owner, capacity, pos, size=0, alloc_status=''):
""" Init
Args:
owner (str): manager to own this buffer
capacity (int): capacity in bytes for this buffer
pos (int): page position in shared memory
size (int): bytes already used
alloc_status (str): debug info about allocator when allocate this
"""
self._owner = owner
self._cap = capacity
self._pos = pos
self._size = size
self._alloc_status = alloc_status
assert self._pos >= 0 and self._cap > 0, \
"invalid params[%d:%d] to construct SharedBuffer" \
% (self._pos, self._cap)
def owner(self):
""" get owner
"""
return SharedMemoryMgr.get_mgr(self._owner)
def put(self, data, override=False):
""" put data to this buffer
Args:
data (str): data to be stored in this buffer
Returns:
None
Raises:
SharedMemoryError when not enough space in this buffer
"""
assert type(data) in [str, bytes], \
'invalid type[%s] for SharedBuffer::put' % (str(type(data)))
if self._size > 0 and not override:
raise SharedBufferError('buffer has already been set before')
if self.capacity() < len(data):
raise SharedBufferError('data[%d] is larger than size of buffer[%s]'\
% (len(data), str(self)))
self.owner().put_data(self, data)
self._size = len(data)
def get(self, offset=0, size=None, no_copy=True):
""" get the data stored this buffer
Args:
offset (int): position for the start point to 'get'
size (int): size to get
Returns:
data (np.ndarray('uint8')): user's data in numpy
which is passed in by 'put'
None: if no data stored in
"""
offset = offset if offset >= 0 else self._size + offset
if self._size <= 0:
return None
size = self._size if size is None else size
assert offset + size <= self._cap, 'invalid offset[%d] '\
'or size[%d] for capacity[%d]' % (offset, size, self._cap)
return self.owner().get_data(self, offset, size, no_copy=no_copy)
def size(self):
""" bytes of used memory
"""
return self._size
def resize(self, size):
""" resize the used memory to 'size', should not be greater than capacity
"""
assert size >= 0 and size <= self._cap, \
"invalid size[%d] for resize" % (size)
self._size = size
def capacity(self):
""" size of allocated memory
"""
return self._cap
def __str__(self):
""" human readable format
"""
return "SharedBuffer(owner:%s, pos:%d, size:%d, "\
"capacity:%d, alloc_status:[%s], pid:%d)" \
% (str(self._owner), self._pos, self._size, \
self._cap, self._alloc_status, os.getpid())
def free(self):
""" free this buffer to it's owner
"""
if self._owner is not None:
self.owner().free(self)
self._owner = None
self._cap = 0
self._pos = -1
self._size = 0
return True
else:
return False
class PageAllocator(object):
""" allocator used to malloc and free shared memory which
is split into pages
"""
s_allocator_header = 12
def __init__(self, base, total_pages, page_size):
""" init
"""
self._magic_num = 1234321000 + random.randint(100, 999)
self._base = base
self._total_pages = total_pages
self._page_size = page_size
header_pages = int(
math.ceil((total_pages + self.s_allocator_header) / page_size))
self._header_pages = header_pages
self._free_pages = total_pages - header_pages
self._header_size = self._header_pages * page_size
self._reset()
def _dump_alloc_info(self, fname):
hpages, tpages, pos, used = self.header()
start = self.s_allocator_header
end = start + self._page_size * hpages
alloc_flags = self._base[start:end].tostring()
info = {
'magic_num': self._magic_num,
'header_pages': hpages,
'total_pages': tpages,
'pos': pos,
'used': used
}
info['alloc_flags'] = alloc_flags
fname = fname + '.' + str(uuid.uuid4())[:6]
with open(fname, 'wb') as f:
f.write(pickle.dumps(info, -1))
logger.warn('dump alloc info to file[%s]' % (fname))
def _reset(self):
alloc_page_pos = self._header_pages
used_pages = self._header_pages
header_info = struct.pack(
str('III'), self._magic_num, alloc_page_pos, used_pages)
assert len(header_info) == self.s_allocator_header, \
'invalid size of header_info'
memcopy(self._base[0:self.s_allocator_header], header_info)
self.set_page_status(0, self._header_pages, '1')
self.set_page_status(self._header_pages, self._free_pages, '0')
def header(self):
""" get header info of this allocator
"""
header_str = self._base[0:self.s_allocator_header].tostring()
magic, pos, used = struct.unpack(str('III'), header_str)
assert magic == self._magic_num, \
'invalid header magic[%d] in shared memory' % (magic)
return self._header_pages, self._total_pages, pos, used
def empty(self):
""" are all allocatable pages available
"""
header_pages, pages, pos, used = self.header()
return header_pages == used
def full(self):
""" are all allocatable pages used
"""
header_pages, pages, pos, used = self.header()
return header_pages + used == pages
def __str__(self):
header_pages, pages, pos, used = self.header()
desc = '{page_info[magic:%d,total:%d,used:%d,header:%d,alloc_pos:%d,pagesize:%d]}' \
% (self._magic_num, pages, used, header_pages, pos, self._page_size)
return 'PageAllocator:%s' % (desc)
def set_alloc_info(self, alloc_pos, used_pages):
""" set allocating position to new value
"""
memcopy(self._base[4:12], struct.pack(str('II'), alloc_pos, used_pages))
def set_page_status(self, start, page_num, status):
""" set pages from 'start' to 'end' with new same status 'status'
"""
assert status in ['0', '1'], 'invalid status[%s] for page status '\
'in allocator[%s]' % (status, str(self))
start += self.s_allocator_header
end = start + page_num
assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
'in allocator[%s]' % (end, str(self))
memcopy(self._base[start:end], str(status * page_num))
def get_page_status(self, start, page_num, ret_flag=False):
start += self.s_allocator_header
end = start + page_num
assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
'in allocator[%s]' % (end, str(self))
status = self._base[start:end].tostring().decode()
if ret_flag:
return status
zero_num = status.count('0')
if zero_num == 0:
return (page_num, 1)
else:
return (zero_num, 0)
def malloc_page(self, page_num):
header_pages, pages, pos, used = self.header()
end = pos + page_num
if end > pages:
pos = self._header_pages
end = pos + page_num
start_pos = pos
flags = ''
while True:
# maybe flags already has some '0' pages,
# so just check 'page_num - len(flags)' pages
flags += self.get_page_status(
pos, page_num - len(flags), ret_flag=True)
if flags.count('0') == page_num:
break
# not found enough pages, so shift to next few pages
free_pos = flags.rfind('1') + 1
flags = flags[free_pos:]
pos += free_pos
end = pos + page_num
if end > pages:
pos = self._header_pages
end = pos + page_num
flags = ''
# not found available pages after scan all pages
if pos <= start_pos and end >= start_pos:
logger.debug('not found available pages after scan all pages')
break
page_status = (flags.count('0'), 0)
if page_status != (page_num, 0):
free_pages = self._total_pages - used
if free_pages == 0:
err_msg = 'all pages have been used:%s' % (str(self))
else:
err_msg = 'not found available pages with page_status[%s] '\
'and %d free pages' % (str(page_status), free_pages)
err_msg = 'failed to malloc %d pages at pos[%d] for reason[%s] and allocator status[%s]' \
% (page_num, pos, err_msg, str(self))
raise MemoryFullError(err_msg)
self.set_page_status(pos, page_num, '1')
used += page_num
self.set_alloc_info(end, used)
assert self.get_page_status(pos, page_num) == (page_num, 1), \
'failed to validate the page status'
return pos
def free_page(self, start, page_num):
""" free 'page_num' pages start from 'start'
"""
page_status = self.get_page_status(start, page_num)
assert page_status == (page_num, 1), \
'invalid status[%s] when free [%d, %d]' \
% (str(page_status), start, page_num)
self.set_page_status(start, page_num, '0')
_, _, pos, used = self.header()
used -= page_num
self.set_alloc_info(pos, used)
DEFAULT_SHARED_MEMORY_SIZE = 1024 * 1024 * 1024
class SharedMemoryMgr(object):
""" manage a continouse block of memory, provide
'malloc' to allocate new buffer, and 'free' to free buffer
"""
s_memory_mgrs = weakref.WeakValueDictionary()
s_mgr_num = 0
s_log_statis = False
@classmethod
def get_mgr(cls, id):
""" get a SharedMemoryMgr with size of 'capacity'
"""
assert id in cls.s_memory_mgrs, 'invalid id[%s] for memory managers' % (
id)
return cls.s_memory_mgrs[id]
def __init__(self, capacity=None, pagesize=None):
""" init
"""
logger.debug('create SharedMemoryMgr')
pagesize = 64 * 1024 if pagesize is None else pagesize
assert type(pagesize) is int, "invalid type of pagesize[%s]" \
% (str(pagesize))
capacity = DEFAULT_SHARED_MEMORY_SIZE if capacity is None else capacity
assert type(capacity) is int, "invalid type of capacity[%s]" \
% (str(capacity))
assert capacity > 0, 'size of shared memory should be greater than 0'
self._released = False
self._cap = capacity
self._page_size = pagesize
assert self._cap % self._page_size == 0, \
"capacity[%d] and pagesize[%d] are not consistent" \
% (self._cap, self._page_size)
self._total_pages = self._cap // self._page_size
self._pid = os.getpid()
SharedMemoryMgr.s_mgr_num += 1
self._id = self._pid * 100 + SharedMemoryMgr.s_mgr_num
SharedMemoryMgr.s_memory_mgrs[self._id] = self
self._locker = Lock()
self._setup()
def _setup(self):
self._shared_mem = RawArray('c', self._cap)
self._base = np.frombuffer(
self._shared_mem, dtype='uint8', count=self._cap)
self._locker.acquire()
try:
self._allocator = PageAllocator(self._base, self._total_pages,
self._page_size)
finally:
self._locker.release()
def malloc(self, size, wait=True):
""" malloc a new SharedBuffer
Args:
size (int): buffer size to be allocated
wait (bool): whether to wait when there is not enough memory
Returns:
SharedBuffer
Raises:
SharedMemoryError when no available memory is found
"""
page_num = int(math.ceil(size / self._page_size))
size = page_num * self._page_size
start = None
ct = 0
errmsg = ''
while True:
self._locker.acquire()
try:
start = self._allocator.malloc_page(page_num)
alloc_status = str(self._allocator)
except MemoryFullError as e:
start = None
errmsg = e.errmsg
if not wait:
raise e
finally:
self._locker.release()
if start is None:
time.sleep(0.1)
if ct % 100 == 0:
logger.warn('not enough space for reason[%s]' % (errmsg))
ct += 1
else:
break
return SharedBuffer(self._id, size, start, alloc_status=alloc_status)
def free(self, shared_buf):
""" free a SharedBuffer
Args:
shared_buf (SharedBuffer): buffer to be freed
Returns:
None
Raises:
SharedMemoryError when failed to release this buffer
"""
assert shared_buf._owner == self._id, "invalid shared_buf[%s] "\
"for it's not allocated from me[%s]" % (str(shared_buf), str(self))
cap = shared_buf.capacity()
start_page = shared_buf._pos
page_num = cap // self._page_size
# maybe we don't need this lock here
self._locker.acquire()
try:
self._allocator.free_page(start_page, page_num)
finally:
self._locker.release()
def put_data(self, shared_buf, data):
""" fill 'data' into 'shared_buf'
"""
assert len(data) <= shared_buf.capacity(), 'too large data[%d] '\
'for this buffer[%s]' % (len(data), str(shared_buf))
start = shared_buf._pos * self._page_size
end = start + len(data)
assert start >= 0 and end <= self._cap, "invalid start "\
"position[%d] when put data to buff:%s" % (start, str(shared_buf))
self._base[start:end] = np.frombuffer(data, 'uint8', len(data))
def get_data(self, shared_buf, offset, size, no_copy=True):
""" extract 'data' from 'shared_buf' in range [offset, offset + size)
"""
start = shared_buf._pos * self._page_size
start += offset
if no_copy:
return self._base[start:start + size]
else:
return self._base[start:start + size].tostring()
def __str__(self):
return 'SharedMemoryMgr:{id:%d, %s}' % (self._id, str(self._allocator))
def __del__(self):
if SharedMemoryMgr.s_log_statis:
logger.info('destroy [%s]' % (self))
if not self._released and not self._allocator.empty():
logger.warn('not empty when deleting this SharedMemoryMgr[%s]' %
(self))
else:
self._released = True
if self._id in SharedMemoryMgr.s_memory_mgrs:
del SharedMemoryMgr.s_memory_mgrs[self._id]
SharedMemoryMgr.s_mgr_num -= 1
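# A minimal usage sketch of the manager defined above (hypothetical
# sizes; as the SharedBuffer docstring notes, every buffer must be
# freed explicitly):
def _example_shared_memory():
    mgr = SharedMemoryMgr(capacity=1024 * 1024, pagesize=64 * 1024)
    buff = mgr.malloc(100)          # rounded up to one 64KB page
    buff.put(b'hello')
    data = buff.get(no_copy=False)  # returns b'hello'
    buff.free()
    return data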
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import functools
import collections
from ..dataset import Dataset
class ProxiedDataset(Dataset):
"""proxy method called to 'self._ds' when if not defined"""
def __init__(self, ds):
super(ProxiedDataset, self).__init__()
self._ds = ds
methods = filter(lambda k: not k.startswith('_'),
Dataset.__dict__.keys())
for m in methods:
func = functools.partial(self._proxy_method, getattr(self, m))
setattr(self, m, func)
def _proxy_method(self, func, *args, **kwargs):
"""
proxy the call to 'func'; if it is not implemented, call the
method of self._ds that has the same name as func.__name__
"""
method = func.__name__
try:
return func(*args, **kwargs)
except NotImplementedError:
ds_func = getattr(self._ds, method)
return ds_func(*args, **kwargs)
class MappedDataset(ProxiedDataset):
def __init__(self, ds, mapper):
super(MappedDataset, self).__init__(ds)
self._ds = ds
self._mapper = mapper
def next(self):
sample = self._ds.next()
return self._mapper(sample)
class BatchedDataset(ProxiedDataset):
"""
Batching samples
Args:
ds (instance of Dataset): dataset to be batched
batchsize (int): sample number for each batch
drop_last (bool): drop the remaining samples if they are not enough for one batch
"""
def __init__(self, ds, batchsize, drop_last=False):
super(BatchedDataset, self).__init__(ds)
self._batchsz = batchsize
self._drop_last = drop_last
def next(self):
"""proxy to self._ds.next"""
def empty(x):
if isinstance(x, np.ndarray) and x.size == 0:
return True
elif isinstance(x, collections.Sequence) and len(x) == 0:
return True
else:
return False
def has_empty(items):
if any(x is None for x in items):
return True
if any(empty(x) for x in items):
return True
return False
batch = []
for _ in range(self._batchsz):
try:
out = self._ds.next()
while has_empty(out):
out = self._ds.next()
batch.append(out)
except StopIteration:
if not self._drop_last and len(batch) > 0:
return batch
else:
raise StopIteration
return batch
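# A minimal composition sketch (assumes a concrete 'ds' Dataset and a
# per-sample 'mapper'; the batch size is hypothetical): transforms run
# per sample, then samples are grouped into batches of 16.
def _example_pipeline(ds, mapper):
    mapped = MappedDataset(ds, mapper)
    batched = BatchedDataset(mapped, batchsize=16, drop_last=True)
    return batched.next()  # a list of 16 transformed samples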
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
# XXX for triggering decorators
from . import anchor_heads
from . import architectures
from . import backbones
from . import roi_extractors
from . import roi_heads
from . import ops
from . import target_assigners
from .anchor_heads import *
from .architectures import *
from .backbones import *
from .roi_extractors import *
from .roi_heads import *
from .ops import *
from .target_assigners import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import rpn_head
from . import yolo_head
from . import retina_head
from .rpn_head import *
from .yolo_head import *
from .retina_head import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal, Constant
from paddle.fluid.regularizer import L2Decay
from ppdet.modeling.ops import (AnchorGenerator, RetinaTargetAssign,
RetinaOutputDecoder)
from ppdet.core.workspace import register, serializable
__all__ = ['RetinaHead']
@register
class RetinaHead(object):
"""
Retina Head
Args:
anchor_generator (object): `AnchorGenerator` instance
target_assign (object): `RetinaTargetAssign` instance
output_decoder (object): `RetinaOutputDecoder` instance
num_convs_per_octave (int): Number of convolution layers in each octave
num_chan (int): Number of octave output channels
max_level (int): Highest level of FPN output
min_level (int): Lowest level of FPN output
prior_prob (float): Used to set the bias init for the class prediction layer
base_scale (int): Anchors are generated based on this scale
num_scales_per_octave (int): Number of anchor scales per octave
num_classes (int): Number of classes
gamma (float): The parameter in focal loss
alpha (float): The parameter in focal loss
sigma (float): The parameter in smooth l1 loss
"""
__inject__ = ['anchor_generator', 'target_assign', 'output_decoder']
def __init__(self,
anchor_generator=AnchorGenerator().__dict__,
target_assign=RetinaTargetAssign().__dict__,
output_decoder=RetinaOutputDecoder().__dict__,
num_convs_per_octave=4,
num_chan=256,
max_level=7,
min_level=3,
prior_prob=0.01,
base_scale=4,
num_scales_per_octave=3,
num_classes=81,
gamma=2.0,
alpha=0.25,
sigma=3.0151134457776365):
self.anchor_generator = anchor_generator
self.target_assign = target_assign
self.output_decoder = output_decoder
self.num_convs_per_octave = num_convs_per_octave
self.num_chan = num_chan
self.max_level = max_level
self.min_level = min_level
self.prior_prob = prior_prob
self.base_scale = base_scale
self.num_scales_per_octave = num_scales_per_octave
self.num_classes = num_classes
self.gamma = gamma
self.alpha = alpha
self.sigma = sigma
if isinstance(anchor_generator, dict):
self.anchor_generator = AnchorGenerator(**anchor_generator)
if isinstance(target_assign, dict):
self.target_assign = RetinaTargetAssign(**target_assign)
if isinstance(output_decoder, dict):
self.output_decoder = RetinaOutputDecoder(**output_decoder)
def _class_subnet(self, body_feats, spatial_scale):
"""
Get class predictions for all FPN levels.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
Returns:
cls_pred_list(list): Class predictions of all input FPN levels.
"""
assert len(body_feats) == self.max_level - self.min_level + 1
fpn_name_list = list(body_feats.keys())
cls_pred_list = []
for lvl in range(self.min_level, self.max_level + 1):
fpn_name = fpn_name_list[self.max_level - lvl]
subnet_blob = body_feats[fpn_name]
for i in range(self.num_convs_per_octave):
conv_name = 'retnet_cls_conv_n{}_fpn{}'.format(i, lvl)
conv_share_name = 'retnet_cls_conv_n{}_fpn{}'.format(
i, self.min_level)
subnet_blob_in = subnet_blob
subnet_blob = fluid.layers.conv2d(
input=subnet_blob_in,
num_filters=self.num_chan,
filter_size=3,
stride=1,
padding=1,
act='relu',
name=conv_name,
param_attr=ParamAttr(
name=conv_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=conv_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
# class prediction
cls_name = 'retnet_cls_pred_fpn{}'.format(lvl)
cls_share_name = 'retnet_cls_pred_fpn{}'.format(self.min_level)
num_anchors = self.num_scales_per_octave * len(
self.anchor_generator.aspect_ratios)
cls_dim = num_anchors * (self.num_classes - 1)
# bias initialization: b = -log((1 - prior_prob) / prior_prob)
bias_init = float(-np.log((1 - self.prior_prob) / self.prior_prob))
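# e.g. with the default prior_prob = 0.01, bias_init = -log(99) ~ -4.595,
# so every anchor starts with a predicted foreground probability of ~0.01.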
out_cls = fluid.layers.conv2d(
input=subnet_blob,
num_filters=cls_dim,
filter_size=3,
stride=1,
padding=1,
act=None,
name=cls_name,
param_attr=ParamAttr(
name=cls_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=cls_share_name + '_b',
initializer=Constant(value=bias_init),
learning_rate=2.,
regularizer=L2Decay(0.)))
cls_pred_list.append(out_cls)
return cls_pred_list
def _bbox_subnet(self, body_feats, spatial_scale):
"""
Get bounding box predictions for all FPN levels.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
Returns:
bbox_pred_list(list): Bounding box predictions of all input FPN
levels.
"""
assert len(body_feats) == self.max_level - self.min_level + 1
fpn_name_list = list(body_feats.keys())
bbox_pred_list = []
for lvl in range(self.min_level, self.max_level + 1):
fpn_name = fpn_name_list[self.max_level - lvl]
subnet_blob = body_feats[fpn_name]
for i in range(self.num_convs_per_octave):
conv_name = 'retnet_bbox_conv_n{}_fpn{}'.format(i, lvl)
conv_share_name = 'retnet_bbox_conv_n{}_fpn{}'.format(
i, self.min_level)
subnet_blob_in = subnet_blob
subnet_blob = fluid.layers.conv2d(
input=subnet_blob_in,
num_filters=self.num_chan,
filter_size=3,
stride=1,
padding=1,
act='relu',
name=conv_name,
param_attr=ParamAttr(
name=conv_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=conv_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
# bbox prediction
bbox_name = 'retnet_bbox_pred_fpn{}'.format(lvl)
bbox_share_name = 'retnet_bbox_pred_fpn{}'.format(self.min_level)
num_anchors = self.num_scales_per_octave * len(
self.anchor_generator.aspect_ratios)
bbox_dim = num_anchors * 4
out_bbox = fluid.layers.conv2d(
input=subnet_blob,
num_filters=bbox_dim,
filter_size=3,
stride=1,
padding=1,
act=None,
name=bbox_name,
param_attr=ParamAttr(
name=bbox_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=bbox_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
bbox_pred_list.append(out_bbox)
return bbox_pred_list
def _anchor_generate(self, body_feats, spatial_scale):
"""
Get anchor boxes for all FPN levels.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
Return:
anchor_list(list): Anchors of all input FPN levels.
anchor_var_list(list): Anchor variances of all input FPN levels.
"""
assert len(body_feats) == self.max_level - self.min_level + 1
fpn_name_list = list(body_feats.keys())
anchor_list = []
anchor_var_list = []
for lvl in range(self.min_level, self.max_level + 1):
anchor_sizes = []
stride = int(1 / spatial_scale[self.max_level - lvl])
for octave in range(self.num_scales_per_octave):
anchor_size = stride * (
2**(float(octave) /
float(self.num_scales_per_octave))) * self.base_scale
anchor_sizes.append(anchor_size)
fpn_name = fpn_name_list[self.max_level - lvl]
anchor, anchor_var = self.anchor_generator(
input=body_feats[fpn_name],
anchor_sizes=anchor_sizes,
aspect_ratios=self.anchor_generator.aspect_ratios,
stride=[stride, stride])
anchor_list.append(anchor)
anchor_var_list.append(anchor_var)
return anchor_list, anchor_var_list
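# A worked example of the anchor-size rule above (hypothetical level):
# at FPN level 3, stride = 8; with base_scale = 4 and 3 scales per
# octave, anchor sizes are 8*4*2**(0/3.), 8*4*2**(1/3.), 8*4*2**(2/3.)
# ~ 32.0, 40.3, 50.8 pixels.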
def _get_output(self, body_feats, spatial_scale):
"""
Get class predictions, bounding box predictions and anchor boxes
for all FPN levels.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
Returns:
output(dict): A dictionary holding the class predictions
('cls_pred'), bounding box predictions ('bbox_pred'), anchors
('anchor') and anchor variances ('anchor_var') of all input FPN
levels.
"""
assert len(body_feats) == self.max_level - self.min_level + 1
# class subnet
cls_pred_list = self._class_subnet(body_feats, spatial_scale)
# bbox subnet
bbox_pred_list = self._bbox_subnet(body_feats, spatial_scale)
# generate anchors
anchor_list, anchor_var_list = self._anchor_generate(body_feats,
spatial_scale)
cls_pred_reshape_list = []
bbox_pred_reshape_list = []
anchor_reshape_list = []
anchor_var_reshape_list = []
for i in range(self.max_level - self.min_level + 1):
cls_pred_transpose = fluid.layers.transpose(
cls_pred_list[i], perm=[0, 2, 3, 1])
cls_pred_reshape = fluid.layers.reshape(
cls_pred_transpose, shape=(0, -1, self.num_classes - 1))
bbox_pred_transpose = fluid.layers.transpose(
bbox_pred_list[i], perm=[0, 2, 3, 1])
bbox_pred_reshape = fluid.layers.reshape(
bbox_pred_transpose, shape=(0, -1, 4))
anchor_reshape = fluid.layers.reshape(anchor_list[i], shape=(-1, 4))
anchor_var_reshape = fluid.layers.reshape(
anchor_var_list[i], shape=(-1, 4))
cls_pred_reshape_list.append(cls_pred_reshape)
bbox_pred_reshape_list.append(bbox_pred_reshape)
anchor_reshape_list.append(anchor_reshape)
anchor_var_reshape_list.append(anchor_var_reshape)
output = {}
output['cls_pred'] = cls_pred_reshape_list
output['bbox_pred'] = bbox_pred_reshape_list
output['anchor'] = anchor_reshape_list
output['anchor_var'] = anchor_var_reshape_list
return output
def get_prediction(self, body_feats, spatial_scale, im_info):
"""
Get predicted bounding boxes in the test stage.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
im_info (Variable): A 2-D LoDTensor with shape [B, 3]. B is the
number of input images, each element consists of im_height,
im_width, im_scale.
Returns:
pred_result(Variable): Prediction result with shape [N, 6]. Each
row has 6 values: [label, confidence, xmin, ymin, xmax, ymax].
N is the total number of predictions.
"""
output = self._get_output(body_feats, spatial_scale)
cls_pred_reshape_list = output['cls_pred']
bbox_pred_reshape_list = output['bbox_pred']
anchor_reshape_list = output['anchor']
anchor_var_reshape_list = output['anchor_var']
for i in range(self.max_level - self.min_level + 1):
cls_pred_reshape_list[i] = fluid.layers.sigmoid(
cls_pred_reshape_list[i])
pred_result = self.output_decoder(
bboxes=bbox_pred_reshape_list,
scores=cls_pred_reshape_list,
anchors=anchor_reshape_list,
im_info=im_info)
return {'bbox': pred_result}
def get_loss(self, body_feats, spatial_scale, im_info, gt_box, gt_label,
is_crowd):
"""
Calculate the losses of RetinaNet.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
im_info(Variable): A 2-D LoDTensor with shape [B, 3]. B is the
number of input images, each element consists of im_height,
im_width, im_scale.
gt_box(Variable): The ground-truth bounding boxes with shape [M, 4].
M is the number of groundtruth.
gt_label(Variable): The ground-truth labels with shape [M, 1].
M is the number of groundtruth.
is_crowd(Variable): Indicates whether a ground-truth box is crowd,
with shape [M, 1]. M is the number of groundtruth.
Returns:
Type: dict
loss_cls(Variable): focal loss.
loss_bbox(Variable): smooth l1 loss.
"""
output = self._get_output(body_feats, spatial_scale)
cls_pred_reshape_list = output['cls_pred']
bbox_pred_reshape_list = output['bbox_pred']
anchor_reshape_list = output['anchor']
anchor_var_reshape_list = output['anchor_var']
cls_pred_input = fluid.layers.concat(cls_pred_reshape_list, axis=1)
bbox_pred_input = fluid.layers.concat(bbox_pred_reshape_list, axis=1)
anchor_input = fluid.layers.concat(anchor_reshape_list, axis=0)
anchor_var_input = fluid.layers.concat(anchor_var_reshape_list, axis=0)
score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight, fg_num = \
self.target_assign(
bbox_pred=bbox_pred_input,
cls_logits=cls_pred_input,
anchor_box=anchor_input,
anchor_var=anchor_var_input,
gt_boxes=gt_box,
gt_labels=gt_label,
is_crowd=is_crowd,
im_info=im_info,
num_classes=self.num_classes - 1)
fg_num = fluid.layers.reduce_sum(fg_num, name='fg_num')
loss_cls = fluid.layers.sigmoid_focal_loss(
x=score_pred,
label=score_tgt,
fg_num=fg_num,
gamma=self.gamma,
alpha=self.alpha)
loss_cls = fluid.layers.reduce_sum(loss_cls, name='loss_cls')
loss_bbox = fluid.layers.smooth_l1(
x=loc_pred,
y=loc_tgt,
sigma=self.sigma,
inside_weight=bbox_weight,
outside_weight=bbox_weight)
loss_bbox = fluid.layers.reduce_sum(loss_bbox, name='loss_bbox')
loss_bbox = loss_bbox / fg_num
return {'loss_cls': loss_cls, 'loss_bbox': loss_bbox}
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
from ppdet.modeling.ops import (AnchorGenerator,
RPNTargetAssign, GenerateProposals)
__all__ = ['RPNTargetAssign', 'GenerateProposals', 'RPNHead', 'FPNRPNHead']
@register
class RPNHead(object):
"""
RPN Head
Args:
anchor_generator (object): `AnchorGenerator` instance
rpn_target_assign (object): `RPNTargetAssign` instance
train_proposal (object): `GenerateProposals` instance for training
test_proposal (object): `GenerateProposals` instance for testing
"""
__inject__ = [
'anchor_generator', 'rpn_target_assign', 'train_proposal',
'test_proposal'
]
def __init__(self,
anchor_generator=AnchorGenerator().__dict__,
rpn_target_assign=RPNTargetAssign().__dict__,
train_proposal=GenerateProposals(12000, 2000).__dict__,
test_proposal=GenerateProposals().__dict__):
super(RPNHead, self).__init__()
self.anchor_generator = anchor_generator
self.rpn_target_assign = rpn_target_assign
self.train_proposal = train_proposal
self.test_proposal = test_proposal
if isinstance(anchor_generator, dict):
self.anchor_generator = AnchorGenerator(**anchor_generator)
if isinstance(rpn_target_assign, dict):
self.rpn_target_assign = RPNTargetAssign(**rpn_target_assign)
if isinstance(train_proposal, dict):
self.train_proposal = GenerateProposals(**train_proposal)
if isinstance(test_proposal, dict):
self.test_proposal = GenerateProposals(**test_proposal)
def _get_output(self, input):
"""
Get anchor and RPN head output.
Args:
input(Variable): feature map from backbone with shape of [N, C, H, W]
Returns:
rpn_cls_score(Variable): Output of rpn head with shape of
[N, num_anchors, H, W].
rpn_bbox_pred(Variable): Output of rpn head with shape of
[N, num_anchors * 4, H, W].
"""
dim_out = input.shape[1]
rpn_conv = fluid.layers.conv2d(
input=input,
num_filters=dim_out,
filter_size=3,
stride=1,
padding=1,
act='relu',
name='conv_rpn',
param_attr=ParamAttr(
name="conv_rpn_w", initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.)))
# Generate anchors
self.anchor, self.anchor_var = self.anchor_generator(input=rpn_conv)
num_anchor = self.anchor.shape[2]
# Proposal classification scores
self.rpn_cls_score = fluid.layers.conv2d(
rpn_conv,
num_filters=num_anchor,
filter_size=1,
stride=1,
padding=0,
act=None,
name='rpn_cls_score',
param_attr=ParamAttr(
name="rpn_cls_logits_w", initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="rpn_cls_logits_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
# Proposal bbox regression deltas
self.rpn_bbox_pred = fluid.layers.conv2d(
rpn_conv,
num_filters=4 * num_anchor,
filter_size=1,
stride=1,
padding=0,
act=None,
name='rpn_bbox_pred',
param_attr=ParamAttr(
name="rpn_bbox_pred_w", initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="rpn_bbox_pred_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
return self.rpn_cls_score, self.rpn_bbox_pred
def get_proposals(self, body_feats, im_info, mode='train'):
"""
Get proposals according to the output of backbone.
Args:
body_feats (dict): The dictionary of feature maps from backbone.
im_info(Variable): The information of images with shape [N, 3],
in the format (height, width, scale).
mode(str): 'train' or 'test'.
Returns:
rpn_rois(Variable): Output proposals with shape of (rois_num, 4).
"""
# In RPN heads, only the last feature map of the backbone is used,
# i.e. the last entry of body_feats.
body_feat = list(body_feats.values())[-1]
rpn_cls_score, rpn_bbox_pred = self._get_output(body_feat)
rpn_cls_score_prob = fluid.layers.sigmoid(
rpn_cls_score, name='rpn_cls_score_prob')
prop_op = self.train_proposal if mode == 'train' else self.test_proposal
rpn_rois, rpn_roi_probs = prop_op(
scores=rpn_cls_score_prob,
bbox_deltas=rpn_bbox_pred,
im_info=im_info,
anchors=self.anchor,
variances=self.anchor_var)
return rpn_rois
def _transform_input(self, rpn_cls_score, rpn_bbox_pred, anchor,
anchor_var):
rpn_cls_score = fluid.layers.transpose(rpn_cls_score, perm=[0, 2, 3, 1])
rpn_bbox_pred = fluid.layers.transpose(rpn_bbox_pred, perm=[0, 2, 3, 1])
anchor = fluid.layers.reshape(anchor, shape=(-1, 4))
anchor_var = fluid.layers.reshape(anchor_var, shape=(-1, 4))
rpn_cls_score = fluid.layers.reshape(x=rpn_cls_score, shape=(0, -1, 1))
rpn_bbox_pred = fluid.layers.reshape(x=rpn_bbox_pred, shape=(0, -1, 4))
return rpn_cls_score, rpn_bbox_pred, anchor, anchor_var
def _get_loss_input(self):
for attr in ['rpn_cls_score', 'rpn_bbox_pred', 'anchor', 'anchor_var']:
if not getattr(self, attr, None):
raise ValueError("self.{} should not be None,".format(attr),
"call RPNHead.get_proposals first")
return self._transform_input(self.rpn_cls_score, self.rpn_bbox_pred,
self.anchor, self.anchor_var)
def get_loss(self, im_info, gt_box, is_crowd):
"""
Sample proposals and calculate RPN losses.
Args:
im_info(Variable): The information of images with shape [N, 3],
in the format (height, width, scale).
gt_box(Variable): The ground-truth bounding boxes with shape [M, 4].
M is the number of groundtruth.
is_crowd(Variable): Indicates whether a ground-truth box is crowd,
with shape [M, 1]. M is the number of groundtruth.
Returns:
Type: dict
rpn_cls_loss(Variable): RPN classification loss.
rpn_bbox_loss(Variable): RPN bounding box regression loss.
"""
rpn_cls, rpn_bbox, anchor, anchor_var = self._get_loss_input()
score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \
self.rpn_target_assign(
bbox_pred=rpn_bbox,
cls_logits=rpn_cls,
anchor_box=anchor,
anchor_var=anchor_var,
gt_boxes=gt_box,
is_crowd=is_crowd,
im_info=im_info)
score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32')
score_tgt.stop_gradient = True
rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=score_pred, label=score_tgt)
rpn_cls_loss = fluid.layers.reduce_mean(
rpn_cls_loss, name='loss_rpn_cls')
loc_tgt = fluid.layers.cast(x=loc_tgt, dtype='float32')
loc_tgt.stop_gradient = True
rpn_reg_loss = fluid.layers.smooth_l1(
x=loc_pred,
y=loc_tgt,
sigma=3.0,
inside_weight=bbox_weight,
outside_weight=bbox_weight)
rpn_reg_loss = fluid.layers.reduce_sum(
rpn_reg_loss, name='loss_rpn_bbox')
score_shape = fluid.layers.shape(score_tgt)
score_shape = fluid.layers.cast(x=score_shape, dtype='float32')
norm = fluid.layers.reduce_prod(score_shape)
norm.stop_gradient = True
rpn_reg_loss = rpn_reg_loss / norm
return {'loss_rpn_cls': rpn_cls_loss, 'loss_rpn_bbox': rpn_reg_loss}
@register
class FPNRPNHead(RPNHead):
"""
RPN Head that supports FPN input
Args:
anchor_generator (object): `AnchorGenerator` instance
rpn_target_assign (object): `RPNTargetAssign` instance
train_proposal (object): `GenerateProposals` instance for training
test_proposal (object): `GenerateProposals` instance for testing
anchor_start_size (int): size of anchor at the first scale
num_chan (int): number of FPN output channels
min_level (int): lowest level of FPN output
max_level (int): highest level of FPN output
"""
__inject__ = [
'anchor_generator', 'rpn_target_assign', 'train_proposal',
'test_proposal'
]
def __init__(self,
anchor_generator=AnchorGenerator().__dict__,
rpn_target_assign=RPNTargetAssign().__dict__,
train_proposal=GenerateProposals(12000, 2000).__dict__,
test_proposal=GenerateProposals().__dict__,
anchor_start_size=32,
num_chan=256,
min_level=2,
max_level=6):
super(FPNRPNHead, self).__init__(anchor_generator, rpn_target_assign,
train_proposal, test_proposal)
self.anchor_start_size = anchor_start_size
self.num_chan = num_chan
self.min_level = min_level
self.max_level = max_level
self.fpn_rpn_list = []
self.anchors_list = []
self.anchor_var_list = []
def _get_output(self, input, feat_lvl):
"""
Get anchor and FPN RPN head output at one level.
Args:
input(Variable): Body feature from backbone.
feat_lvl(int): Indicate the level of rpn output corresponding
to the level of feature map.
Return:
rpn_cls_score(Variable): Output of one level of fpn rpn head with
shape of [N, num_anchors, H, W].
rpn_bbox_pred(Variable): Output of one level of fpn rpn head with
shape of [N, num_anchors * 4, H, W].
"""
slvl = str(feat_lvl)
conv_name = 'conv_rpn_fpn' + slvl
cls_name = 'rpn_cls_logits_fpn' + slvl
bbox_name = 'rpn_bbox_pred_fpn' + slvl
conv_share_name = 'conv_rpn_fpn' + str(self.min_level)
cls_share_name = 'rpn_cls_logits_fpn' + str(self.min_level)
bbox_share_name = 'rpn_bbox_pred_fpn' + str(self.min_level)
num_anchors = len(self.anchor_generator.aspect_ratios)
conv_rpn_fpn = fluid.layers.conv2d(
input=input,
num_filters=self.num_chan,
filter_size=3,
padding=1,
act='relu',
name=conv_name,
param_attr=ParamAttr(
name=conv_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=conv_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
self.anchors, self.anchor_var = self.anchor_generator(
input=conv_rpn_fpn,
anchor_sizes=(self.anchor_start_size * 2.
**(feat_lvl - self.min_level), ),
stride=(2.**feat_lvl, 2.**feat_lvl))
self.rpn_cls_score = fluid.layers.conv2d(
input=conv_rpn_fpn,
num_filters=num_anchors,
filter_size=1,
act=None,
name=cls_name,
param_attr=ParamAttr(
name=cls_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=cls_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
self.rpn_bbox_pred = fluid.layers.conv2d(
input=conv_rpn_fpn,
num_filters=num_anchors * 4,
filter_size=1,
act=None,
name=bbox_name,
param_attr=ParamAttr(
name=bbox_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=bbox_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
return self.rpn_cls_score, self.rpn_bbox_pred
def _get_single_proposals(self, body_feat, im_info, feat_lvl, mode='train'):
"""
Get proposals in one level according to the output of fpn rpn head
Args:
body_feat(Variable): the feature map from the backbone.
im_info(Variable): The image information, with shape [N, 3] and
format (height, width, scale).
feat_lvl(int): The FPN level that the generated proposals
correspond to.
Returns:
rpn_rois_fpn(Variable): Output proposals with shape of (rois_num, 4).
rpn_roi_probs_fpn(Variable): Scores of proposals with
shape of (rois_num, 1).
"""
rpn_cls_logits_fpn, rpn_bbox_pred_fpn = self._get_output(body_feat,
feat_lvl)
prop_op = self.train_proposal if mode == 'train' else self.test_proposal
rpn_cls_prob_fpn = fluid.layers.sigmoid(
rpn_cls_logits_fpn, name='rpn_cls_probs_fpn' + str(feat_lvl))
rpn_rois_fpn, rpn_roi_probs_fpn = prop_op(
scores=rpn_cls_prob_fpn,
bbox_deltas=rpn_bbox_pred_fpn,
im_info=im_info,
anchors=self.anchors,
variances=self.anchor_var)
return rpn_rois_fpn, rpn_roi_probs_fpn
def get_proposals(self, fpn_feats, im_info, mode='train'):
"""
Get proposals in multiple levels according to the output of fpn
rpn head
Args:
fpn_feats(dict): A dictionary mapping names to the output
feature maps of FPN.
im_info(Variable): The image information, with shape [N, 3] and
format (height, width, scale).
Return:
rois_collect(Variable): Output proposals with shape [rois_num, 4].
"""
rois_list = []
roi_probs_list = []
fpn_feat_names = list(fpn_feats.keys())
for lvl in range(self.min_level, self.max_level + 1):
fpn_feat_name = fpn_feat_names[self.max_level - lvl]
fpn_feat = fpn_feats[fpn_feat_name]
rois_fpn, roi_probs_fpn = self._get_single_proposals(
fpn_feat, im_info, lvl, mode)
self.fpn_rpn_list.append((self.rpn_cls_score, self.rpn_bbox_pred))
rois_list.append(rois_fpn)
roi_probs_list.append(roi_probs_fpn)
self.anchors_list.append(self.anchors)
self.anchor_var_list.append(self.anchor_var)
prop_op = self.train_proposal if mode == 'train' else self.test_proposal
post_nms_top_n = prop_op.post_nms_top_n
rois_collect = fluid.layers.collect_fpn_proposals(
rois_list,
roi_probs_list,
self.min_level,
self.max_level,
post_nms_top_n,
name='collect')
return rois_collect
def _get_loss_input(self):
rpn_clses = []
rpn_bboxes = []
anchors = []
anchor_vars = []
for i in range(len(self.fpn_rpn_list)):
single_input = self._transform_input(
self.fpn_rpn_list[i][0], self.fpn_rpn_list[i][1],
self.anchors_list[i], self.anchor_var_list[i])
rpn_clses.append(single_input[0])
rpn_bboxes.append(single_input[1])
anchors.append(single_input[2])
anchor_vars.append(single_input[3])
rpn_cls = fluid.layers.concat(rpn_clses, axis=1)
rpn_bbox = fluid.layers.concat(rpn_bboxes, axis=1)
anchors = fluid.layers.concat(anchors)
anchor_var = fluid.layers.concat(anchor_vars)
return rpn_cls, rpn_bbox, anchors, anchor_var
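# Illustrative sketch (not part of the original module): _get_output()
# assigns one anchor size and stride per FPN level,
#   anchor_size(lvl) = anchor_start_size * 2 ** (lvl - min_level)
#   stride(lvl)      = 2 ** lvl
# A quick check with the defaults (anchor_start_size=32, min_level=2,
# max_level=6):
for lvl in range(2, 7):
    print(lvl, 32 * 2 ** (lvl - 2), 2 ** lvl)  # level 2 -> (32, 4) ... level 6 -> (512, 64)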
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
from ppdet.modeling.ops import MultiClassNMS
from ppdet.core.workspace import register
__all__ = ['YOLOv3Head']
@register
class YOLOv3Head(object):
"""
Head block for YOLOv3 network
Args:
norm_decay (float): weight decay for normalization layer weights
num_classes (int): number of output classes
ignore_thresh (float): threshold to ignore confidence loss
label_smooth (bool): whether to use label smoothing
anchors (list): [width, height] pairs of the anchor boxes
anchor_masks (list): indices of the anchors used by each output layer
nms (object): an instance of `MultiClassNMS`
"""
__inject__ = ['nms']
def __init__(self,
norm_decay=0.,
num_classes=80,
ignore_thresh=0.7,
label_smooth=True,
anchors=[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
[59, 119], [116, 90], [156, 198], [373, 326]],
anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
nms=MultiClassNMS(
score_threshold=0.01,
nms_top_k=1000,
keep_top_k=100,
nms_threshold=0.45,
background_label=-1).__dict__):
self.norm_decay = norm_decay
self.num_classes = num_classes
self.ignore_thresh = ignore_thresh
self.label_smooth = label_smooth
self.anchor_masks = anchor_masks
self._parse_anchors(anchors)
self.nms = nms
if isinstance(nms, dict):
self.nms = MultiClassNMS(**nms)
def _conv_bn(self,
input,
ch_out,
filter_size,
stride,
padding,
act='leaky',
is_test=True,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
act=None,
param_attr=ParamAttr(name=name + ".conv.weights"),
bias_attr=False)
bn_name = name + ".bn"
bn_param_attr = ParamAttr(
regularizer=L2Decay(self.norm_decay), name=bn_name + '.scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(self.norm_decay), name=bn_name + '.offset')
out = fluid.layers.batch_norm(
input=conv,
act=None,
is_test=is_test,
param_attr=bn_param_attr,
bias_attr=bn_bias_attr,
moving_mean_name=bn_name + '.mean',
moving_variance_name=bn_name + '.var')
if act == 'leaky':
out = fluid.layers.leaky_relu(x=out, alpha=0.1)
return out
def _detection_block(self, input, channel, is_test=True, name=None):
assert channel % 2 == 0, \
"channel {} cannot be divided by 2 in detection block {}" \
.format(channel, name)
conv = input
for j in range(2):
conv = self._conv_bn(
conv,
channel,
filter_size=1,
stride=1,
padding=0,
is_test=is_test,
name='{}.{}.0'.format(name, j))
conv = self._conv_bn(
conv,
channel * 2,
filter_size=3,
stride=1,
padding=1,
is_test=is_test,
name='{}.{}.1'.format(name, j))
route = self._conv_bn(
conv,
channel,
filter_size=1,
stride=1,
padding=0,
is_test=is_test,
name='{}.2'.format(name))
tip = self._conv_bn(
route,
channel * 2,
filter_size=3,
stride=1,
padding=1,
is_test=is_test,
name='{}.tip'.format(name))
return route, tip
def _upsample(self, input, scale=2, name=None):
# get dynamic upsample output shape
shape_nchw = fluid.layers.shape(input)
shape_hw = fluid.layers.slice(
shape_nchw, axes=[0], starts=[2], ends=[4])
shape_hw.stop_gradient = True
in_shape = fluid.layers.cast(shape_hw, dtype='int32')
out_shape = in_shape * scale
out_shape.stop_gradient = True
# resize by actual_shape
out = fluid.layers.resize_nearest(
input=input, scale=scale, actual_shape=out_shape, name=name)
return out
def _parse_anchors(self, anchors):
"""
Check ANCHORS/ANCHOR_MASKS in config and parse mask_anchors
"""
self.anchors = []
self.mask_anchors = []
assert len(anchors) > 0, "ANCHORS not set."
assert len(self.anchor_masks) > 0, "ANCHOR_MASKS not set."
for anchor in anchors:
assert len(anchor) == 2, "anchor {} len should be 2".format(anchor)
self.anchors.extend(anchor)
anchor_num = len(anchors)
for masks in self.anchor_masks:
self.mask_anchors.append([])
for mask in masks:
assert mask < anchor_num, "anchor mask index overflow"
self.mask_anchors[-1].extend(anchors[mask])
def _get_outputs(self, input, is_train=True):
"""
Get YOLOv3 head output
Args:
input (list): List of Variables, output of backbone stages
is_train (bool): whether in train or test mode
Returns:
outputs (list): Variables of each output layer
"""
outputs = []
# get last out_layer_num blocks in reverse order
out_layer_num = len(self.anchor_masks)
blocks = input[-1:-out_layer_num - 1:-1]
route = None
for i, block in enumerate(blocks):
if i > 0:  # concat route from the previous level in all but the first block
block = fluid.layers.concat(input=[route, block], axis=1)
route, tip = self._detection_block(
block,
channel=512 // (2**i),
is_test=(not is_train),
name="yolo_block.{}".format(i))
# out channel number = mask_num * (5 + class_num)
num_filters = len(self.anchor_masks[i]) * (self.num_classes + 5)
block_out = fluid.layers.conv2d(
input=tip,
num_filters=num_filters,
filter_size=1,
stride=1,
padding=0,
act=None,
param_attr=ParamAttr(
name="yolo_output.{}.conv.weights".format(i)),
bias_attr=ParamAttr(
regularizer=L2Decay(0.),
name="yolo_output.{}.conv.bias".format(i)))
outputs.append(block_out)
if i < len(blocks) - 1:
# do not perform upsample in the last detection_block
route = self._conv_bn(
input=route,
ch_out=256 // (2**i),
filter_size=1,
stride=1,
padding=0,
is_test=(not is_train),
name="yolo_transition.{}".format(i))
# upsample
route = self._upsample(route)
return outputs
def get_loss(self, input, gt_box, gt_label, gt_score):
"""
Get final loss of network of YOLOv3.
Args:
input (list): List of Variables, output of backbone stages
gt_box (Variable): The ground-truth bounding boxes.
gt_label (Variable): The ground-truth class labels.
gt_score (Variable): The ground-truth bounding box mixup scores.
Returns:
loss (Variable): The loss Variable of YOLOv3 network.
"""
outputs = self._get_outputs(input, is_train=True)
losses = []
downsample = 32
for i, output in enumerate(outputs):
anchor_mask = self.anchor_masks[i]
loss = fluid.layers.yolov3_loss(
x=output,
gt_box=gt_box,
gt_label=gt_label,
gt_score=gt_score,
anchors=self.anchors,
anchor_mask=anchor_mask,
class_num=self.num_classes,
ignore_thresh=self.ignore_thresh,
downsample_ratio=downsample,
use_label_smooth=self.label_smooth,
name="yolo_loss" + str(i))
losses.append(fluid.layers.reduce_mean(loss))
downsample //= 2
return sum(losses)
def get_prediction(self, input, im_shape):
"""
Get prediction result of YOLOv3 network
Args:
input (list): List of Variables, output of backbone stages
im_shape (Variable): The shape (h, w) of each input image.
Returns:
pred (dict): The prediction result after non-maximum suppression.
"""
outputs = self._get_outputs(input, is_train=False)
boxes = []
scores = []
downsample = 32
for i, output in enumerate(outputs):
box, score = fluid.layers.yolo_box(
x=output,
img_size=im_shape,
anchors=self.mask_anchors[i],
class_num=self.num_classes,
conf_thresh=self.nms.score_threshold,
downsample_ratio=downsample,
name="yolo_box" + str(i))
boxes.append(box)
scores.append(fluid.layers.transpose(score, perm=[0, 2, 1]))
downsample //= 2
yolo_boxes = fluid.layers.concat(boxes, axis=1)
yolo_scores = fluid.layers.concat(scores, axis=2)
pred = self.nms(bboxes=yolo_boxes, scores=yolo_scores)
return {'bbox': pred}
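# Illustrative sketch (not part of the original module): each YOLOv3 output
# layer predicts 4 box deltas, 1 objectness score and num_classes class
# scores per anchor, so the 1x1 output conv in _get_outputs() needs
# mask_num * (5 + num_classes) filters. With the defaults above:
num_classes = 80
anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
print([len(m) * (5 + num_classes) for m in anchor_masks])  # [255, 255, 255]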
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import faster_rcnn
from . import mask_rcnn
from . import cascade_rcnn
from . import yolov3
from . import ssd
from . import retinanet
from .faster_rcnn import *
from .mask_rcnn import *
from .cascade_rcnn import *
from .yolov3 import *
from .ssd import *
from .retinanet import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from ppdet.core.workspace import register
__all__ = ['CascadeRCNN']
@register
class CascadeRCNN(object):
"""
Cascade R-CNN architecture, see https://arxiv.org/abs/1712.00726
Args:
backbone (object): backbone instance
rpn_head (object): `RPNhead` instance
bbox_assigner (object): `BBoxAssigner` instance
roi_extractor (object): ROI extractor instance
bbox_head (object): `BBoxHead` instance
fpn (object): feature pyramid network instance
"""
__category__ = 'architecture'
__inject__ = [
'backbone', 'fpn', 'rpn_head', 'bbox_assigner', 'roi_extractor',
'bbox_head'
]
def __init__(self,
backbone,
rpn_head,
roi_extractor='FPNRoIAlign',
bbox_head='CascadeBBoxHead',
bbox_assigner='CascadeBBoxAssigner',
fpn='FPN'):
super(CascadeRCNN, self).__init__()
assert fpn is not None, "cascade RCNN requires FPN"
self.backbone = backbone
self.fpn = fpn
self.rpn_head = rpn_head
self.bbox_assigner = bbox_assigner
self.roi_extractor = roi_extractor
self.bbox_head = bbox_head
# Cascade local cfg
self.cls_agnostic_bbox_reg = 2
(brw0, brw1, brw2) = self.bbox_assigner.bbox_reg_weights
self.cascade_bbox_reg_weights = [
[1. / brw0, 1. / brw0, 2. / brw0, 2. / brw0],
[1. / brw1, 1. / brw1, 2. / brw1, 2. / brw1],
[1. / brw2, 1. / brw2, 2. / brw2, 2. / brw2]
]
self.cascade_rcnn_loss_weight = [1.0, 0.5, 0.25]
def build(self, feed_vars, mode='train'):
im = feed_vars['image']
im_info = feed_vars['im_info']
if mode == 'train':
gt_box = feed_vars['gt_box']
is_crowd = feed_vars['is_crowd']
# backbone
body_feats = self.backbone(im)
# body_feat_names = list(body_feats.keys())
# FPN
if self.fpn is not None:
body_feats, spatial_scale = self.fpn.get_output(body_feats)
# rpn proposals
rpn_rois = self.rpn_head.get_proposals(body_feats, im_info, mode=mode)
if mode == 'train':
rpn_loss = self.rpn_head.get_loss(im_info, gt_box, is_crowd)
proposal_list = []
roi_feat_list = []
rcnn_pred_list = []
rcnn_target_list = []
proposals = None
bbox_pred = None
for i in range(3):
if i > 0:
refined_bbox = self._decode_box(
proposals,
bbox_pred,
curr_stage=i - 1, )
else:
refined_bbox = rpn_rois
if mode == 'train':
outs = self.bbox_assigner(
input_rois=refined_bbox, feed_vars=feed_vars, curr_stage=i)
proposals = outs[0]
rcnn_target_list.append(outs)
else:
proposals = refined_bbox
proposal_list.append(proposals)
# extract roi features
roi_feat = self.roi_extractor(body_feats, proposals, spatial_scale)
roi_feat_list.append(roi_feat)
# bbox head
cls_score, bbox_pred = self.bbox_head.get_output(
roi_feat,
wb_scalar=1.0 / self.cascade_rcnn_loss_weight[i],
name='_' + str(i + 1) if i > 0 else '')
rcnn_pred_list.append((cls_score, bbox_pred))
if mode == 'train':
loss = self.bbox_head.get_loss(rcnn_pred_list, rcnn_target_list,
self.cascade_rcnn_loss_weight)
loss.update(rpn_loss)
total_loss = fluid.layers.sum(list(loss.values()))
loss.update({'loss': total_loss})
return loss
else:
pred = self.bbox_head.get_prediction(
im_info, roi_feat_list, rcnn_pred_list, proposal_list,
self.cascade_bbox_reg_weights, self.cls_agnostic_bbox_reg)
return pred
def _decode_box(self, proposals, bbox_pred, curr_stage):
rcnn_loc_delta_r = fluid.layers.reshape(
bbox_pred, (-1, self.cls_agnostic_bbox_reg, 4))
# only use fg box delta to decode box
rcnn_loc_delta_s = fluid.layers.slice(
rcnn_loc_delta_r, axes=[1], starts=[1], ends=[2])
refined_bbox = fluid.layers.box_coder(
prior_box=proposals,
prior_box_var=self.cascade_bbox_reg_weights[curr_stage],
target_box=rcnn_loc_delta_s,
code_type='decode_center_size',
box_normalized=False,
axis=1, )
refined_bbox = fluid.layers.reshape(refined_bbox, shape=[-1, 4])
return refined_bbox
def train(self, feed_vars):
return self.build(feed_vars, 'train')
def eval(self, feed_vars):
return self.build(feed_vars, 'test')
def test(self, feed_vars):
return self.build(feed_vars, 'test')
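# Illustrative sketch (not part of the original module): CascadeRCNN converts
# the per-stage scalars of bbox_assigner.bbox_reg_weights into box_coder
# variances [1/w, 1/w, 2/w, 2/w]. Assuming the common Cascade R-CNN weights
# (10., 20., 30.) -- an assumption, check the CascadeBBoxAssigner config:
for brw in (10., 20., 30.):
    print([1. / brw, 1. / brw, 2. / brw, 2. / brw])
# stage 0 -> [0.1, 0.1, 0.2, 0.2], stage 1 -> [0.05, 0.05, 0.1, 0.1], ...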
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from ppdet.core.workspace import register
__all__ = ['FasterRCNN']
@register
class FasterRCNN(object):
"""
Faster R-CNN architecture, see https://arxiv.org/abs/1506.01497
Args:
backbone (object): backbone instance
rpn_head (object): `RPNhead` instance
bbox_assigner (object): `BBoxAssigner` instance
roi_extractor (object): ROI extractor instance
bbox_head (object): `BBoxHead` instance
fpn (object): feature pyramid network instance
"""
__category__ = 'architecture'
__inject__ = [
'backbone', 'rpn_head', 'bbox_assigner', 'roi_extractor', 'bbox_head',
'fpn'
]
def __init__(self,
backbone,
rpn_head,
roi_extractor,
bbox_head='BBoxHead',
bbox_assigner='BBoxAssigner',
fpn=None):
super(FasterRCNN, self).__init__()
self.backbone = backbone
self.rpn_head = rpn_head
self.bbox_assigner = bbox_assigner
self.roi_extractor = roi_extractor
self.bbox_head = bbox_head
self.fpn = fpn
def build(self, feed_vars, mode='train'):
im = feed_vars['image']
im_info = feed_vars['im_info']
if mode == 'train':
gt_box = feed_vars['gt_box']
is_crowd = feed_vars['is_crowd']
else:
im_shape = feed_vars['im_info']
body_feats = self.backbone(im)
body_feat_names = list(body_feats.keys())
if self.fpn is not None:
body_feats, spatial_scale = self.fpn.get_output(body_feats)
rois = self.rpn_head.get_proposals(body_feats, im_info, mode=mode)
if mode == 'train':
rpn_loss = self.rpn_head.get_loss(im_info, gt_box, is_crowd)
# sampled rpn proposals
for var in ['gt_label', 'is_crowd', 'gt_box', 'im_info']:
assert var in feed_vars, "{} has no {}".format(feed_vars, var)
outs = self.bbox_assigner(
rpn_rois=rois,
gt_classes=feed_vars['gt_label'],
is_crowd=feed_vars['is_crowd'],
gt_boxes=feed_vars['gt_box'],
im_info=feed_vars['im_info'])
rois = outs[0]
labels_int32 = outs[1]
bbox_targets = outs[2]
bbox_inside_weights = outs[3]
bbox_outside_weights = outs[4]
if self.fpn is None:
# in models without FPN, roi extractor only uses the last level of
# feature maps. And body_feat_names[-1] represents the name of
# last feature map.
body_feat = body_feats[body_feat_names[-1]]
roi_feat = self.roi_extractor(body_feat, rois)
else:
roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
if mode == 'train':
loss = self.bbox_head.get_loss(roi_feat, labels_int32, bbox_targets,
bbox_inside_weights,
bbox_outside_weights)
loss.update(rpn_loss)
total_loss = fluid.layers.sum(list(loss.values()))
loss.update({'loss': total_loss})
return loss
else:
pred = self.bbox_head.get_prediction(roi_feat, rois, im_info,
im_shape)
return pred
def train(self, feed_vars):
return self.build(feed_vars, 'train')
def eval(self, feed_vars):
return self.build(feed_vars, 'test')
def test(self, feed_vars):
return self.build(feed_vars, 'test')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from ppdet.core.workspace import register
__all__ = ['MaskRCNN']
@register
class MaskRCNN(object):
"""
Mask R-CNN architecture, see https://arxiv.org/abs/1703.06870
Args:
backbone (object): backbone instance
rpn_head (object): `RPNhead` instance
bbox_assigner (object): `BBoxAssigner` instance
roi_extractor (object): ROI extractor instance
bbox_head (object): `BBoxHead` instance
mask_assigner (object): `MaskAssigner` instance
mask_head (object): `MaskHead` instance
fpn (object): feature pyramid network instance
"""
__category__ = 'architecture'
__inject__ = [
'backbone', 'rpn_head', 'bbox_assigner', 'roi_extractor', 'bbox_head',
'mask_assigner', 'mask_head', 'fpn'
]
def __init__(self,
backbone,
rpn_head,
bbox_head='BBoxHead',
bbox_assigner='BBoxAssigner',
roi_extractor='RoIAlign',
mask_assigner='MaskAssigner',
mask_head='MaskHead',
fpn=None):
super(MaskRCNN, self).__init__()
self.backbone = backbone
self.rpn_head = rpn_head
self.bbox_assigner = bbox_assigner
self.roi_extractor = roi_extractor
self.bbox_head = bbox_head
self.mask_assigner = mask_assigner
self.mask_head = mask_head
self.fpn = fpn
def build(self, feed_vars, mode='train'):
im = feed_vars['image']
assert mode in ['train', 'test'], \
"only 'train' and 'test' mode is supported"
if mode == 'train':
required_fields = [
'gt_label', 'gt_box', 'gt_mask', 'is_crowd', 'im_info'
]
else:
required_fields = ['im_shape', 'im_info']
for var in required_fields:
assert var in feed_vars, \
"{} has no {} field".format(feed_vars, var)
im_info = feed_vars['im_info']
body_feats = self.backbone(im)
# FPN
if self.fpn is not None:
body_feats, spatial_scale = self.fpn.get_output(body_feats)
# RPN proposals
rois = self.rpn_head.get_proposals(body_feats, im_info, mode=mode)
if mode == 'train':
rpn_loss = self.rpn_head.get_loss(im_info, feed_vars['gt_box'],
feed_vars['is_crowd'])
outs = self.bbox_assigner(
rpn_rois=rois,
gt_classes=feed_vars['gt_label'],
is_crowd=feed_vars['is_crowd'],
gt_boxes=feed_vars['gt_box'],
im_info=feed_vars['im_info'])
rois = outs[0]
labels_int32 = outs[1]
if self.fpn is None:
last_feat = body_feats[list(body_feats.keys())[-1]]
roi_feat = self.roi_extractor(last_feat, rois)
else:
roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
loss = self.bbox_head.get_loss(roi_feat, labels_int32, *outs[2:])
loss.update(rpn_loss)
mask_rois, roi_has_mask_int32, mask_int32 = self.mask_assigner(
rois=rois,
gt_classes=feed_vars['gt_label'],
is_crowd=feed_vars['is_crowd'],
gt_segms=feed_vars['gt_mask'],
im_info=feed_vars['im_info'],
labels_int32=labels_int32)
if self.fpn is None:
bbox_head_feat = self.bbox_head.get_head_feat()
feat = fluid.layers.gather(bbox_head_feat, roi_has_mask_int32)
else:
feat = self.roi_extractor(
body_feats, mask_rois, spatial_scale, is_mask=True)
mask_loss = self.mask_head.get_loss(feat, mask_int32)
loss.update(mask_loss)
total_loss = fluid.layers.sum(list(loss.values()))
loss.update({'loss': total_loss})
return loss
else:
if self.fpn is None:
last_feat = body_feats[list(body_feats.keys())[-1]]
roi_feat = self.roi_extractor(last_feat, rois)
else:
roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
bbox_pred = self.bbox_head.get_prediction(roi_feat, rois, im_info,
feed_vars['im_shape'])
bbox_pred = bbox_pred['bbox']
# share weight
bbox_shape = fluid.layers.shape(bbox_pred)
bbox_size = fluid.layers.reduce_prod(bbox_shape)
bbox_size = fluid.layers.reshape(bbox_size, [1, 1])
size = fluid.layers.fill_constant([1, 1], value=6, dtype='int32')
cond = fluid.layers.less_than(x=bbox_size, y=size)
mask_pred = fluid.layers.create_global_var(
shape=[1], value=0.0, dtype='float32', persistable=False)
with fluid.layers.control_flow.Switch() as switch:
with switch.case(cond):
fluid.layers.assign(input=bbox_pred, output=mask_pred)
with switch.default():
bbox = fluid.layers.slice(
bbox_pred, [1], starts=[2], ends=[6])
im_scale = fluid.layers.slice(
im_info, [1], starts=[2], ends=[3])
im_scale = fluid.layers.sequence_expand(im_scale, bbox)
mask_rois = bbox * im_scale
if self.fpn is None:
mask_feat = self.roi_extractor(last_feat, mask_rois)
mask_feat = self.bbox_head.get_head_feat(mask_feat)
else:
mask_feat = self.roi_extractor(
body_feats, mask_rois, spatial_scale, is_mask=True)
mask_out = self.mask_head.get_prediction(mask_feat, bbox)
fluid.layers.assign(input=mask_out, output=mask_pred)
return {'bbox': bbox_pred, 'mask': mask_pred}
def train(self, feed_vars):
return self.build(feed_vars, 'train')
def eval(self, feed_vars):
return self.build(feed_vars, 'test')
def test(self, feed_vars):
return self.build(feed_vars, 'test')
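# Illustrative sketch (not part of the original module): the test branch
# above only runs the mask head when there is at least one detection. Each
# row of the multiclass-NMS output is [label, score, x1, y1, x2, y2], so the
# Switch compares the flattened bbox size against 6 (one full row):
import numpy as np
no_detections = np.zeros((1, 1), dtype='float32')      # placeholder output
one_detection = np.zeros((1, 6), dtype='float32')      # one full detection row
print(no_detections.size < 6, one_detection.size < 6)  # True (skip mask head), False (run it)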
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from ppdet.core.workspace import register
__all__ = ['RetinaNet']
@register
class RetinaNet(object):
"""
RetinaNet architecture, see https://arxiv.org/abs/1708.02002
Args:
backbone (object): backbone instance
fpn (object): feature pyramid network instance
retina_head (object): `RetinaHead` instance
"""
__category__ = 'architecture'
__inject__ = ['backbone', 'fpn', 'retina_head']
def __init__(self, backbone, fpn, retina_head):
super(RetinaNet, self).__init__()
self.backbone = backbone
self.fpn = fpn
self.retina_head = retina_head
def build(self, feed_vars, mode='train'):
im = feed_vars['image']
im_info = feed_vars['im_info']
if mode == 'train':
gt_box = feed_vars['gt_box']
gt_label = feed_vars['gt_label']
is_crowd = feed_vars['is_crowd']
# backbone
body_feats = self.backbone(im)
# FPN
body_feats, spatial_scale = self.fpn.get_output(body_feats)
# retinanet head
if mode == 'train':
loss = self.retina_head.get_loss(body_feats, spatial_scale, im_info,
gt_box, gt_label, is_crowd)
total_loss = fluid.layers.sum(list(loss.values()))
loss.update({'loss': total_loss})
return loss
else:
pred = self.retina_head.get_prediction(body_feats, spatial_scale,
im_info)
return pred
def train(self, feed_vars):
return self.build(feed_vars, 'train')
def eval(self, feed_vars):
return self.build(feed_vars, 'test')
def test(self, feed_vars):
return self.build(feed_vars, 'test')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from ppdet.core.workspace import register
from ppdet.modeling.ops import SSDOutputDecoder, SSDMetric
__all__ = ['SSD']
@register
class SSD(object):
"""
Single Shot MultiBox Detector, see https://arxiv.org/abs/1512.02325
Args:
backbone (object): backbone instance
multi_box_head (object): `MultiBoxHead` instance
output_decoder (object): `SSDOutputDecoder` instance
metric (object): `SSDMetric` instance used during evaluation
num_classes (int): number of output classes
"""
__category__ = 'architecture'
__inject__ = ['backbone', 'multi_box_head', 'output_decoder', 'metric']
def __init__(self,
backbone,
multi_box_head='MultiBoxHead',
output_decoder=SSDOutputDecoder().__dict__,
metric=SSDMetric().__dict__,
num_classes=21):
super(SSD, self).__init__()
self.backbone = backbone
self.multi_box_head = multi_box_head
self.num_classes = num_classes
self.output_decoder = output_decoder
self.metric = metric
if isinstance(output_decoder, dict):
self.output_decoder = SSDOutputDecoder(**output_decoder)
if isinstance(metric, dict):
self.metric = SSDMetric(**metric)
def _forward(self, feed_vars, mode='train'):
im = feed_vars['image']
if mode == 'train' or mode == 'eval':
gt_box = feed_vars['gt_box']
gt_label = feed_vars['gt_label']
difficult = feed_vars['is_difficult']
body_feats = self.backbone(im)
locs, confs, box, box_var = self.multi_box_head(
inputs=body_feats, image=im, num_classes=self.num_classes)
if mode == 'train':
loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box,
box_var)
loss = fluid.layers.reduce_sum(loss)
return {'loss': loss}
else:
pred = self.output_decoder(locs, confs, box, box_var)
if mode == 'eval':
map_eval = self.metric(
pred,
gt_label,
gt_box,
difficult,
class_num=self.num_classes)
_, accum_map = map_eval.get_map_var()
return {'map': map_eval, 'accum_map': accum_map}
else:
return {'bbox': pred}
def train(self, feed_vars):
return self._forward(feed_vars, 'train')
def eval(self, feed_vars):
return self._forward(feed_vars, 'eval')
def test(self, feed_vars):
return self._forward(feed_vars, 'test')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from ppdet.core.workspace import register
__all__ = ['YOLOv3']
@register
class YOLOv3(object):
"""
YOLOv3 network, see https://arxiv.org/abs/1804.02767
Args:
backbone (object): a backbone instance
yolo_head (object): a `YOLOv3Head` instance
"""
__category__ = 'architecture'
__inject__ = ['backbone', 'yolo_head']
def __init__(self, backbone, yolo_head='YOLOv3Head'):
super(YOLOv3, self).__init__()
self.backbone = backbone
self.yolo_head = yolo_head
def build(self, feed_vars, mode='train'):
im = feed_vars['image']
body_feats = self.backbone(im)
if isinstance(body_feats, OrderedDict):
body_feat_names = list(body_feats.keys())
body_feats = [body_feats[name] for name in body_feat_names]
if mode == 'train':
gt_box = feed_vars['gt_box']
gt_label = feed_vars['gt_label']
gt_score = feed_vars['gt_score']
return {
'loss': self.yolo_head.get_loss(body_feats, gt_box, gt_label,
gt_score)
}
else:
im_shape = feed_vars['im_shape']
return self.yolo_head.get_prediction(body_feats, im_shape)
def train(self, feed_vars):
return self.build(feed_vars, mode='train')
def eval(self, feed_vars):
return self.build(feed_vars, mode='test')
def test(self, feed_vars):
return self.build(feed_vars, mode='test')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import resnet
from . import resnext
from . import darknet
from . import mobilenet
from . import senet
from . import fpn
from .resnet import *
from .resnext import *
from .darknet import *
from .mobilenet import *
from .senet import *
from .fpn import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
__all__ = ['DarkNet']
@register
class DarkNet(object):
"""
DarkNet, see https://pjreddie.com/darknet/yolo/
Args:
depth (int): network depth; currently only DarkNet-53 is supported
norm_type (str): normalization type, 'bn' and 'sync_bn' are supported
norm_decay (float): weight decay for normalization layer weights
"""
def __init__(self, depth=53, norm_type='bn', norm_decay=0.):
assert depth in [53], "unsupported depth value"
self.depth = depth
self.norm_type = norm_type
self.norm_decay = norm_decay
self.depth_cfg = {53: ([1, 2, 8, 8, 4], self.basicblock)}
def _conv_norm(self,
input,
ch_out,
filter_size,
stride,
padding,
act='leaky',
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
act=None,
param_attr=ParamAttr(name=name + ".conv.weights"),
bias_attr=False)
bn_name = name + ".bn"
bn_param_attr = ParamAttr(
regularizer=L2Decay(float(self.norm_decay)),
name=bn_name + '.scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(float(self.norm_decay)),
name=bn_name + '.offset')
out = fluid.layers.batch_norm(
input=conv,
act=None,
param_attr=bn_param_attr,
bias_attr=bn_bias_attr,
moving_mean_name=bn_name + '.mean',
moving_variance_name=bn_name + '.var')
# leaky relu here uses alpha=0.1, which cannot be set through the
# `act` param of fluid.layers.batch_norm above.
if act == 'leaky':
out = fluid.layers.leaky_relu(x=out, alpha=0.1)
return out
def _downsample(self,
input,
ch_out,
filter_size=3,
stride=2,
padding=1,
name=None):
return self._conv_norm(
input,
ch_out=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
name=name)
def basicblock(self, input, ch_out, name=None):
conv1 = self._conv_norm(
input,
ch_out=ch_out,
filter_size=1,
stride=1,
padding=0,
name=name + ".0")
conv2 = self._conv_norm(
conv1,
ch_out=ch_out * 2,
filter_size=3,
stride=1,
padding=1,
name=name + ".1")
out = fluid.layers.elementwise_add(x=input, y=conv2, act=None)
return out
def layer_warp(self, block_func, input, ch_out, count, name=None):
out = block_func(input, ch_out=ch_out, name='{}.0'.format(name))
for j in six.moves.xrange(1, count):
out = block_func(out, ch_out=ch_out, name='{}.{}'.format(name, j))
return out
def __call__(self, input):
"""
Get the DarkNet backbone output, i.e. the outputs of its 5 stages.
Args:
input (Variable): input variable.
Returns:
The last variable of each stage.
"""
stages, block_func = self.depth_cfg[self.depth]
stages = stages[0:5]
conv = self._conv_norm(
input=input,
ch_out=32,
filter_size=3,
stride=1,
padding=1,
name="yolo_input")
downsample_ = self._downsample(
input=conv, ch_out=conv.shape[1] * 2, name="yolo_input.downsample")
blocks = []
for i, stage in enumerate(stages):
block = self.layer_warp(
block_func=block_func,
input=downsample_,
ch_out=32 * 2**i,
count=stage,
name="stage.{}".format(i))
blocks.append(block)
if i < len(stages) - 1:  # do not downsample in the last stage
downsample_ = self._downsample(
input=block,
ch_out=block.shape[1] * 2,
name="stage.{}.downsample".format(i))
return blocks
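# Illustrative sketch (not part of the original module): the DarkNet-53
# stage layout implied by depth_cfg and layer_warp -- stage i holds
# stages[i] residual blocks, outputs 64 * 2**i channels, and (except the
# last stage) is followed by a stride-2 downsample:
stages = [1, 2, 8, 8, 4]
for i, count in enumerate(stages):
    print("stage", i, "blocks:", count, "channels:", 64 * 2 ** i,
          "output stride:", 2 ** (i + 1))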
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Xavier
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
__all__ = ['FPN']
@register
class FPN(object):
"""
Feature Pyramid Network, see https://arxiv.org/abs/1612.03144
Args:
num_chan (int): number of feature channels
min_level (int): lowest level of the backbone feature map to use
max_level (int): highest level of the backbone feature map to use
spatial_scale (list): feature map scaling factor
has_extra_convs (bool): whether to add extra convolutions for higher levels
"""
def __init__(self,
num_chan=256,
min_level=2,
max_level=6,
spatial_scale=[1. / 32., 1. / 16., 1. / 8., 1. / 4.],
has_extra_convs=False):
self.num_chan = num_chan
self.min_level = min_level
self.max_level = max_level
self.spatial_scale = spatial_scale
self.has_extra_convs = has_extra_convs
def _add_topdown_lateral(self, body_name, body_input, upper_output):
lateral_name = 'fpn_inner_' + body_name + '_lateral'
topdown_name = 'fpn_topdown_' + body_name
fan = body_input.shape[1]
lateral = fluid.layers.conv2d(
body_input,
self.num_chan,
1,
param_attr=ParamAttr(
name=lateral_name + "_w", initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=lateral_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=lateral_name)
shape = fluid.layers.shape(upper_output)
shape_hw = fluid.layers.slice(shape, axes=[0], starts=[2], ends=[4])
out_shape_ = shape_hw * 2
out_shape = fluid.layers.cast(out_shape_, dtype='int32')
out_shape.stop_gradient = True
topdown = fluid.layers.resize_nearest(
upper_output, scale=2., actual_shape=out_shape, name=topdown_name)
return lateral + topdown
def get_output(self, body_dict):
"""
Add FPN onto backbone.
Args:
body_dict(OrderedDict): A dictionary of variables, each being the
output of a backbone stage.
Return:
fpn_dict(OrderedDict): A dictionary mapping names to the FPN
output feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
"""
body_name_list = list(body_dict.keys())[::-1]
num_backbone_stages = len(body_name_list)
self.fpn_inner_output = [[] for _ in range(num_backbone_stages)]
fpn_inner_name = 'fpn_inner_' + body_name_list[0]
body_input = body_dict[body_name_list[0]]
fan = body_input.shape[1]
self.fpn_inner_output[0] = fluid.layers.conv2d(
body_input,
self.num_chan,
1,
param_attr=ParamAttr(
name=fpn_inner_name + "_w", initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=fpn_inner_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=fpn_inner_name)
for i in range(1, num_backbone_stages):
body_name = body_name_list[i]
body_input = body_dict[body_name]
top_output = self.fpn_inner_output[i - 1]
fpn_inner_single = self._add_topdown_lateral(body_name, body_input,
top_output)
self.fpn_inner_output[i] = fpn_inner_single
fpn_dict = {}
fpn_name_list = []
for i in range(num_backbone_stages):
fpn_name = 'fpn_' + body_name_list[i]
fan = self.fpn_inner_output[i].shape[1] * 3 * 3
fpn_output = fluid.layers.conv2d(
self.fpn_inner_output[i],
self.num_chan,
filter_size=3,
padding=1,
param_attr=ParamAttr(
name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=fpn_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=fpn_name)
fpn_dict[fpn_name] = fpn_output
fpn_name_list.append(fpn_name)
if not self.has_extra_convs and self.max_level - self.min_level == len(
self.spatial_scale):
body_top_name = fpn_name_list[0]
body_top_extension = fluid.layers.pool2d(
fpn_dict[body_top_name],
1,
'max',
pool_stride=2,
name=body_top_name + '_subsampled_2x')
fpn_dict[body_top_name + '_subsampled_2x'] = body_top_extension
fpn_name_list.insert(0, body_top_name + '_subsampled_2x')
self.spatial_scale.insert(0, self.spatial_scale[0] * 0.5)
# Coarser FPN levels introduced for RetinaNet
highest_backbone_level = self.min_level + len(self.spatial_scale) - 1
if self.has_extra_convs and self.max_level > highest_backbone_level:
fpn_blob = body_dict[body_name_list[0]]
for i in range(highest_backbone_level + 1, self.max_level + 1):
fpn_blob_in = fpn_blob
fpn_name = 'fpn_' + str(i)
if i > highest_backbone_level + 1:
fpn_blob_in = fluid.layers.relu(fpn_blob)
fan = fpn_blob_in.shape[1] * 3 * 3
fpn_blob = fluid.layers.conv2d(
input=fpn_blob_in,
num_filters=self.num_chan,
filter_size=3,
stride=2,
padding=1,
param_attr=ParamAttr(
name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=fpn_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=fpn_name)
fpn_dict[fpn_name] = fpn_blob
fpn_name_list.insert(0, fpn_name)
self.spatial_scale.insert(0, self.spatial_scale[0] * 0.5)
res_dict = OrderedDict([(k, fpn_dict[k]) for k in fpn_name_list])
return res_dict, self.spatial_scale
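# Illustrative sketch (not part of the original module): with the defaults
# (min_level=2, max_level=6, has_extra_convs=False) the backbone supplies
# four scales and get_output() prepends one max-pooled level, halving the
# coarsest scale:
spatial_scale = [1. / 32., 1. / 16., 1. / 8., 1. / 4.]
spatial_scale.insert(0, spatial_scale[0] * 0.5)
print(spatial_scale)  # [0.015625, 0.03125, 0.0625, 0.125, 0.25], i.e. P6..P2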
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
__all__ = ['MobileNet']
@register
class MobileNet(object):
"""
MobileNet v1, see https://arxiv.org/abs/1704.04861
Args:
norm_type (str): normalization type, 'bn' and 'sync_bn' are supported
norm_decay (float): weight decay for normalization layer weights
conv_group_scale (int): channel scaling factor (the width multiplier)
with_extra_blocks (bool): whether extra blocks should be added
extra_block_filters (list): number of filters for each extra block
"""
def __init__(self,
norm_type='bn',
norm_decay=0.,
conv_group_scale=1,
with_extra_blocks=False,
extra_block_filters=[[256, 512], [128, 256], [128, 256],
[64, 128]]):
self.norm_type = norm_type
self.norm_decay = norm_decay
self.conv_group_scale = conv_group_scale
self.with_extra_blocks = with_extra_blocks
self.extra_block_filters = extra_block_filters
def _conv_norm(self,
input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
act='relu',
use_cudnn=True,
name=None):
parameter_attr = ParamAttr(
learning_rate=0.1,
initializer=fluid.initializer.MSRA(),
name=name + "_weights")
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=parameter_attr,
bias_attr=False)
bn_name = name + "_bn"
norm_decay = self.norm_decay
bn_param_attr = ParamAttr(
regularizer=L2Decay(norm_decay), name=bn_name + '_scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(norm_decay), name=bn_name + '_offset')
return fluid.layers.batch_norm(
input=conv,
act=act,
param_attr=bn_param_attr,
bias_attr=bn_bias_attr,
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def depthwise_separable(self,
input,
num_filters1,
num_filters2,
num_groups,
stride,
scale,
name=None):
depthwise_conv = self._conv_norm(
input=input,
filter_size=3,
num_filters=int(num_filters1 * scale),
stride=stride,
padding=1,
num_groups=int(num_groups * scale),
use_cudnn=False,
name=name + "_dw")
pointwise_conv = self._conv_norm(
input=depthwise_conv,
filter_size=1,
num_filters=int(num_filters2 * scale),
stride=1,
padding=0,
name=name + "_sep")
return pointwise_conv
def _extra_block(self,
input,
num_filters1,
num_filters2,
num_groups,
stride,
scale,
name=None):
pointwise_conv = self._conv_norm(
input=input,
filter_size=1,
num_filters=int(num_filters1 * scale),
stride=1,
num_groups=int(num_groups * scale),
padding=0,
name=name + "_extra1")
normal_conv = self._conv_norm(
input=pointwise_conv,
filter_size=3,
num_filters=int(num_filters2 * scale),
stride=2,
num_groups=int(num_groups * scale),
padding=1,
name=name + "_extra2")
return normal_conv
def __call__(self, input):
scale = self.conv_group_scale
blocks = []
# input 1/1
out = self._conv_norm(input, 3, int(32 * scale), 2, 1, 3, name="conv1")
# 1/2
out = self.depthwise_separable(
out, 32, 64, 32, 1, scale, name="conv2_1")
out = self.depthwise_separable(
out, 64, 128, 64, 2, scale, name="conv2_2")
# 1/4
out = self.depthwise_separable(
out, 128, 128, 128, 1, scale, name="conv3_1")
out = self.depthwise_separable(
out, 128, 256, 128, 2, scale, name="conv3_2")
# 1/8
blocks.append(out)
out = self.depthwise_separable(
out, 256, 256, 256, 1, scale, name="conv4_1")
out = self.depthwise_separable(
out, 256, 512, 256, 2, scale, name="conv4_2")
# 1/16
blocks.append(out)
for i in range(5):
out = self.depthwise_separable(
out, 512, 512, 512, 1, scale, name="conv5_" + str(i + 1))
module11 = out
out = self.depthwise_separable(
out, 512, 1024, 512, 2, scale, name="conv5_6")
# 1/32
out = self.depthwise_separable(
out, 1024, 1024, 1024, 1, scale, name="conv6")
module13 = out
blocks.append(out)
if not self.with_extra_blocks:
return blocks
num_filters = self.extra_block_filters
module14 = self._extra_block(module13, num_filters[0][0],
num_filters[0][1], 1, 2, scale, "conv7_1")
module15 = self._extra_block(module14, num_filters[1][0],
num_filters[1][1], 1, 2, scale, "conv7_2")
module16 = self._extra_block(module15, num_filters[2][0],
num_filters[2][1], 1, 2, scale, "conv7_3")
module17 = self._extra_block(module16, num_filters[3][0],
num_filters[3][1], 1, 2, scale, "conv7_4")
return module11, module13, module14, module15, module16, module17
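# Illustrative sketch (not part of the original module): depthwise_separable()
# scales every channel count by conv_group_scale, the MobileNet width
# multiplier. For the conv2_2 block above (64 -> 128, stride 2):
for scale in (1.0, 0.5):
    dw = int(64 * scale)    # 3x3 depthwise conv, groups == channels
    pw = int(128 * scale)   # 1x1 pointwise conv
    print(scale, dw, pw)    # 1.0 -> 64/128 channels, 0.5 -> 32/64 channels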
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
class NameAdapter(object):
"""Fix the backbones variable names for pretrained weight"""
def __init__(self, model):
super(NameAdapter, self).__init__()
self.model = model
@property
def model_type(self):
return getattr(self.model, '_model_type', '')
@property
def variant(self):
return getattr(self.model, 'variant', '')
def fix_conv_norm_name(self, name):
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
# the naming rule is the same as for the pretrained weights
if self.model_type == 'SEResNeXt':
bn_name = name + "_bn"
return bn_name
def fix_shortcut_name(self, name):
if self.model_type == 'SEResNeXt':
name = 'conv' + name + '_prj'
return name
def fix_bottleneck_name(self, name):
if self.model_type == 'SEResNeXt':
conv_name1 = 'conv' + name + '_x1'
conv_name2 = 'conv' + name + '_x2'
conv_name3 = 'conv' + name + '_x3'
shortcut_name = name
else:
conv_name1 = name + "_branch2a"
conv_name2 = name + "_branch2b"
conv_name3 = name + "_branch2c"
shortcut_name = name + "_branch1"
return conv_name1, conv_name2, conv_name3, shortcut_name
def fix_layer_warp_name(self, stage_num, count, i):
name = 'res' + str(stage_num)
if count > 10 and stage_num == 4:
if i == 0:
conv_name = name + "a"
else:
conv_name = name + "b" + str(i)
else:
conv_name = name + chr(ord("a") + i)
if self.model_type == 'SEResNeXt':
conv_name = str(stage_num + 2) + '_' + str(i + 1)
return conv_name
def fix_c1_stage_name(self):
return "res_conv1" if self.model_type == 'ResNeXt' else "conv1"
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.framework import Variable
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register, serializable
from numbers import Integral
from .name_adapter import NameAdapter
__all__ = ['ResNet', 'ResNetC5']
@register
@serializable
class ResNet(object):
"""
Residual Network, see https://arxiv.org/abs/1512.03385
Args:
depth (int): ResNet depth, should be 18, 34, 50, 101, 152.
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn'/'sync_bn'/'affine_channel'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
feature_maps (list): index of stages whose feature maps are returned
"""
def __init__(self,
depth=50,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
norm_decay=0.,
variant='b',
feature_maps=[2, 3, 4, 5]):
super(ResNet, self).__init__()
if isinstance(feature_maps, Integral):
feature_maps = [feature_maps]
assert depth in [18, 34, 50, 101, 152], \
"depth {} not in [18, 34, 50, 101, 152]".format(depth)
assert variant in ['a', 'b', 'c', 'd'], "invalid ResNet variant"
assert 0 <= freeze_at <= 4, "freeze_at should be 0, 1, 2, 3 or 4"
assert len(feature_maps) > 0, "need one or more feature maps"
assert norm_type in ['bn', 'sync_bn', 'affine_channel']
self.depth = depth
self.freeze_at = freeze_at
self.norm_type = norm_type
self.norm_decay = norm_decay
self.freeze_norm = freeze_norm
self.variant = variant
self._model_type = 'ResNet'
self.feature_maps = feature_maps
self.depth_cfg = {
18: ([2, 2, 2, 2], self.basicblock),
34: ([3, 4, 6, 3], self.basicblock),
50: ([3, 4, 6, 3], self.bottleneck),
101: ([3, 4, 23, 3], self.bottleneck),
152: ([3, 8, 36, 3], self.bottleneck)
}
self.stage_filters = [64, 128, 256, 512]
self._c1_out_chan_num = 64
self.na = NameAdapter(self)
def _conv_norm(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights"),
bias_attr=False,
name=name + '.conv2d.output.1')
bn_name = self.na.fix_conv_norm_name(name)
norm_lr = 0. if self.freeze_norm else 1.
norm_decay = self.norm_decay
pattr = ParamAttr(
name=bn_name + '_scale',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
battr = ParamAttr(
name=bn_name + '_offset',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
if self.norm_type in ['bn', 'sync_bn']:
out = fluid.layers.batch_norm(
input=conv,
act=act,
name=bn_name + '.output.1',
param_attr=pattr,
bias_attr=battr,
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance', )
scale = fluid.framework._get_var(pattr.name)
bias = fluid.framework._get_var(battr.name)
elif self.norm_type == 'affine_channel':
scale = fluid.layers.create_parameter(
shape=[conv.shape[1]],
dtype=conv.dtype,
attr=pattr,
default_initializer=fluid.initializer.Constant(1.))
bias = fluid.layers.create_parameter(
shape=[conv.shape[1]],
dtype=conv.dtype,
attr=battr,
default_initializer=fluid.initializer.Constant(0.))
out = fluid.layers.affine_channel(
x=conv, scale=scale, bias=bias, act=act)
if self.freeze_norm:
scale.stop_gradient = True
bias.stop_gradient = True
return out
def _shortcut(self, input, ch_out, stride, is_first, name):
max_pooling_in_short_cut = self.variant == 'd'
ch_in = input.shape[1]
# the naming rule is the same as for the pretrained weights
name = self.na.fix_shortcut_name(name)
if ch_in != ch_out or stride != 1 or (self.depth < 50 and is_first):
if max_pooling_in_short_cut and not is_first:
input = fluid.layers.pool2d(
input=input,
pool_size=2,
pool_stride=2,
pool_padding=0,
ceil_mode=True,
pool_type='avg')
return self._conv_norm(input, ch_out, 1, 1, name=name)
return self._conv_norm(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck(self, input, num_filters, stride, is_first, name):
if self.variant == 'a':
stride1, stride2 = stride, 1
else:
stride1, stride2 = 1, stride
# ResNeXt
groups = getattr(self, 'groups', 1)
group_width = getattr(self, 'group_width', -1)
if groups == 1:
expand = 4
elif (groups * group_width) == 256:
expand = 1
else: # FIXME hard code for now, handles 32x4d, 64x4d and 32x8d
num_filters = num_filters // 2
expand = 2
conv_name1, conv_name2, conv_name3, \
shortcut_name = self.na.fix_bottleneck_name(name)
conv_def = [[num_filters, 1, stride1, 'relu', 1, conv_name1],
[num_filters, 3, stride2, 'relu', groups, conv_name2],
[num_filters * expand, 1, 1, None, 1, conv_name3]]
residual = input
for (c, k, s, act, g, _name) in conv_def:
residual = self._conv_norm(
input=residual,
num_filters=c,
filter_size=k,
stride=s,
act=act,
groups=g,
name=_name)
short = self._shortcut(
input,
num_filters * expand,
stride,
is_first=is_first,
name=shortcut_name)
# Squeeze-and-Excitation
if callable(getattr(self, '_squeeze_excitation', None)):
residual = self._squeeze_excitation(
input=residual, num_channels=num_filters, name='fc' + name)
return fluid.layers.elementwise_add(
x=short, y=residual, act='relu', name=name + ".add.output.5")
def basicblock(self, input, num_filters, stride, is_first, name):
conv0 = self._conv_norm(
input=input,
num_filters=num_filters,
filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a")
conv1 = self._conv_norm(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b")
short = self._shortcut(
input, num_filters, stride, is_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
def layer_warp(self, input, stage_num):
"""
Args:
input (Variable): input variable.
stage_num (int): the stage number, should be 2, 3, 4, 5
Returns:
The last variable of the requested stage.
"""
assert stage_num in [2, 3, 4, 5]
stages, block_func = self.depth_cfg[self.depth]
count = stages[stage_num - 2]
ch_out = self.stage_filters[stage_num - 2]
is_first = (stage_num == 2)
# Make the layer name and parameter name consistent
# with ImageNet pre-trained model
conv = input
for i in range(count):
conv_name = self.na.fix_layer_warp_name(stage_num, count, i)
if self.depth < 50:
is_first = True if i == 0 and stage_num == 2 else False
conv = block_func(
input=conv,
num_filters=ch_out,
stride=2 if i == 0 and stage_num != 2 else 1,
is_first=is_first,
name=conv_name)
return conv
def c1_stage(self, input):
out_chan = self._c1_out_chan_num
conv1_name = self.na.fix_c1_stage_name()
if self.variant in ['c', 'd']:
conv_def = [
[out_chan // 2, 3, 2, "conv1_1"],
[out_chan // 2, 3, 1, "conv1_2"],
[out_chan, 3, 1, "conv1_3"],
]
else:
conv_def = [[out_chan, 7, 2, conv1_name]]
for (c, k, s, _name) in conv_def:
input = self._conv_norm(
input=input,
num_filters=c,
filter_size=k,
stride=s,
act='relu',
name=_name)
output = fluid.layers.pool2d(
input=input,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
return output
def __call__(self, input):
assert isinstance(input, Variable)
assert not (set(self.feature_maps) - set([2, 3, 4, 5])), \
"feature maps {} not in [2, 3, 4, 5]".format(self.feature_maps)
res_endpoints = []
res = input
feature_maps = self.feature_maps
severed_head = getattr(self, 'severed_head', False)
if not severed_head:
res = self.c1_stage(res)
feature_maps = range(2, max(self.feature_maps) + 1)
for i in feature_maps:
res = self.layer_warp(res, i)
if i in self.feature_maps:
res_endpoints.append(res)
if self.freeze_at >= i:
res.stop_gradient = True
return OrderedDict([('res{}_sum'.format(self.feature_maps[idx]), feat)
for idx, feat in enumerate(res_endpoints)])
@register
@serializable
class ResNetC5(ResNet):
__doc__ = ResNet.__doc__
def __init__(self,
depth=50,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
norm_decay=0.,
variant='b',
feature_maps=[5]):
super(ResNetC5, self).__init__(
depth, freeze_at, norm_type, freeze_norm, norm_decay,
variant, feature_maps)
self.severed_head = True
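# A minimal usage sketch (an assumption for illustration, not part of the
# module): ResNetC5 acts as an RCNN head, so `severed_head` skips the stem
# and `layer_warp` runs directly on RoI features. The 1024x14x14 shape below
# is a hypothetical stand-in for res4 RoI features of a depth-50 backbone.
if __name__ == '__main__':
    import paddle.fluid as fluid
    main_prog, startup_prog = fluid.Program(), fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        roi_feat = fluid.layers.data(
            name='roi_feat', shape=[1024, 14, 14], dtype='float32')
        head = ResNetC5(depth=50)
        feats = head(roi_feat)
        print(list(feats.keys()))  # expect a single 'res5_sum' entry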
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from ppdet.core.workspace import register, serializable
from .resnet import ResNet
__all__ = ['ResNeXt']
@register
@serializable
class ResNeXt(ResNet):
"""
ResNeXt, see https://arxiv.org/abs/1611.05431
Args:
depth (int): network depth, should be 50, 101, 152.
groups (int): group convolution cardinality
group_width (int): width of each group convolution
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn', 'sync_bn' or 'affine_channel'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
feature_maps (list): index of the stages whose feature maps are returned
"""
def __init__(self,
depth=50,
groups=64,
group_width=4,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
                 norm_decay=0.,
variant='a',
feature_maps=[2, 3, 4, 5]):
        assert depth in [50, 101, 152], \
            "depth {} should be 50, 101 or 152".format(depth)
super(ResNeXt, self).__init__(depth, freeze_at, norm_type, freeze_norm,
norm_decay, variant, feature_maps)
self.depth_cfg = {
50: ([3, 4, 6, 3], self.bottleneck),
101: ([3, 4, 23, 3], self.bottleneck),
152: ([3, 8, 36, 3], self.bottleneck)
}
self.stage_filters = [256, 512, 1024, 2048]
self.groups = groups
self.group_width = group_width
self._model_type = 'ResNeXt'
@register
@serializable
class ResNeXtC5(ResNeXt):
__doc__ = ResNeXt.__doc__
def __init__(self,
depth=50,
groups=64,
group_width=4,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
                 norm_decay=0.,
variant='a',
feature_maps=[5]):
super(ResNeXtC5, self).__init__(depth, groups, group_width, freeze_at,
norm_type, freeze_norm, norm_decay,
variant, feature_maps)
self.severed_head = True
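# A pure-Python sketch of the channel bookkeeping in `bottleneck` (the
# groups/group_width branch in resnet.py above); for illustration only.
if __name__ == '__main__':
    def bottleneck_channels(num_filters, groups, group_width):
        if groups == 1:
            expand = 4
        elif groups * group_width == 256:  # 64x4d and 32x8d land here
            expand = 1
        else:  # e.g. 32x4d
            num_filters //= 2
            expand = 2
        return num_filters, num_filters * expand
    for groups, width in [(1, -1), (64, 4), (32, 4), (32, 8)]:
        mid, out = bottleneck_channels(256, groups, width)
        print('{}x{}d: mid={} out={}'.format(groups, width, mid, out))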
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from ppdet.core.workspace import register, serializable
from .resnext import ResNeXt
__all__ = ['SENet', 'SENetC5']
@register
@serializable
class SENet(ResNeXt):
"""
Squeeze-and-Excitation Networks, see https://arxiv.org/abs/1709.01507
Args:
depth (int): SENet depth, should be 50, 101, 152
groups (int): group convolution cardinality
group_width (int): width of each group convolution
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn', 'sync_bn' or 'affine_channel'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
feature_maps (list): index of the stages whose feature maps are returned
"""
def __init__(self,
depth=50,
groups=64,
group_width=4,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
norm_decay=0.,
variant='d',
feature_maps=[2, 3, 4, 5]):
super(SENet, self).__init__(depth, groups, group_width, freeze_at,
norm_type, freeze_norm, norm_decay, variant,
feature_maps)
if depth < 152:
self.stage_filters = [128, 256, 512, 1024]
else:
self.stage_filters = [256, 512, 1024, 2048]
self.reduction_ratio = 16
self._c1_out_chan_num = 128
self._model_type = 'SEResNeXt'
def _squeeze_excitation(self, input, num_channels, name=None):
pool = fluid.layers.pool2d(
input=input,
pool_size=0,
pool_type='avg',
global_pooling=True,
use_cudnn=False)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
squeeze = fluid.layers.fc(
input=pool,
size=int(num_channels / self.reduction_ratio),
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name=name + '_sqz_weights'),
bias_attr=ParamAttr(name=name + '_sqz_offset'))
stdv = 1.0 / math.sqrt(squeeze.shape[1] * 1.0)
excitation = fluid.layers.fc(
input=squeeze,
size=num_channels,
act='sigmoid',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name=name + '_exc_weights'),
bias_attr=ParamAttr(name=name + '_exc_offset'))
scale = fluid.layers.elementwise_mul(x=input, y=excitation, axis=0)
return scale
@register
@serializable
class SENetC5(SENet):
__doc__ = SENet.__doc__
def __init__(self,
depth=50,
groups=64,
group_width=4,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
norm_decay=0.,
variant='d',
feature_maps=[5]):
super(SENetC5, self).__init__(depth, groups, group_width, freeze_at,
norm_type, freeze_norm, norm_decay,
variant, feature_maps)
self.severed_head = True
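# A NumPy sketch of the squeeze-and-excitation computation implemented by
# `_squeeze_excitation` above; random arrays stand in for the learned fc
# weights (illustration only).
if __name__ == '__main__':
    import numpy as np
    np.random.seed(0)
    x = np.random.rand(1, 8, 4, 4).astype('float32')  # NCHW feature map
    r = 2  # reduction ratio
    w1 = np.random.rand(8, 8 // r).astype('float32')
    w2 = np.random.rand(8 // r, 8).astype('float32')
    squeeze = x.mean(axis=(2, 3))  # global average pool
    hidden = np.maximum(squeeze.dot(w1), 0.)  # fc + relu
    excite = 1. / (1. + np.exp(-hidden.dot(w2)))  # fc + sigmoid
    y = x * excite[:, :, None, None]  # channel-wise rescale
    print(x.shape, '->', y.shape)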
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
from collections import OrderedDict
from paddle import fluid
__all__ = ['create_feed']
# yapf: disable
feed_var_def = [
{'name': 'im_info', 'shape': [3], 'dtype': 'float32', 'lod_level': 0},
{'name': 'im_id', 'shape': [1], 'dtype': 'int32', 'lod_level': 0},
{'name': 'gt_box', 'shape': [4], 'dtype': 'float32', 'lod_level': 1},
{'name': 'gt_label', 'shape': [1], 'dtype': 'int32', 'lod_level': 1},
{'name': 'is_crowd', 'shape': [1], 'dtype': 'int32', 'lod_level': 1},
{'name': 'gt_mask', 'shape': [2], 'dtype': 'float32', 'lod_level': 3},
{'name': 'is_difficult', 'shape': [1], 'dtype': 'int32', 'lod_level': 1},
{'name': 'gt_score', 'shape': [1], 'dtype': 'float32', 'lod_level': 0},
{'name': 'im_shape', 'shape': [3], 'dtype': 'float32', 'lod_level': 0},
]
# yapf: enable
def create_feed(feed, use_pyreader=True):
image_shape = feed.image_shape
feed_var_map = {var['name']: var for var in feed_var_def}
feed_var_map['image'] = {
'name': 'image',
'shape': image_shape,
'dtype': 'float32',
'lod_level': 0
}
# YOLO var dim is fixed
if getattr(feed, 'num_max_boxes', None) is not None:
feed_var_map['gt_label']['shape'] = [feed.num_max_boxes]
feed_var_map['gt_score']['shape'] = [feed.num_max_boxes]
feed_var_map['gt_box']['shape'] = [feed.num_max_boxes, 4]
feed_var_map['gt_label']['lod_level'] = 0
feed_var_map['gt_score']['lod_level'] = 0
feed_var_map['gt_box']['lod_level'] = 0
feed_var_map['im_shape']['shape'] = [2]
feed_var_map['im_shape']['dtype'] = 'int32'
feed_vars = OrderedDict([(key, fluid.layers.data(
name=feed_var_map[key]['name'],
shape=feed_var_map[key]['shape'],
dtype=feed_var_map[key]['dtype'],
lod_level=feed_var_map[key]['lod_level'])) for key in feed.fields])
pyreader = None
if use_pyreader:
pyreader = fluid.io.PyReader(
feed_list=list(feed_vars.values()),
capacity=64,
use_double_buffer=True,
iterable=False)
return pyreader, feed_vars
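# A minimal sketch of how `create_feed` is driven. `DummyFeed` is a
# hypothetical stand-in for the real data feed config; only `image_shape`
# and `fields` are assumed to be required here.
if __name__ == '__main__':
    class DummyFeed(object):
        image_shape = [3, 800, 1333]
        fields = ['image', 'im_info', 'im_id']
    _, feed_vars = create_feed(DummyFeed(), use_pyreader=False)
    print(list(feed_vars.keys()))  # ['image', 'im_info', 'im_id']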
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from numbers import Integral
from paddle import fluid
from ppdet.core.workspace import register, serializable
__all__ = [
'AnchorGenerator', 'RPNTargetAssign', 'GenerateProposals', 'MultiClassNMS',
'BBoxAssigner', 'MaskAssigner', 'RoIAlign', 'RoIPool', 'MultiBoxHead',
'SSDOutputDecoder', 'SSDMetric', 'RetinaTargetAssign', 'RetinaOutputDecoder'
]
@register
@serializable
class AnchorGenerator(object):
__op__ = fluid.layers.anchor_generator
__append_doc__ = True
def __init__(self,
stride=[16.0, 16.0],
anchor_sizes=[32, 64, 128, 256, 512],
aspect_ratios=[0.5, 1., 2.],
variance=[1., 1., 1., 1.]):
super(AnchorGenerator, self).__init__()
self.anchor_sizes = anchor_sizes
self.aspect_ratios = aspect_ratios
self.variance = variance
self.stride = stride
@register
@serializable
class RPNTargetAssign(object):
__op__ = fluid.layers.rpn_target_assign
__append_doc__ = True
def __init__(self,
rpn_batch_size_per_im=256,
rpn_straddle_thresh=0.,
rpn_fg_fraction=0.5,
rpn_positive_overlap=0.7,
rpn_negative_overlap=0.3,
use_random=True):
super(RPNTargetAssign, self).__init__()
self.rpn_batch_size_per_im = rpn_batch_size_per_im
self.rpn_straddle_thresh = rpn_straddle_thresh
self.rpn_fg_fraction = rpn_fg_fraction
self.rpn_positive_overlap = rpn_positive_overlap
self.rpn_negative_overlap = rpn_negative_overlap
self.use_random = use_random
@register
@serializable
class GenerateProposals(object):
__op__ = fluid.layers.generate_proposals
__append_doc__ = True
def __init__(self,
pre_nms_top_n=6000,
post_nms_top_n=1000,
nms_thresh=.5,
min_size=.1,
eta=1.):
super(GenerateProposals, self).__init__()
self.pre_nms_top_n = pre_nms_top_n
self.post_nms_top_n = post_nms_top_n
self.nms_thresh = nms_thresh
self.min_size = min_size
self.eta = eta
@register
class MaskAssigner(object):
__op__ = fluid.layers.generate_mask_labels
__append_doc__ = True
def __init__(self, num_classes=81, resolution=14):
super(MaskAssigner, self).__init__()
self.num_classes = num_classes
self.resolution = resolution
@register
@serializable
class MultiClassNMS(object):
__op__ = fluid.layers.multiclass_nms
__append_doc__ = True
def __init__(self,
score_threshold=.05,
nms_top_k=-1,
keep_top_k=100,
nms_threshold=.5,
normalized=False,
nms_eta=1.0,
background_label=0):
super(MultiClassNMS, self).__init__()
self.score_threshold = score_threshold
self.nms_top_k = nms_top_k
self.keep_top_k = keep_top_k
self.nms_threshold = nms_threshold
self.normalized = normalized
self.nms_eta = nms_eta
self.background_label = background_label
@register
class BBoxAssigner(object):
__op__ = fluid.layers.generate_proposal_labels
__append_doc__ = True
def __init__(self,
batch_size_per_im=512,
fg_fraction=.25,
fg_thresh=.5,
bg_thresh_hi=.5,
bg_thresh_lo=0.,
bbox_reg_weights=[0.1, 0.1, 0.2, 0.2],
num_classes=81,
shuffle_before_sample=True):
super(BBoxAssigner, self).__init__()
self.batch_size_per_im = batch_size_per_im
self.fg_fraction = fg_fraction
self.fg_thresh = fg_thresh
self.bg_thresh_hi = bg_thresh_hi
self.bg_thresh_lo = bg_thresh_lo
self.bbox_reg_weights = bbox_reg_weights
self.class_nums = num_classes
self.use_random = shuffle_before_sample
@register
class RoIAlign(object):
__op__ = fluid.layers.roi_align
__append_doc__ = True
def __init__(self, resolution=7, spatial_scale=1. / 16, sampling_ratio=0):
super(RoIAlign, self).__init__()
if isinstance(resolution, Integral):
resolution = [resolution, resolution]
self.pooled_height = resolution[0]
self.pooled_width = resolution[1]
self.spatial_scale = spatial_scale
self.sampling_ratio = sampling_ratio
@register
class RoIPool(object):
__op__ = fluid.layers.roi_pool
__append_doc__ = True
def __init__(self, resolution=7, spatial_scale=1. / 16):
super(RoIPool, self).__init__()
if isinstance(resolution, Integral):
resolution = [resolution, resolution]
self.pooled_height = resolution[0]
self.pooled_width = resolution[1]
self.spatial_scale = spatial_scale
@register
class MultiBoxHead(object):
__op__ = fluid.layers.multi_box_head
__append_doc__ = True
def __init__(self,
min_ratio=20,
max_ratio=90,
min_sizes=[60.0, 105.0, 150.0, 195.0, 240.0, 285.0],
max_sizes=[[], 150.0, 195.0, 240.0, 285.0, 300.0],
aspect_ratios=[[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.],
[2., 3.]],
base_size=300,
offset=0.5,
flip=True):
super(MultiBoxHead, self).__init__()
self.min_ratio = min_ratio
self.max_ratio = max_ratio
self.min_sizes = min_sizes
self.max_sizes = max_sizes
self.aspect_ratios = aspect_ratios
self.base_size = base_size
self.offset = offset
self.flip = flip
@register
@serializable
class SSDOutputDecoder(object):
__op__ = fluid.layers.detection_output
__append_doc__ = True
def __init__(self,
nms_threshold=0.45,
nms_top_k=400,
keep_top_k=200,
score_threshold=0.01,
nms_eta=1.0,
background_label=0):
super(SSDOutputDecoder, self).__init__()
self.nms_threshold = nms_threshold
self.background_label = background_label
self.nms_top_k = nms_top_k
self.keep_top_k = keep_top_k
self.score_threshold = score_threshold
self.nms_eta = nms_eta
@register
@serializable
class SSDMetric(object):
__op__ = fluid.metrics.DetectionMAP
__append_doc__ = True
def __init__(self,
overlap_threshold=0.5,
evaluate_difficult=False,
ap_version='integral'):
super(SSDMetric, self).__init__()
self.overlap_threshold = overlap_threshold
self.evaluate_difficult = evaluate_difficult
self.ap_version = ap_version
@register
@serializable
class RetinaTargetAssign(object):
__op__ = fluid.layers.retinanet_target_assign
__append_doc__ = True
def __init__(self, positive_overlap=0.5, negative_overlap=0.4):
super(RetinaTargetAssign, self).__init__()
self.positive_overlap = positive_overlap
self.negative_overlap = negative_overlap
@register
@serializable
class RetinaOutputDecoder(object):
__op__ = fluid.layers.retinanet_detection_output
__append_doc__ = True
def __init__(self,
score_thresh=0.05,
nms_thresh=0.3,
pre_nms_top_n=1000,
detections_per_im=100,
nms_eta=1.0):
super(RetinaOutputDecoder, self).__init__()
self.score_threshold = score_thresh
self.nms_threshold = nms_thresh
self.nms_top_k = pre_nms_top_n
self.keep_top_k = detections_per_im
self.nms_eta = nms_eta
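# A sketch of the `__op__` convention used throughout this module: each class
# is a serializable bag of keyword arguments for one fluid operator, which
# the workspace machinery forwards when the op is invoked (illustration only,
# assuming only the module above is imported).
if __name__ == '__main__':
    nms = MultiClassNMS(score_threshold=0.05, keep_top_k=100)
    # the instance __dict__ holds exactly the kwargs later passed on to
    # fluid.layers.multiclass_nms
    print(MultiClassNMS.__op__.__name__)
    print(nms.__dict__)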
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import roi_extractor
from .roi_extractor import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from ppdet.core.workspace import register
from ppdet.modeling.ops import RoIAlign, RoIPool
__all__ = ['RoIPool', 'RoIAlign', 'FPNRoIAlign']
@register
class FPNRoIAlign(object):
"""
RoI align pooling for FPN feature maps
Args:
pooled_height (int): output height
        pooled_width (int): output width
sampling_ratio (int): number of sampling points
min_level (int): lowest level of FPN layer
max_level (int): highest level of FPN layer
        canconical_level (int): the canonical FPN feature map level
        canonical_size (int): the canonical FPN feature map size
"""
def __init__(self,
sampling_ratio=0,
min_level=2,
max_level=5,
canconical_level=4,
canonical_size=224,
box_resolution=7,
mask_resolution=14):
super(FPNRoIAlign, self).__init__()
self.sampling_ratio = sampling_ratio
self.min_level = min_level
self.max_level = max_level
self.canconical_level = canconical_level
self.canonical_size = canonical_size
self.box_resolution = box_resolution
self.mask_resolution = mask_resolution
def __call__(self, head_inputs, rois, spatial_scale, is_mask=False):
"""
        Apply RoI align on multiple levels of feature maps.
        RoIs are distributed to different levels by their area, and RoI
        features are extracted from each RoI's assigned feature map.
Returns:
roi_feat(Variable): RoI features with shape of [M, C, R, R],
where M is the number of RoIs and R is RoI resolution
"""
k_min = self.min_level
k_max = self.max_level
num_roi_lvls = k_max - k_min + 1
name_list = list(head_inputs.keys())
input_name_list = name_list[-num_roi_lvls:]
spatial_scale = spatial_scale[-num_roi_lvls:]
rois_dist, restore_index = fluid.layers.distribute_fpn_proposals(
rois, k_min, k_max, self.canconical_level, self.canonical_size)
        # rois_dist is in ascending order
roi_out_list = []
        resolution = self.mask_resolution if is_mask else self.box_resolution
for lvl in range(num_roi_lvls):
name_index = num_roi_lvls - lvl - 1
rois_input = rois_dist[lvl]
head_input = head_inputs[input_name_list[name_index]]
sc = spatial_scale[name_index]
roi_out = fluid.layers.roi_align(
input=head_input,
rois=rois_input,
pooled_height=resolution,
pooled_width=resolution,
spatial_scale=sc,
sampling_ratio=self.sampling_ratio)
roi_out_list.append(roi_out)
roi_feat_shuffle = fluid.layers.concat(roi_out_list)
roi_feat_ = fluid.layers.gather(roi_feat_shuffle, restore_index)
roi_feat = fluid.layers.lod_reset(roi_feat_, rois)
return roi_feat
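# A pure-Python sketch of the level-assignment rule applied by
# fluid.layers.distribute_fpn_proposals (Eq. 1 of the FPN paper): a RoI is
# routed to floor(k0 + log2(sqrt(area) / canonical_size)), clipped to
# [min_level, max_level].
if __name__ == '__main__':
    import math
    def target_level(w, h, k_min=2, k_max=5, k0=4, s0=224):
        k = int(math.floor(k0 + math.log(math.sqrt(w * h) / s0, 2)))
        return min(max(k, k_min), k_max)
    for box in [(32, 32), (112, 112), (224, 224), (448, 448)]:
        print(box, '-> level', target_level(*box))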
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import bbox_head
from . import mask_head
from . import cascade_head
from .bbox_head import *
from .mask_head import *
from .cascade_head import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal, Xavier
from paddle.fluid.regularizer import L2Decay
from ppdet.modeling.ops import MultiClassNMS
from ppdet.core.workspace import register, serializable
__all__ = ['BBoxHead', 'TwoFCHead']
@register
@serializable
class BoxCoder(object):
__op__ = fluid.layers.box_coder
__append_doc__ = True
def __init__(self,
prior_box_var=[0.1, 0.1, 0.2, 0.2],
code_type='decode_center_size',
box_normalized=False,
axis=1):
super(BoxCoder, self).__init__()
self.prior_box_var = prior_box_var
self.code_type = code_type
self.box_normalized = box_normalized
self.axis = axis
@register
class TwoFCHead(object):
"""
RCNN head with two Fully Connected layers
Args:
num_chan (int): num of filters for the fc layers
"""
def __init__(self, num_chan=1024):
super(TwoFCHead, self).__init__()
self.num_chan = num_chan
def __call__(self, roi_feat):
fan = roi_feat.shape[1] * roi_feat.shape[2] * roi_feat.shape[3]
fc6 = fluid.layers.fc(input=roi_feat,
size=self.num_chan,
act='relu',
name='fc6',
param_attr=ParamAttr(
name='fc6_w',
initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name='fc6_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
head_feat = fluid.layers.fc(input=fc6,
size=self.num_chan,
act='relu',
name='fc7',
param_attr=ParamAttr(
name='fc7_w', initializer=Xavier()),
bias_attr=ParamAttr(
name='fc7_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
return head_feat
@register
class BBoxHead(object):
"""
RCNN bbox head
Args:
head (object): the head module instance, e.g., `ResNetC5` or `TwoFCHead`
box_coder (object): `BoxCoder` instance
nms (object): `MultiClassNMS` instance
        num_classes (int): number of output classes
"""
__inject__ = ['head', 'box_coder', 'nms']
def __init__(self,
head,
box_coder=BoxCoder().__dict__,
nms=MultiClassNMS().__dict__,
num_classes=81):
super(BBoxHead, self).__init__()
self.head = head
self.num_classes = num_classes
self.box_coder = box_coder
self.nms = nms
if isinstance(box_coder, dict):
self.box_coder = BoxCoder(**box_coder)
if isinstance(nms, dict):
self.nms = MultiClassNMS(**nms)
self.head_feat = None
def get_head_feat(self, input=None):
"""
Get the bbox head feature map.
"""
if input is not None:
feat = self.head(input)
if isinstance(feat, OrderedDict):
feat = list(feat.values())[0]
self.head_feat = feat
return self.head_feat
def _get_output(self, roi_feat):
"""
Get bbox head output.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
Returns:
            cls_score(Variable): Output of the bbox head with shape of
                [P, num_classes], where P is the number of RoIs.
            bbox_pred(Variable): Output of the bbox head with shape of
                [P, 4 * num_classes].
"""
head_feat = self.get_head_feat(roi_feat)
# when ResNetC5 output a single feature map
if not isinstance(self.head, TwoFCHead):
head_feat = fluid.layers.pool2d(
head_feat, pool_type='avg', global_pooling=True)
cls_score = fluid.layers.fc(input=head_feat,
size=self.num_classes,
act=None,
name='cls_score',
param_attr=ParamAttr(
name='cls_score_w',
initializer=Normal(
loc=0.0, scale=0.01)),
bias_attr=ParamAttr(
name='cls_score_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
bbox_pred = fluid.layers.fc(input=head_feat,
size=4 * self.num_classes,
act=None,
name='bbox_pred',
param_attr=ParamAttr(
name='bbox_pred_w',
initializer=Normal(
loc=0.0, scale=0.001)),
bias_attr=ParamAttr(
name='bbox_pred_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
return cls_score, bbox_pred
def get_loss(self, roi_feat, labels_int32, bbox_targets,
bbox_inside_weights, bbox_outside_weights):
"""
Get bbox_head loss.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
labels_int32(Variable): Class label of a RoI with shape [P, 1].
P is the number of RoI.
bbox_targets(Variable): Box label of a RoI with shape
[P, 4 * class_nums].
bbox_inside_weights(Variable): Indicates whether a box should
contribute to loss. Same shape as bbox_targets.
bbox_outside_weights(Variable): Indicates whether a box should
contribute to loss. Same shape as bbox_targets.
Return:
Type: Dict
loss_cls(Variable): bbox_head loss.
loss_bbox(Variable): bbox_head loss.
"""
cls_score, bbox_pred = self._get_output(roi_feat)
labels_int64 = fluid.layers.cast(x=labels_int32, dtype='int64')
labels_int64.stop_gradient = True
loss_cls = fluid.layers.softmax_with_cross_entropy(
logits=cls_score, label=labels_int64, numeric_stable_mode=True)
loss_cls = fluid.layers.reduce_mean(loss_cls)
loss_bbox = fluid.layers.smooth_l1(
x=bbox_pred,
y=bbox_targets,
inside_weight=bbox_inside_weights,
outside_weight=bbox_outside_weights,
sigma=1.0)
loss_bbox = fluid.layers.reduce_mean(loss_bbox)
return {'loss_cls': loss_cls, 'loss_bbox': loss_bbox}
def get_prediction(self, roi_feat, rois, im_info, im_shape):
"""
Get prediction bounding box in test stage.
Args:
            roi_feat (Variable): RoI feature from RoIExtractor.
            rois (Variable): Output of generate_proposals in rpn head.
            im_info (Variable): A 2-D LoDTensor with shape [B, 3]. B is the
                number of input images, each element consists of im_height,
                im_width, im_scale.
            im_shape (Variable): shape of the input image, used to clip the
                decoded boxes.
Returns:
pred_result(Variable): Prediction result with shape [N, 6]. Each
row has 6 values: [label, confidence, xmin, ymin, xmax, ymax].
N is the total number of prediction.
"""
cls_score, bbox_pred = self._get_output(roi_feat)
im_scale = fluid.layers.slice(im_info, [1], starts=[2], ends=[3])
im_scale = fluid.layers.sequence_expand(im_scale, rois)
boxes = rois / im_scale
cls_prob = fluid.layers.softmax(cls_score, use_cudnn=False)
bbox_pred = fluid.layers.reshape(bbox_pred, (-1, self.num_classes, 4))
decoded_box = self.box_coder(prior_box=boxes, target_box=bbox_pred)
cliped_box = fluid.layers.box_clip(input=decoded_box, im_info=im_shape)
pred_result = self.nms(bboxes=cliped_box, scores=cls_prob)
return {'bbox': pred_result}
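# A NumPy sketch of the weighted smooth-L1 loss used in `get_loss` above
# (sigma=1.0): inside weights gate the element-wise difference, outside
# weights gate the per-element loss (illustration only).
if __name__ == '__main__':
    import numpy as np
    def smooth_l1(x, y, inside_w, outside_w, sigma=1.0):
        d = (x - y) * inside_w
        s2 = sigma ** 2
        loss = np.where(np.abs(d) < 1. / s2,
                        0.5 * s2 * d * d,
                        np.abs(d) - 0.5 / s2)
        return loss * outside_w
    x, y, w = np.array([0.5, 2.0]), np.zeros(2), np.ones(2)
    print(smooth_l1(x, y, w, w))  # [0.125, 1.5]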
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal, Xavier
from paddle.fluid.regularizer import L2Decay
from ppdet.modeling.ops import MultiClassNMS
from ppdet.core.workspace import register
__all__ = ['CascadeBBoxHead']
@register
class CascadeBBoxHead(object):
"""
Cascade RCNN bbox head
Args:
head (object): the head module instance
nms (object): `MultiClassNMS` instance
        num_classes (int): number of output classes
"""
__inject__ = ['head', 'nms']
def __init__(self, head, nms=MultiClassNMS().__dict__, num_classes=81):
super(CascadeBBoxHead, self).__init__()
self.head = head
self.nms = nms
self.num_classes = num_classes
if isinstance(nms, dict):
self.nms = MultiClassNMS(**nms)
def get_output(self,
roi_feat,
cls_agnostic_bbox_reg=2,
wb_scalar=2.0,
name=''):
"""
Get bbox head output.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
            cls_agnostic_bbox_reg (int): number of classes for the
                class-agnostic box regressor, 2 by default.
            wb_scalar (float): learning rate scalar for the weights and bias.
            name (str): layer name suffix.
Returns:
cls_score(Variable): cls score.
bbox_pred(Variable): bbox regression.
"""
head_feat = self.head(roi_feat, wb_scalar, name)
cls_score = fluid.layers.fc(input=head_feat,
size=self.num_classes,
act=None,
name='cls_score' + name,
param_attr=ParamAttr(
name='cls_score%s_w' % name,
initializer=Normal(
loc=0.0, scale=0.01),
learning_rate=wb_scalar),
bias_attr=ParamAttr(
name='cls_score%s_b' % name,
learning_rate=wb_scalar,
regularizer=L2Decay(0.)))
bbox_pred = fluid.layers.fc(input=head_feat,
size=4 * cls_agnostic_bbox_reg,
act=None,
name='bbox_pred' + name,
param_attr=ParamAttr(
name='bbox_pred%s_w' % name,
initializer=Normal(
loc=0.0, scale=0.001),
learning_rate=wb_scalar),
bias_attr=ParamAttr(
name='bbox_pred%s_b' % name,
learning_rate=wb_scalar,
regularizer=L2Decay(0.)))
return cls_score, bbox_pred
def get_loss(self, rcnn_pred_list, rcnn_target_list, rcnn_loss_weight_list):
"""
Get bbox_head loss.
Args:
rcnn_pred_list(List): Cascade RCNN's head's output including
bbox_pred and cls_score
rcnn_target_list(List): Cascade rcnn's bbox and label target
rcnn_loss_weight_list(List): The weight of location and class loss
Return:
loss_cls(Variable): bbox_head loss.
loss_bbox(Variable): bbox_head loss.
"""
loss_dict = {}
for i, (rcnn_pred, rcnn_target
) in enumerate(zip(rcnn_pred_list, rcnn_target_list)):
labels_int64 = fluid.layers.cast(x=rcnn_target[1], dtype='int64')
labels_int64.stop_gradient = True
loss_cls = fluid.layers.softmax_with_cross_entropy(
logits=rcnn_pred[0],
label=labels_int64,
numeric_stable_mode=True, )
loss_cls = fluid.layers.reduce_mean(
loss_cls, name='loss_cls_' + str(i)) * rcnn_loss_weight_list[i]
loss_bbox = fluid.layers.smooth_l1(
x=rcnn_pred[1],
y=rcnn_target[2],
inside_weight=rcnn_target[3],
outside_weight=rcnn_target[4],
sigma=1.0, # detectron use delta = 1./sigma**2
)
loss_bbox = fluid.layers.reduce_mean(
loss_bbox,
name='loss_bbox_' + str(i)) * rcnn_loss_weight_list[i]
loss_dict['loss_cls_%d' % i] = loss_cls
loss_dict['loss_loc_%d' % i] = loss_bbox
return loss_dict
def get_prediction(self,
im_info,
roi_feat_list,
rcnn_pred_list,
proposal_list,
cascade_bbox_reg_weights,
cls_agnostic_bbox_reg=2):
"""
Get prediction bounding box in test stage.
Args:
im_info (Variable): A 2-D LoDTensor with shape [B, 3]. B is the
number of input images, each element consists
of im_height, im_width, im_scale.
            roi_feat_list (List): RoI features from RoIExtractor, one per
                stage.
            rcnn_pred_list (List): Cascade rcnn's head's output including
                bbox_pred and cls_score.
            proposal_list (List): RPN proposal boxes.
            cascade_bbox_reg_weights (List): per-stage box decoding variances.
            cls_agnostic_bbox_reg (int): number of classes for the
                class-agnostic box regressor.
Returns:
pred_result(Variable): Prediction result with shape [N, 6]. Each
row has 6 values: [label, confidence, xmin, ymin, xmax, ymax].
N is the total number of prediction.
"""
self.im_scale = fluid.layers.slice(im_info, [1], starts=[2], ends=[3])
boxes_cls_prob_l = []
rcnn_pred = rcnn_pred_list[-1] # stage 3
        repeat_num = 3
        bbox_reg_w = cascade_bbox_reg_weights[-1]
        for i in range(repeat_num):
# cls score
if i < 2:
cls_score = self._head_share(
roi_feat_list[-1], # roi_feat_3
name='_' + str(i + 1) if i > 0 else '')
else:
cls_score = rcnn_pred[0]
cls_prob = fluid.layers.softmax(cls_score, use_cudnn=False)
boxes_cls_prob_l.append(cls_prob)
boxes_cls_prob_mean = (
boxes_cls_prob_l[0] + boxes_cls_prob_l[1] + boxes_cls_prob_l[2]
) / 3.0
# bbox pred
proposals_boxes = proposal_list[-1]
im_scale_lod = fluid.layers.sequence_expand(self.im_scale,
proposals_boxes)
proposals_boxes = proposals_boxes / im_scale_lod
bbox_pred = rcnn_pred[1]
bbox_pred_new = fluid.layers.reshape(bbox_pred,
(-1, cls_agnostic_bbox_reg, 4))
if cls_agnostic_bbox_reg == 2:
# only use fg box delta to decode box
bbox_pred_new = fluid.layers.slice(
bbox_pred_new, axes=[1], starts=[1], ends=[2])
            bbox_pred_new = fluid.layers.expand(bbox_pred_new,
                                                [1, self.num_classes, 1])
decoded_box = fluid.layers.box_coder(
prior_box=proposals_boxes,
prior_box_var=bbox_reg_w,
target_box=bbox_pred_new,
code_type='decode_center_size',
box_normalized=False,
axis=1)
# TODO: notice detectron use img.shape
box_out = fluid.layers.box_clip(input=decoded_box, im_info=im_info)
pred_result = self.nms(bboxes=box_out, scores=boxes_cls_prob_mean)
return {"bbox": pred_result}
def _head_share(self, roi_feat, wb_scalar=2.0, name=''):
# FC6 FC7
fan = roi_feat.shape[1] * roi_feat.shape[2] * roi_feat.shape[3]
fc6 = fluid.layers.fc(input=roi_feat,
size=self.head.num_chan,
act='relu',
name='fc6' + name,
param_attr=ParamAttr(
name='fc6%s_w' % name,
initializer=Xavier(fan_out=fan),
learning_rate=wb_scalar, ),
bias_attr=ParamAttr(
name='fc6%s_b' % name,
learning_rate=2.0,
regularizer=L2Decay(0.)))
fc7 = fluid.layers.fc(input=fc6,
size=self.head.num_chan,
act='relu',
name='fc7' + name,
param_attr=ParamAttr(
name='fc7%s_w' % name,
initializer=Xavier(),
learning_rate=wb_scalar, ),
bias_attr=ParamAttr(
name='fc7%s_b' % name,
learning_rate=2.0,
regularizer=L2Decay(0.)))
cls_score = fluid.layers.fc(input=fc7,
size=self.num_classes,
act=None,
name='cls_score' + name,
param_attr=ParamAttr(
name='cls_score%s_w' % name,
initializer=Normal(
loc=0.0, scale=0.01),
learning_rate=wb_scalar, ),
bias_attr=ParamAttr(
name='cls_score%s_b' % name,
learning_rate=2.0,
regularizer=L2Decay(0.)))
return cls_score
@register
class FC6FC7Head(object):
"""
Cascade RCNN head with two Fully Connected layers
Args:
num_chan (int): num of filters for the fc layers
"""
def __init__(self, num_chan):
super(FC6FC7Head, self).__init__()
self.num_chan = num_chan
def __call__(self, roi_feat, wb_scalar=1.0, name=''):
fan = roi_feat.shape[1] * roi_feat.shape[2] * roi_feat.shape[3]
fc6 = fluid.layers.fc(input=roi_feat,
size=self.num_chan,
act='relu',
name='fc6' + name,
param_attr=ParamAttr(
name='fc6%s_w' % name,
initializer=Xavier(fan_out=fan),
learning_rate=wb_scalar),
bias_attr=ParamAttr(
name='fc6%s_b' % name,
learning_rate=wb_scalar,
regularizer=L2Decay(0.)))
head_feat = fluid.layers.fc(input=fc6,
size=self.num_chan,
act='relu',
name='fc7' + name,
param_attr=ParamAttr(
name='fc7%s_w' % name,
initializer=Xavier(),
learning_rate=wb_scalar),
bias_attr=ParamAttr(
name='fc7%s_b' % name,
learning_rate=wb_scalar,
regularizer=L2Decay(0.)))
return head_feat
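# A NumPy sketch of the test-time ensembling in `get_prediction` above: the
# stage-3 RoI features are re-scored by all three stage heads and the class
# probabilities are averaged before NMS (illustration only).
if __name__ == '__main__':
    import numpy as np
    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    np.random.seed(0)
    stage_logits = [np.random.randn(4, 81) for _ in range(3)]
    cls_prob_mean = sum(softmax(l) for l in stage_logits) / 3.0
    print(cls_prob_mean.shape, cls_prob_mean.sum(axis=1))  # rows sum to 1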
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import MSRA
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
__all__ = ['MaskHead']
@register
class MaskHead(object):
"""
RCNN mask head
Args:
num_convs (int): num of convolutions, 4 for FPN, 1 otherwise
num_chan_reduced (int): num of channels after first convolution
resolution (int): size of the output mask
dilation (int): dilation rate
num_classes (int): number of output classes
"""
def __init__(self,
num_convs=0,
num_chan_reduced=256,
resolution=14,
dilation=1,
num_classes=81):
super(MaskHead, self).__init__()
self.num_convs = num_convs
self.num_chan_reduced = num_chan_reduced
self.resolution = resolution
self.dilation = dilation
self.num_classes = num_classes
def _mask_conv_head(self, roi_feat, num_convs):
for i in range(num_convs):
layer_name = "mask_inter_feat_" + str(i + 1)
fan = self.num_chan_reduced * 3 * 3
roi_feat = fluid.layers.conv2d(
input=roi_feat,
num_filters=self.num_chan_reduced,
filter_size=3,
padding=1 * self.dilation,
act='relu',
stride=1,
dilation=self.dilation,
name=layer_name,
param_attr=ParamAttr(
name=layer_name + '_w',
initializer=MSRA(
uniform=False, fan_in=fan)),
bias_attr=ParamAttr(
name=layer_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
fan = roi_feat.shape[1] * 2 * 2
feat = fluid.layers.conv2d_transpose(
input=roi_feat,
num_filters=self.num_chan_reduced,
filter_size=2,
stride=2,
act='relu',
param_attr=ParamAttr(
name='conv5_mask_w',
initializer=MSRA(
uniform=False, fan_in=fan)),
bias_attr=ParamAttr(
name='conv5_mask_b', learning_rate=2., regularizer=L2Decay(0.)))
return feat
def _get_output(self, roi_feat):
class_num = self.num_classes
# configure the conv number for FPN if necessary
head_feat = self._mask_conv_head(roi_feat, self.num_convs)
fan = class_num
mask_logits = fluid.layers.conv2d(
input=head_feat,
num_filters=class_num,
filter_size=1,
act=None,
param_attr=ParamAttr(
name='mask_fcn_logits_w',
initializer=MSRA(
uniform=False, fan_in=fan)),
bias_attr=ParamAttr(
name="mask_fcn_logits_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
return mask_logits
def get_loss(self, roi_feat, mask_int32):
mask_logits = self._get_output(roi_feat)
num_classes = self.num_classes
resolution = self.resolution
dim = num_classes * resolution * resolution
mask_logits = fluid.layers.reshape(mask_logits, (-1, dim))
mask_label = fluid.layers.cast(x=mask_int32, dtype='float32')
mask_label.stop_gradient = True
loss_mask = fluid.layers.sigmoid_cross_entropy_with_logits(
x=mask_logits, label=mask_label, ignore_index=-1, normalize=True)
loss_mask = fluid.layers.reduce_sum(loss_mask, name='loss_mask')
return {'loss_mask': loss_mask}
def get_prediction(self, roi_feat, bbox_pred):
"""
Get prediction mask in test stage.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
bbox_pred (Variable): predicted bbox.
Returns:
mask_pred (Variable): Prediction mask with shape
[N, num_classes, resolution, resolution].
"""
mask_logits = self._get_output(roi_feat)
mask_prob = fluid.layers.sigmoid(mask_logits)
mask_prob = fluid.layers.lod_reset(mask_prob, bbox_pred)
return mask_prob
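# A NumPy sketch of the mask loss above: the [P, C, R, R] logits are
# flattened to [P, C * R * R] and matched against targets where -1 marks
# ignored pixels (illustration only).
if __name__ == '__main__':
    import numpy as np
    np.random.seed(0)
    P, C, R = 2, 3, 4
    logits = np.random.randn(P, C * R * R)
    labels = np.random.choice([-1., 0., 1.], size=(P, C * R * R))
    valid = labels != -1
    prob = 1. / (1. + np.exp(-logits))
    ce = -(labels * np.log(prob) + (1. - labels) * np.log(1. - prob))
    loss = ce[valid].sum() / max(valid.sum(), 1)  # normalize on valid pixels
    print(loss)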
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from ppdet.core.workspace import register
from ppdet.modeling.ops import BBoxAssigner, MaskAssigner
__all__ = ['BBoxAssigner', 'MaskAssigner', 'CascadeBBoxAssigner']
@register
class CascadeBBoxAssigner(object):
def __init__(self,
batch_size_per_im=512,
fg_fraction=.25,
fg_thresh=[0.5, 0.6, 0.7],
bg_thresh_hi=[0.5, 0.6, 0.7],
bg_thresh_lo=[0., 0., 0.],
bbox_reg_weights=[10, 20, 30],
num_classes=81,
shuffle_before_sample=True):
super(CascadeBBoxAssigner, self).__init__()
self.batch_size_per_im = batch_size_per_im
self.fg_fraction = fg_fraction
self.fg_thresh = fg_thresh
self.bg_thresh_hi = bg_thresh_hi
self.bg_thresh_lo = bg_thresh_lo
self.bbox_reg_weights = bbox_reg_weights
self.class_nums = num_classes
self.use_random = shuffle_before_sample
def __call__(self, input_rois, feed_vars, curr_stage):
curr_bbox_reg_w = [
1. / self.bbox_reg_weights[curr_stage],
2. / self.bbox_reg_weights[curr_stage],
2. / self.bbox_reg_weights[curr_stage],
2. / self.bbox_reg_weights[curr_stage],
]
outs = fluid.layers.generate_proposal_labels(
rpn_rois=input_rois,
gt_classes=feed_vars['gt_label'],
is_crowd=feed_vars['is_crowd'],
gt_boxes=feed_vars['gt_box'],
im_info=feed_vars['im_info'],
batch_size_per_im=self.batch_size_per_im,
fg_thresh=self.fg_thresh[curr_stage],
bg_thresh_hi=self.bg_thresh_hi[curr_stage],
bg_thresh_lo=self.bg_thresh_lo[curr_stage],
bbox_reg_weights=curr_bbox_reg_w,
use_random=self.use_random,
class_nums=2,
is_cls_agnostic=True,
is_cascade_rcnn=True if curr_stage > 0 else False)
return outs
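# A pure-Python sketch of the per-stage regression variances built in
# `__call__` above: stage i uses [1/w, 2/w, 2/w, 2/w] with w taken from
# bbox_reg_weights, so later stages see tighter box deltas.
if __name__ == '__main__':
    bbox_reg_weights = [10, 20, 30]
    for stage, w in enumerate(bbox_reg_weights):
        print('stage {}: {}'.format(stage, [1. / w, 2. / w, 2. / w, 2. / w]))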
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
__all__ = ['prog_scope']
def prog_scope():
def __impl__(fn):
def __fn__(*args, **kwargs):
prog = fluid.Program()
startup_prog = fluid.Program()
scope = fluid.core.Scope()
with fluid.scope_guard(scope):
with fluid.program_guard(prog, startup_prog):
with fluid.unique_name.guard():
fn(*args, **kwargs)
return __fn__
return __impl__
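# A minimal usage sketch: `prog_scope` runs the wrapped function inside fresh
# Program/Scope/unique-name guards, so repeated calls build isolated graphs.
if __name__ == '__main__':
    @prog_scope()
    def build_graph():
        x = fluid.layers.data(name='x', shape=[4], dtype='float32')
        y = fluid.layers.fc(input=x, size=2)
        print(y.name)
    build_graph()
    build_graph()  # prints the same name: each call starts from a clean slate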
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import unittest
import numpy as np
import paddle.fluid as fluid
from ppdet.modeling.tests.decorator_helper import prog_scope
from ppdet.core.workspace import load_config, merge_config, create
from ppdet.modeling.model_input import create_feed
class TestFasterRCNN(unittest.TestCase):
def setUp(self):
self.set_config()
self.cfg = load_config(self.cfg_file)
self.detector_type = self.cfg['architecture']
def set_config(self):
self.cfg_file = 'configs/faster_rcnn_r50_1x.yml'
@prog_scope()
def test_train(self):
train_feed = create(self.cfg['train_feed'])
model = create(self.detector_type)
_, feed_vars = create_feed(train_feed)
train_fetches = model.train(feed_vars)
@prog_scope()
def test_test(self):
test_feed = create(self.cfg['eval_feed'])
model = create(self.detector_type)
_, feed_vars = create_feed(test_feed)
test_fetches = model.eval(feed_vars)
class TestMaskRCNN(TestFasterRCNN):
def set_config(self):
self.cfg_file = 'configs/mask_rcnn_r50_1x.yml'
class TestCascadeRCNN(TestFasterRCNN):
def set_config(self):
self.cfg_file = 'configs/cascade_rcnn_r50_fpn_1x.yml'
class TestYolov3(TestFasterRCNN):
def set_config(self):
self.cfg_file = 'configs/yolov3_darknet.yml'
class TestRetinaNet(TestFasterRCNN):
def set_config(self):
self.cfg_file = 'configs/retinanet_r50_fpn_1x.yml'
class TestSSD(TestFasterRCNN):
def set_config(self):
self.cfg_file = 'configs/ssd_mobilenet_v1_voc.yml'
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
from paddle import fluid
import paddle.fluid.optimizer as optimizer
import paddle.fluid.regularizer as regularizer
from ppdet.core.workspace import register, serializable
__all__ = ['LearningRate', 'OptimizerBuilder']
logger = logging.getLogger(__name__)
@serializable
class PiecewiseDecay(object):
"""
Multi step learning rate decay
Args:
gamma (float): decay factor
milestones (list): steps at which to decay learning rate
"""
def __init__(self, gamma=0.1, milestones=[6000, 8000], values=None):
super(PiecewiseDecay, self).__init__()
self.gamma = gamma
self.milestones = milestones
self.values = values
def __call__(self, base_lr=None, learning_rate=None):
if self.values is not None:
return fluid.layers.piecewise_decay(self.milestones, self.values)
assert base_lr is not None, "either base LR or values should be provided"
values = [base_lr]
lr = base_lr
for _ in self.milestones:
lr *= self.gamma
values.append(lr)
return fluid.layers.piecewise_decay(self.milestones, values)
@serializable
class LinearWarmup(object):
"""
Warm up learning rate linearly
Args:
steps (int): warm up steps
start_factor (float): initial learning rate factor
"""
def __init__(self, steps=500, start_factor=1. / 3):
super(LinearWarmup, self).__init__()
self.steps = steps
self.start_factor = start_factor
def __call__(self, base_lr, learning_rate):
start_lr = base_lr * self.start_factor
return fluid.layers.linear_lr_warmup(
learning_rate=learning_rate,
warmup_steps=self.steps,
start_lr=start_lr,
end_lr=base_lr)
@register
class LearningRate(object):
"""
Learning Rate configuration
Args:
base_lr (float): base learning rate
schedulers (list): learning rate schedulers
"""
__category__ = 'optim'
def __init__(self,
base_lr=0.01,
schedulers=[PiecewiseDecay(), LinearWarmup()]):
super(LearningRate, self).__init__()
self.base_lr = base_lr
self.schedulers = schedulers
def __call__(self):
lr = None
for sched in self.schedulers:
lr = sched(self.base_lr, lr)
return lr
@register
class OptimizerBuilder():
"""
Build optimizer handles
Args:
regularizer (object): an `Regularizer` instance
optimizer (object): an `Optimizer` instance
"""
__category__ = 'optim'
def __init__(self,
regularizer={'type': 'L2',
'factor': .0001},
optimizer={'type': 'Momentum',
'momentum': .9}):
self.regularizer = regularizer
self.optimizer = optimizer
def __call__(self, learning_rate):
reg_type = self.regularizer['type'] + 'Decay'
reg_factor = self.regularizer['factor']
regularization = getattr(regularizer, reg_type)(reg_factor)
optim_args = self.optimizer.copy()
optim_type = optim_args['type']
del optim_args['type']
op = getattr(optimizer, optim_type)
return op(learning_rate=learning_rate,
regularization=regularization,
**optim_args)
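# A pure-Python sketch of the schedule the classes above compose:
# PiecewiseDecay derives step values from base_lr and gamma, and LinearWarmup
# ramps the first `steps` iterations from base_lr * start_factor to base_lr.
if __name__ == '__main__':
    base_lr, gamma, milestones = 0.01, 0.1, [6000, 8000]
    values, lr = [base_lr], base_lr
    for _ in milestones:
        lr *= gamma
        values.append(lr)
    print(values)  # [0.01, 0.001, 0.0001]
    steps, start_factor = 500, 1. / 3
    warmup = [base_lr * (start_factor + (1. - start_factor) * t / steps)
              for t in (0, 250, 500)]
    print(warmup)  # ramps up to base_lr at step 500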
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import shutil
import numpy as np
import paddle.fluid as fluid
from .download import get_weights_path
import logging
logger = logging.getLogger(__name__)
__all__ = ['load_checkpoint', 'load_and_fusebn', 'save']
def is_url(path):
"""
Whether path is URL.
Args:
        path (string): the path to check.
"""
return path.startswith('http://') or path.startswith('https://')
def load_pretrain(exe, prog, path):
"""
Load model from the given path.
Args:
exe (fluid.Executor): The fluid.Executor object.
prog (fluid.Program): load weight to which Program object.
        path (string): URL string or local model path.
"""
if is_url(path):
path = get_weights_path(path)
if not os.path.exists(path):
        logger.info('Model path {} does not exist.'.format(path))
logger.info('Loading pretrained model from {}...'.format(path))
def _if_exist(var):
b = os.path.exists(os.path.join(path, var.name))
if b:
logger.debug('load weight {}'.format(var.name))
return b
fluid.io.load_vars(exe, path, prog, predicate=_if_exist)
def load_checkpoint(exe, prog, path):
"""
Load model from the given path.
Args:
exe (fluid.Executor): The fluid.Executor object.
prog (fluid.Program): load weight to which Program object.
        path (string): URL string or local model path.
"""
if is_url(path):
path = get_weights_path(path)
if not os.path.exists(path):
        logger.info('Model path {} does not exist.'.format(path))
logger.info('Loading checkpoint from {}...'.format(path))
fluid.io.load_persistables(exe, path, prog)
def save(exe, prog, path):
"""
    Save model to the given path.
Args:
exe (fluid.Executor): The fluid.Executor object.
prog (fluid.Program): save weight from which Program object.
path (string): the path to save model.
"""
if os.path.isdir(path):
shutil.rmtree(path)
logger.info('Save model to {}.'.format(path))
fluid.io.save_persistables(exe, path, prog)
def load_and_fusebn(exe, prog, path):
"""
Fuse params of batch norm to scale and bias.
Args:
exe (fluid.Executor): The fluid.Executor object.
        prog (fluid.Program): load weight to which Program object.
        path (string): URL string or local model path.
"""
logger.info('Load model and fuse batch norm from {}...'.format(path))
if is_url(path):
path = get_weights_path(path)
def _if_exist(var):
b = os.path.exists(os.path.join(path, var.name))
if b:
logger.debug('load weight {}'.format(var.name))
return b
all_vars = list(filter(_if_exist, prog.list_vars()))
# Since the program uses affine-channel, there is no running mean and var
# in the program, here append running mean and var.
# NOTE, the params of batch norm should be like:
# x_scale
# x_offset
# x_mean
# x_variance
# x is any prefix
mean_variances = set()
bn_vars = []
bn_in_path = True
inner_prog = fluid.Program()
inner_start_prog = fluid.Program()
with fluid.program_guard(inner_prog, inner_start_prog):
for block in prog.blocks:
ops = list(block.ops)
if not bn_in_path:
break
for op in ops:
if op.type == 'affine_channel':
# remove 'scale' as prefix
scale_name = op.input('Scale')[0] # _scale
bias_name = op.input('Bias')[0] # _offset
prefix = scale_name[:-5]
mean_name = prefix + 'mean'
variance_name = prefix + 'variance'
if not os.path.exists(os.path.join(path, mean_name)):
bn_in_path = False
break
if not os.path.exists(os.path.join(path, variance_name)):
bn_in_path = False
break
bias = block.var(bias_name)
mean_vb = fluid.layers.create_parameter(
bias.shape, bias.dtype, mean_name)
variance_vb = fluid.layers.create_parameter(
bias.shape, bias.dtype, variance_name)
mean_variances.add(mean_vb)
mean_variances.add(variance_vb)
bn_vars.append(
[scale_name, bias_name, mean_name, variance_name])
if not bn_in_path:
        raise ValueError(
            "The model in path {} has no params of batch norm.".format(path))
# load running mean and running variance on cpu place into global scope.
place = fluid.CPUPlace()
exe_cpu = fluid.Executor(place)
fluid.io.load_vars(exe_cpu, path, vars=[v for v in mean_variances])
# load params on real place into global scope.
fluid.io.load_vars(exe, path, prog, vars=all_vars)
eps = 1e-5
for names in bn_vars:
scale_name, bias_name, mean_name, var_name = names
scale = fluid.global_scope().find_var(scale_name).get_tensor()
bias = fluid.global_scope().find_var(bias_name).get_tensor()
mean = fluid.global_scope().find_var(mean_name).get_tensor()
var = fluid.global_scope().find_var(var_name).get_tensor()
scale_arr = np.array(scale)
bias_arr = np.array(bias)
mean_arr = np.array(mean)
var_arr = np.array(var)
bn_std = np.sqrt(np.add(var_arr, eps))
new_scale = np.float32(np.divide(scale_arr, bn_std))
new_bias = bias_arr - mean_arr * new_scale
# fuse to scale and bias in affine_channel
scale.set(new_scale, exe.place)
bias.set(new_bias, exe.place)
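# A NumPy sketch of the arithmetic in `load_and_fusebn` above: folding the
# saved running statistics into the affine_channel scale and bias.
if __name__ == '__main__':
    eps = 1e-5
    scale, bias = np.array([1.2]), np.array([0.3])
    mean, var = np.array([0.5]), np.array([4.0])
    new_scale = scale / np.sqrt(var + eps)
    new_bias = bias - mean * new_scale
    x = np.array([2.0])
    # the fused affine equals the unfused batch norm output
    print(new_scale * x + new_bias)
    print(scale * (x - mean) / np.sqrt(var + eps) + bias)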
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from argparse import ArgumentParser, RawDescriptionHelpFormatter
import yaml
__all__ = ['ColorTTY', 'ArgsParser']
class ColorTTY(object):
def __init__(self):
super(ColorTTY, self).__init__()
self.colors = ['red', 'green', 'yellow', 'blue', 'magenta', 'cyan']
def __getattr__(self, attr):
if attr in self.colors:
color = self.colors.index(attr) + 31
def color_message(message):
return "[{}m{}".format(color, message)
setattr(self, attr, color_message)
return color_message
def bold(self, message):
return self.with_code('01', message)
def with_code(self, code, message):
return "[{}m{}".format(code, message)
class ArgsParser(ArgumentParser):
def __init__(self):
super(ArgsParser, self).__init__(
formatter_class=RawDescriptionHelpFormatter)
self.add_argument("-c", "--config", help="configuration file to use")
self.add_argument("-o", "--opt", nargs='*',
help="set configuration options")
def parse_args(self, argv=None):
args = super(ArgsParser, self).parse_args(argv)
assert args.config is not None, \
"Please specify --config=configure_file_path."
args.opt = self._parse_opt(args.opt)
return args
def _parse_opt(self, opts):
config = {}
if not opts:
return config
for s in opts:
s = s.strip()
            k, v = s.split('=', 1)
if '.' not in k:
config[k] = v
else:
keys = k.split('.')
config[keys[0]] = {}
cur = config[keys[0]]
for idx, key in enumerate(keys[1:]):
if idx == len(keys) - 2:
cur[key] = yaml.load(v, Loader=yaml.Loader)
else:
cur[key] = {}
cur = cur[key]
return config
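
# --- Editor's sketch (not part of the original file): illustrates how
# "-o key.subkey=value" options become a nested dict. Note the asymmetry in
# _parse_opt: dotted keys pass their leaf value through yaml.load (so
# "0.001" becomes a float), while top-level keys stay raw strings. The
# config path below is illustrative and never opened.
def _demo_args_parser():
    parser = ArgsParser()
    args = parser.parse_args(['-c', 'some_config.yml',
                              '-o', 'use_gpu=false',
                              'LearningRate.base_lr=0.001'])
    assert args.opt == {'use_gpu': 'false',
                        'LearningRate': {'base_lr': 0.001}}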
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import sys
import json
import cv2
import numpy as np
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import pycocotools.mask as mask_util
import logging
logger = logging.getLogger(__name__)
__all__ = [
'bbox_eval', 'mask_eval', 'bbox2out', 'mask2out', 'get_category_info'
]
def clip_bbox(bbox):
xmin = max(min(bbox[0], 1.), 0.)
ymin = max(min(bbox[1], 1.), 0.)
xmax = max(min(bbox[2], 1.), 0.)
ymax = max(min(bbox[3], 1.), 0.)
return xmin, ymin, xmax, ymax
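
# --- Editor's sketch (not part of the original file): clip_bbox assumes
# normalized [xmin, ymin, xmax, ymax] coordinates and clamps them to [0, 1].
def _demo_clip_bbox():
    assert clip_bbox([-0.1, 0.2, 1.3, 0.8]) == (0.0, 0.2, 1.0, 0.8)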
def bbox_eval(results, anno_file, outfile, with_background=True):
assert 'bbox' in results[0]
assert outfile.endswith('.json')
coco_gt = COCO(anno_file)
cat_ids = coco_gt.getCatIds()
    # when with_background = True, categories are mapped to class ids like:
    # background: 0, first_class: 1, second_class: 2, ...
clsid2catid = dict(
{i + int(with_background): catid
for i, catid in enumerate(cat_ids)})
xywh_results = bbox2out(results, clsid2catid)
with open(outfile, 'w') as f:
json.dump(xywh_results, f)
logger.info("Start evaluate...")
coco_dt = coco_gt.loadRes(outfile)
coco_ev = COCOeval(coco_gt, coco_dt, 'bbox')
coco_ev.evaluate()
coco_ev.accumulate()
coco_ev.summarize()
# flush coco evaluation result
sys.stdout.flush()
def mask_eval(results, anno_file, outfile, resolution, thresh_binarize=0.5):
assert 'mask' in results[0]
assert outfile.endswith('.json')
coco_gt = COCO(anno_file)
clsid2catid = {i + 1: v for i, v in enumerate(coco_gt.getCatIds())}
segm_results = mask2out(results, clsid2catid, resolution, thresh_binarize)
with open(outfile, 'w') as f:
json.dump(segm_results, f)
logger.info("Start evaluate...")
coco_dt = coco_gt.loadRes(outfile)
coco_ev = COCOeval(coco_gt, coco_dt, 'segm')
coco_ev.evaluate()
coco_ev.accumulate()
coco_ev.summarize()
def bbox2out(results, clsid2catid, is_bbox_normalized=False):
xywh_res = []
for t in results:
bboxes = t['bbox'][0]
lengths = t['bbox'][1][0]
im_ids = np.array(t['im_id'][0])
        if bboxes is None or bboxes.shape == (1, 1):
continue
k = 0
for i in range(len(lengths)):
num = lengths[i]
im_id = int(im_ids[i][0])
for j in range(num):
dt = bboxes[k]
clsid, score, xmin, ymin, xmax, ymax = dt.tolist()
catid = clsid2catid[clsid]
if is_bbox_normalized:
xmin, ymin, xmax, ymax = \
clip_bbox([xmin, ymin, xmax, ymax])
w = xmax - xmin
h = ymax - ymin
else:
w = xmax - xmin + 1
h = ymax - ymin + 1
bbox = [xmin, ymin, w, h]
coco_res = {
'image_id': im_id,
'category_id': catid,
'bbox': bbox,
'score': score
}
xywh_res.append(coco_res)
k += 1
return xywh_res
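
# --- Editor's sketch (not part of the original file): a minimal fabricated
# "results" batch run through bbox2out. One image (im_id 7) with a single
# detection of class 0, mapped to category 1; xyxy becomes COCO xywh with
# the +1 pixel convention for un-normalized boxes.
def _demo_bbox2out():
    t = {'bbox': (np.array([[0., 0.9, 10., 20., 19., 39.]]), [[1]]),
         'im_id': (np.array([[7]]),)}
    res = bbox2out([t], {0: 1})
    assert res == [{'image_id': 7, 'category_id': 1,
                    'bbox': [10., 20., 10., 20.], 'score': 0.9}]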
def mask2out(results, clsid2catid, resolution, thresh_binarize=0.5):
scale = (resolution + 2.0) / resolution
segm_res = []
# for each batch
for t in results:
bboxes = t['bbox'][0]
lengths = t['bbox'][1][0]
im_ids = np.array(t['im_id'][0])
        if bboxes is None or bboxes.shape == (1, 1):
continue
if len(bboxes.tolist()) == 0:
continue
masks = t['mask'][0]
im_shape = t['im_shape'][0][0]
s = 0
# for each sample
for i in range(len(lengths)):
num = lengths[i]
im_id = int(im_ids[i][0])
bbox = bboxes[s:s + num][:, 2:]
clsid_scores = bboxes[s:s + num][:, 0:2]
mask = masks[s:s + num]
s += num
im_h = int(im_shape[0])
im_w = int(im_shape[1])
expand_bbox = expand_boxes(bbox, scale)
expand_bbox = expand_bbox.astype(np.int32)
padded_mask = np.zeros(
(resolution + 2, resolution + 2), dtype=np.float32)
for j in range(num):
xmin, ymin, xmax, ymax = expand_bbox[j].tolist()
clsid, score = clsid_scores[j].tolist()
clsid = int(clsid)
padded_mask[1:-1, 1:-1] = mask[j, clsid, :, :]
catid = clsid2catid[clsid]
w = xmax - xmin + 1
h = ymax - ymin + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
resized_mask = cv2.resize(padded_mask, (w, h))
resized_mask = np.array(
resized_mask > thresh_binarize, dtype=np.uint8)
im_mask = np.zeros((im_h, im_w), dtype=np.uint8)
x0 = min(max(xmin, 0), im_w)
x1 = min(max(xmax + 1, 0), im_w)
y0 = min(max(ymin, 0), im_h)
y1 = min(max(ymax + 1, 0), im_h)
im_mask[y0:y1, x0:x1] = resized_mask[(y0 - ymin):(y1 - ymin), (
x0 - xmin):(x1 - xmin)]
segm = mask_util.encode(
np.array(
im_mask[:, :, np.newaxis], order='F'))[0]
catid = clsid2catid[clsid]
segm['counts'] = segm['counts'].decode('utf8')
coco_res = {
'image_id': im_id,
'category_id': catid,
'segmentation': segm,
'score': score
}
segm_res.append(coco_res)
return segm_res
def expand_boxes(boxes, scale):
"""
Expand an array of boxes by a given scale.
"""
w_half = (boxes[:, 2] - boxes[:, 0]) * .5
h_half = (boxes[:, 3] - boxes[:, 1]) * .5
x_c = (boxes[:, 2] + boxes[:, 0]) * .5
y_c = (boxes[:, 3] + boxes[:, 1]) * .5
w_half *= scale
h_half *= scale
boxes_exp = np.zeros(boxes.shape)
boxes_exp[:, 0] = x_c - w_half
boxes_exp[:, 2] = x_c + w_half
boxes_exp[:, 1] = y_c - h_half
boxes_exp[:, 3] = y_c + h_half
return boxes_exp
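
# --- Editor's sketch (not part of the original file): expand_boxes scales
# each box about its own center; mask2out uses scale = (resolution + 2) /
# resolution to account for the one-pixel border of padded_mask.
def _demo_expand_boxes():
    box = np.array([[0., 0., 2., 2.]])  # center (1, 1), half sides 1
    assert np.allclose(expand_boxes(box, 2.0), [[-1., -1., 3., 3.]])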
def get_category_info(anno_file=None,
with_background=True,
use_default_label=False):
if use_default_label or anno_file is None \
or not os.path.exists(anno_file):
logger.info("Not found annotation file {}, load "
"coco17 categories.".format(anno_file))
return coco17_category_info(with_background)
else:
logger.info("Load categories from {}".format(anno_file))
return get_category_info_from_anno(anno_file, with_background)
def get_category_info_from_anno(anno_file, with_background=True):
"""
Get class id to category id map and category id
to category name map from annotation file.
Args:
anno_file (str): annotation file path
with_background (bool, default True):
whether load background as class 0.
"""
coco = COCO(anno_file)
cats = coco.loadCats(coco.getCatIds())
clsid2catid = {
i + int(with_background): cat['id']
for i, cat in enumerate(cats)
}
catid2name = {cat['id']: cat['name'] for cat in cats}
return clsid2catid, catid2name
def coco17_category_info(with_background=True):
"""
Get class id to category id map and category id
to category name map of COCO2017 dataset
Args:
with_background (bool, default True):
whether load background as class 0.
"""
clsid2catid = {
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 13,
13: 14,
14: 15,
15: 16,
16: 17,
17: 18,
18: 19,
19: 20,
20: 21,
21: 22,
22: 23,
23: 24,
24: 25,
25: 27,
26: 28,
27: 31,
28: 32,
29: 33,
30: 34,
31: 35,
32: 36,
33: 37,
34: 38,
35: 39,
36: 40,
37: 41,
38: 42,
39: 43,
40: 44,
41: 46,
42: 47,
43: 48,
44: 49,
45: 50,
46: 51,
47: 52,
48: 53,
49: 54,
50: 55,
51: 56,
52: 57,
53: 58,
54: 59,
55: 60,
56: 61,
57: 62,
58: 63,
59: 64,
60: 65,
61: 67,
62: 70,
63: 72,
64: 73,
65: 74,
66: 75,
67: 76,
68: 77,
69: 78,
70: 79,
71: 80,
72: 81,
73: 82,
74: 84,
75: 85,
76: 86,
77: 87,
78: 88,
79: 89,
80: 90
}
catid2name = {
0: 'background',
1: 'person',
2: 'bicycle',
3: 'car',
4: 'motorcycle',
5: 'airplane',
6: 'bus',
7: 'train',
8: 'truck',
9: 'boat',
10: 'traffic light',
11: 'fire hydrant',
13: 'stop sign',
14: 'parking meter',
15: 'bench',
16: 'bird',
17: 'cat',
18: 'dog',
19: 'horse',
20: 'sheep',
21: 'cow',
22: 'elephant',
23: 'bear',
24: 'zebra',
25: 'giraffe',
27: 'backpack',
28: 'umbrella',
31: 'handbag',
32: 'tie',
33: 'suitcase',
34: 'frisbee',
35: 'skis',
36: 'snowboard',
37: 'sports ball',
38: 'kite',
39: 'baseball bat',
40: 'baseball glove',
41: 'skateboard',
42: 'surfboard',
43: 'tennis racket',
44: 'bottle',
46: 'wine glass',
47: 'cup',
48: 'fork',
49: 'knife',
50: 'spoon',
51: 'bowl',
52: 'banana',
53: 'apple',
54: 'sandwich',
55: 'orange',
56: 'broccoli',
57: 'carrot',
58: 'hot dog',
59: 'pizza',
60: 'donut',
61: 'cake',
62: 'chair',
63: 'couch',
64: 'potted plant',
65: 'bed',
67: 'dining table',
70: 'toilet',
72: 'tv',
73: 'laptop',
74: 'mouse',
75: 'remote',
76: 'keyboard',
77: 'cell phone',
78: 'microwave',
79: 'oven',
80: 'toaster',
81: 'sink',
82: 'refrigerator',
84: 'book',
85: 'clock',
86: 'vase',
87: 'scissors',
88: 'teddy bear',
89: 'hair drier',
90: 'toothbrush'
}
if not with_background:
clsid2catid = {k - 1: v for k, v in clsid2catid.items()}
return clsid2catid, catid2name
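
# --- Editor's sketch (not part of the original file): with background,
# network class 1 maps to COCO category 1 ('person'); without it, every
# class id shifts down by one while category ids stay fixed.
def _demo_category_info():
    clsid2catid, catid2name = coco17_category_info(with_background=True)
    assert clsid2catid[1] == 1 and catid2name[1] == 'person'
    clsid2catid_nb, _ = coco17_category_info(with_background=False)
    assert clsid2catid_nb[0] == 1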
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
def colormap(rgb=False):
"""
Get colormap
"""
color_list = np.array([
0.000, 0.447, 0.741, 0.850, 0.325, 0.098, 0.929, 0.694, 0.125, 0.494,
0.184, 0.556, 0.466, 0.674, 0.188, 0.301, 0.745, 0.933, 0.635, 0.078,
0.184, 0.300, 0.300, 0.300, 0.600, 0.600, 0.600, 1.000, 0.000, 0.000,
1.000, 0.500, 0.000, 0.749, 0.749, 0.000, 0.000, 1.000, 0.000, 0.000,
0.000, 1.000, 0.667, 0.000, 1.000, 0.333, 0.333, 0.000, 0.333, 0.667,
0.000, 0.333, 1.000, 0.000, 0.667, 0.333, 0.000, 0.667, 0.667, 0.000,
0.667, 1.000, 0.000, 1.000, 0.333, 0.000, 1.000, 0.667, 0.000, 1.000,
1.000, 0.000, 0.000, 0.333, 0.500, 0.000, 0.667, 0.500, 0.000, 1.000,
0.500, 0.333, 0.000, 0.500, 0.333, 0.333, 0.500, 0.333, 0.667, 0.500,
0.333, 1.000, 0.500, 0.667, 0.000, 0.500, 0.667, 0.333, 0.500, 0.667,
0.667, 0.500, 0.667, 1.000, 0.500, 1.000, 0.000, 0.500, 1.000, 0.333,
0.500, 1.000, 0.667, 0.500, 1.000, 1.000, 0.500, 0.000, 0.333, 1.000,
0.000, 0.667, 1.000, 0.000, 1.000, 1.000, 0.333, 0.000, 1.000, 0.333,
0.333, 1.000, 0.333, 0.667, 1.000, 0.333, 1.000, 1.000, 0.667, 0.000,
1.000, 0.667, 0.333, 1.000, 0.667, 0.667, 1.000, 0.667, 1.000, 1.000,
1.000, 0.000, 1.000, 1.000, 0.333, 1.000, 1.000, 0.667, 1.000, 0.167,
0.000, 0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000,
0.000, 0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000,
0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000,
0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000, 0.000,
0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000, 0.833,
0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.143, 0.143, 0.143, 0.286,
0.286, 0.286, 0.429, 0.429, 0.429, 0.571, 0.571, 0.571, 0.714, 0.714,
0.714, 0.857, 0.857, 0.857, 1.000, 1.000, 1.000
]).astype(np.float32)
color_list = color_list.reshape((-1, 3)) * 255
if not rgb:
color_list = color_list[:, ::-1]
return color_list
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path as osp
import shutil
import requests
import tqdm
import hashlib
import tarfile
import zipfile
from .voc_utils import merge_and_create_list
import logging
logger = logging.getLogger(__name__)
__all__ = ['get_weights_path', 'get_dataset_path']
WEIGHTS_HOME = osp.expanduser("~/.cache/paddle/weights")
DATASET_HOME = osp.expanduser("~/.cache/paddle/dataset")
# dict of {dataset_name: (download_info, sub_dirs)}
# download info: (url, md5sum)
DATASETS = {
'coco': ([
('http://images.cocodataset.org/zips/train2017.zip',
'cced6f7f71b7629ddf16f17bbcfab6b2', ),
('http://images.cocodataset.org/zips/val2017.zip',
'442b8da7639aecaf257c1dceb8ba8c80', ),
('http://images.cocodataset.org/annotations/annotations_trainval2017.zip',
'f4bbac642086de4f52a3fdda2de5fa2c', ),
], ["annotations", "train2017", "val2017"]),
'voc': ([
('http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar',
'6cd6e144f989b92b3379bac3b3de84fd', ),
('http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar',
'c52e279531787c972589f7e41ab4ae64', ),
('http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar',
'b6e924de25625d8de591ea690078ad9f', ),
], ["VOCdevkit/VOC_all"]),
}
DOWNLOAD_RETRY_LIMIT = 3
def get_weights_path(url):
"""Get weights path from WEIGHT_HOME, if not exists,
download it from url.
"""
return get_path(url, WEIGHTS_HOME)
def get_dataset_path(path):
"""
If path exists, return path.
Otherwise, get dataset path from DATASET_HOME, if not exists,
download it.
"""
if _dataset_exists(path):
logger.debug("Dataset path: {}".format(osp.realpath(path)))
return path
logger.info("Dataset {} not exitst, try searching {} or "
"downloading dataset...".format(
osp.realpath(path), DATASET_HOME))
for name, dataset in DATASETS.items():
if path.lower().find(name) >= 0:
logger.info("Parse dataset_dir {} as dataset "
"{}".format(path, name))
data_dir = osp.join(DATASET_HOME, name)
# For voc, only check merged dir VOC_all
if name == 'voc':
check_dir = osp.join(data_dir, dataset[1][0])
if osp.exists(check_dir):
logger.info("Found {}".format(check_dir))
return data_dir
for url, md5sum in dataset[0]:
get_path(url, data_dir, md5sum)
            # for voc, merge directories and create file lists after download
            if name == 'voc':
                logger.info("VOC dataset downloaded successfully, merging "
                            "VOC2007 and VOC2012 into VOC_all...")
output_dir = osp.join(data_dir, dataset[1][0])
devkit_dir = "/".join(output_dir.split('/')[:-1])
years = ['2007', '2012']
                # merge into output_tmp_dir first, then move to
                # output_dir once the merge succeeds.
output_tmp_dir = osp.join(data_dir, 'tmp')
if osp.isdir(output_tmp_dir):
shutil.rmtree(output_tmp_dir)
                # NOTE(dengkaipeng): since the VOC dataset is auto-downloaded,
                # the default VOC label list should be used; do not generate
                # label_list.txt here. For the default labels, see
                # ../data/source/voc_loader.py
merge_and_create_list(devkit_dir, years,
output_tmp_dir)
shutil.move(output_tmp_dir, output_dir)
# remove source directory VOC2007 and VOC2012
shutil.rmtree(osp.join(devkit_dir, "VOC2007"))
shutil.rmtree(osp.join(devkit_dir, "VOC2012"))
return data_dir
# not match any dataset in DATASETS
raise ValueError("{} not exists and unknow dataset type".format(path))
def get_path(url, root_dir, md5sum=None):
""" Download from given url to root_dir.
if file or directory specified by url is exists under
root_dir, return the path directly, otherwise download
from url and decompress it, return the path.
url (str): download url
root_dir (str): root dir for downloading, it should be
WEIGHTS_HOME or DATASET_HOME
md5sum (str): md5 sum of download package
"""
# parse path after download to decompress under root_dir
fname = url.split('/')[-1]
zip_formats = ['.zip', '.tar', '.gz']
fpath = fname
for zip_format in zip_formats:
fpath = fpath.replace(zip_format, '')
fullpath = osp.join(root_dir, fpath)
    # For some archives, the decompressed directory name differs
    # from the archive file name; rename using the following map
decompress_name_map = {
"VOC": "VOCdevkit/VOC_all",
"annotations_trainval": "annotations"
}
for k, v in decompress_name_map.items():
if fullpath.find(k) >= 0:
fullpath = '/'.join(fullpath.split('/')[:-1] + [v])
if osp.exists(fullpath):
logger.info("Found {}".format(fullpath))
else:
fullname = _download(url, root_dir, md5sum)
_decompress(fullname)
return fullpath
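
# --- Editor's sketch (not part of the original file): traces how get_path
# turns a download url into its on-disk directory, using the same suffix
# stripping and decompress_name_map renaming as above. The root_dir is
# illustrative.
def _demo_get_path_name():
    url = ('http://images.cocodataset.org/annotations/'
           'annotations_trainval2017.zip')
    fpath = url.split('/')[-1]
    for zip_format in ['.zip', '.tar', '.gz']:
        fpath = fpath.replace(zip_format, '')
    fullpath = osp.join('/tmp/dataset/coco', fpath)
    # "annotations_trainval" matches decompress_name_map, so the final
    # directory becomes .../annotations
    fullpath = '/'.join(fullpath.split('/')[:-1] + ['annotations'])
    assert fullpath == '/tmp/dataset/coco/annotations'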
def _dataset_exists(path):
"""
    Check whether the user-defined dataset exists
"""
if not osp.exists(path):
return False
for name, dataset in DATASETS.items():
if path.lower().find(name) >= 0:
for sub_dir in dataset[1]:
if not osp.exists(osp.join(path, sub_dir)):
return False
return True
return True
def _download(url, path, md5sum=None):
"""
Download from url, save to path.
url (str): download url
path (str): download to given path
"""
if not osp.exists(path):
os.makedirs(path)
fname = url.split('/')[-1]
fullname = osp.join(path, fname)
retry_cnt = 0
while not (osp.exists(fullname) and _md5check(fullname, md5sum)):
if retry_cnt < DOWNLOAD_RETRY_LIMIT:
retry_cnt += 1
else:
raise RuntimeError("Download from {} failed. "
"Retry limit reached".format(url))
logger.info("Downloading {} from {}".format(fname, url))
req = requests.get(url, stream=True)
if req.status_code != 200:
raise RuntimeError("Downloading from {} failed with code "
"{}!".format(url, req.status_code))
total_size = req.headers.get('content-length')
with open(fullname, 'wb') as f:
if total_size:
for chunk in tqdm.tqdm(
req.iter_content(chunk_size=1024),
total=(int(total_size) + 1023) // 1024,
unit='KB'):
f.write(chunk)
else:
for chunk in req.iter_content(chunk_size=1024):
if chunk:
f.write(chunk)
return fullname
def _md5check(fullname, md5sum=None):
if md5sum is None:
return True
logger.info("File {} md5 checking...".format(fullname))
md5 = hashlib.md5()
with open(fullname, 'rb') as f:
for chunk in iter(lambda: f.read(4096), b""):
md5.update(chunk)
calc_md5sum = md5.hexdigest()
if calc_md5sum != md5sum:
logger.info("File {} md5 check failed, {}(calc) != "
"{}(base)".format(fullname, calc_md5sum, md5sum))
return False
return True
def _decompress(fname):
"""
    Decompress zip and tar files
"""
logger.info("Decompressing {}...".format(fname))
    # To guard against interrupted decompression, extract into the
    # fpath_tmp directory first; once decompression succeeds, move
    # the extracted files to fpath, delete fpath_tmp, and remove the
    # downloaded archive.
fpath = '/'.join(fname.split('/')[:-1])
fpath_tmp = osp.join(fpath, 'tmp')
if osp.isdir(fpath_tmp):
shutil.rmtree(fpath_tmp)
os.makedirs(fpath_tmp)
if fname.find('tar') >= 0:
with tarfile.open(fname) as tf:
tf.extractall(path=fpath_tmp)
elif fname.find('zip') >= 0:
with zipfile.ZipFile(fname) as zf:
zf.extractall(path=fpath_tmp)
else:
raise TypeError("Unsupport compress file type {}".format(fname))
for f in os.listdir(fpath_tmp):
src_dir = osp.join(fpath_tmp, f)
dst_dir = osp.join(fpath, f)
_move_and_merge_tree(src_dir, dst_dir)
shutil.rmtree(fpath_tmp)
os.remove(fname)
def _move_and_merge_tree(src, dst):
"""
    Move the src directory to dst; if dst already exists,
    merge src into dst
"""
if not osp.exists(dst):
shutil.move(src, dst)
else:
for fp in os.listdir(src):
src_fp = osp.join(src, fp)
dst_fp = osp.join(dst, fp)
if osp.isdir(src_fp):
if osp.isdir(dst_fp):
_move_and_merge_tree(src_fp, dst_fp)
else:
shutil.move(src_fp, dst_fp)
elif osp.isfile(src_fp) and \
not osp.isfile(dst_fp):
shutil.move(src_fp, dst_fp)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import numpy as np
import paddle.fluid as fluid
__all__ = ['parse_fetches', 'eval_run', 'eval_results']
logger = logging.getLogger(__name__)
def parse_fetches(fetches, prog=None, extra_keys=None):
"""
    Parse fetch variable info from model fetches:
    values for fetch_list and keys for stats
"""
keys, values = [], []
cls = []
for k, v in fetches.items():
if hasattr(v, 'name'):
keys.append(k)
v.persistable = True
values.append(v.name)
else:
cls.append(v)
if prog is not None and extra_keys is not None:
for k in extra_keys:
try:
v = fluid.framework._get_var(k, prog)
v.persistable = True
keys.append(k)
values.append(v.name)
except Exception:
pass
return keys, values, cls
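
# --- Editor's sketch (not part of the original file): parse_fetches splits
# a fetch dict into named variables (anything with a .name attribute, made
# persistable for exe.run) and plain metric objects. A dummy stand-in for a
# fluid Variable suffices to show the split.
def _demo_parse_fetches():
    class FakeVar(object):
        def __init__(self, name):
            self.name = name
            self.persistable = False

    keys, values, cls = parse_fetches({'loss': FakeVar('loss_0'),
                                       'map_metric': object()})
    assert keys == ['loss'] and values == ['loss_0'] and len(cls) == 1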
def eval_run(exe, compile_program, pyreader, keys, values, cls):
"""
Run evaluation program, return program outputs.
"""
iter_id = 0
results = []
if len(cls) != 0:
values = []
for i in range(len(cls)):
_, accum_map = cls[i].get_map_var()
cls[i].reset(exe)
values.append(accum_map)
try:
pyreader.start()
while True:
outs = exe.run(compile_program,
fetch_list=values,
return_numpy=False)
res = {
k: (np.array(v), v.recursive_sequence_lengths())
for k, v in zip(keys, outs)
}
results.append(res)
if iter_id % 100 == 0:
logger.info('Test iter {}'.format(iter_id))
iter_id += 1
except (StopIteration, fluid.core.EOFException):
pyreader.reset()
        logger.info('Test finished, iter {}'.format(iter_id))
return results
def eval_results(results, feed, metric, resolution=None, output_file=None):
"""Evaluation for evaluation program results"""
if metric == 'COCO':
from ppdet.utils.coco_eval import bbox_eval, mask_eval
anno_file = getattr(feed.dataset, 'annotation', None)
with_background = getattr(feed, 'with_background', True)
output = 'bbox.json'
if output_file:
output = '{}_bbox.json'.format(output_file)
bbox_eval(results, anno_file, output, with_background)
if 'mask' in results[0]:
output = 'mask.json'
if output_file:
output = '{}_mask.json'.format(output_file)
mask_eval(results, anno_file, output, resolution)
else:
res = np.mean(results[-1]['accum_map'][0])
logger.info('Test mAP: {}'.format(res))
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
import numpy as np
import datetime
__all__ = ['TrainingStats', 'Time']
class SmoothedValue(object):
"""Track a series of values and provide access to smoothed values over a
window or the global series average.
"""
def __init__(self, window_size):
self.deque = collections.deque(maxlen=window_size)
def add_value(self, value):
self.deque.append(value)
def get_median_value(self):
return np.median(self.deque)
def Time():
return datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')
class TrainingStats(object):
def __init__(self, window_size, stats_keys):
self.smoothed_losses_and_metrics = {
key: SmoothedValue(window_size)
for key in stats_keys
}
def update(self, stats):
for k, v in self.smoothed_losses_and_metrics.items():
v.add_value(stats[k])
def get(self, extras=None):
stats = collections.OrderedDict()
if extras:
for k, v in extras.items():
stats[k] = v
for k, v in self.smoothed_losses_and_metrics.items():
stats[k] = round(v.get_median_value(), 6)
return stats
def log(self, extras=None):
d = self.get(extras)
strs = ', '.join(
str(dict({
x.encode('utf-8'): y
})).strip('{}') for x, y in d.items())
return strs
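
# --- Editor's sketch (not part of the original file): TrainingStats keeps a
# sliding window per key and reports the window median, smoothing noisy
# per-iteration losses in the log.
def _demo_training_stats():
    stats = TrainingStats(window_size=3, stats_keys=['loss'])
    for v in [1.0, 9.0, 2.0, 3.0]:
        stats.update({'loss': v})
    # the window now holds [9.0, 2.0, 3.0], whose median is 3.0
    assert stats.get()['loss'] == 3.0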
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import pycocotools.mask as mask_util
from PIL import Image, ImageDraw
from .colormap import colormap
__all__ = ['visualize_results']
def visualize_results(image,
im_id,
catid2name,
threshold=0.5,
bbox_results=None,
mask_results=None,
is_bbox_normalized=False):
"""
Visualize bbox and mask results
"""
if mask_results:
image = draw_mask(image, im_id, mask_results, threshold)
if bbox_results:
image = draw_bbox(image, im_id, catid2name, bbox_results,
threshold, is_bbox_normalized)
return image
def draw_mask(image, im_id, segms, threshold, alpha=0.7):
"""
Draw mask on image
"""
mask_color_id = 0
w_ratio = .4
color_list = colormap(rgb=True)
img_array = np.array(image).astype('float32')
for dt in np.array(segms):
if im_id != dt['image_id']:
continue
segm, score = dt['segmentation'], dt['score']
if score < threshold:
continue
mask = mask_util.decode(segm) * 255
color_mask = color_list[mask_color_id % len(color_list), 0:3]
mask_color_id += 1
for c in range(3):
color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
idx = np.nonzero(mask)
img_array[idx[0], idx[1], :] *= 1.0 - alpha
img_array[idx[0], idx[1], :] += alpha * color_mask
return Image.fromarray(img_array.astype('uint8'))
def draw_bbox(image, im_id, catid2name, bboxes, threshold,
is_bbox_normalized=False):
"""
Draw bbox on image
"""
draw = ImageDraw.Draw(image)
catid2color = {}
color_list = colormap(rgb=True)[:40]
for dt in np.array(bboxes):
if im_id != dt['image_id']:
continue
catid, bbox, score = dt['category_id'], dt['bbox'], dt['score']
if score < threshold:
continue
xmin, ymin, w, h = bbox
if is_bbox_normalized:
im_width, im_height = image.size
xmin *= im_width
ymin *= im_height
w *= im_width
h *= im_height
xmax = xmin + w
ymax = ymin + h
if catid not in catid2color:
idx = np.random.randint(len(color_list))
catid2color[catid] = color_list[idx]
color = tuple(catid2color[catid])
# draw bbox
draw.line(
[(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
(xmin, ymin)],
width=2,
fill=color)
# draw label
text = "{} {:.2f}".format(catid2name[catid], score)
tw, th = draw.textsize(text)
draw.rectangle([(xmin + 1, ymin - th),
(xmin + tw + 1, ymin)],
fill=color)
draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
return image
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import sys
import numpy as np
from ..data.source.voc_loader import pascalvoc_label
from .coco_eval import bbox2out
import logging
logger = logging.getLogger(__name__)
__all__ = [
'bbox2out', 'get_category_info'
]
def get_category_info(anno_file=None,
with_background=True,
use_default_label=False):
if use_default_label or anno_file is None \
or not os.path.exists(anno_file):
logger.info("Not found annotation file {}, load "
"voc2012 categories.".format(anno_file))
return vocall_category_info(with_background)
else:
logger.info("Load categories from {}".format(anno_file))
return get_category_info_from_anno(anno_file, with_background)
def get_category_info_from_anno(anno_file, with_background=True):
"""
Get class id to category id map and category id
to category name map from annotation file.
Args:
anno_file (str): annotation file path
with_background (bool, default True):
whether load background as class 0.
"""
cats = []
with open(anno_file) as f:
for line in f.readlines():
cats.append(line.strip())
if cats[0] != 'background' and with_background:
cats.insert(0, 'background')
if cats[0] == 'background' and not with_background:
cats = cats[1:]
clsid2catid = {i: i for i in range(len(cats))}
catid2name = {i: name for i, name in enumerate(cats)}
return clsid2catid, catid2name
def vocall_category_info(with_background=True):
"""
Get class id to category id map and category id
    to category name map of the merged VOC dataset
Args:
with_background (bool, default True):
whether load background as class 0.
"""
label_map = pascalvoc_label(with_background)
label_map = sorted(label_map.items(), key=lambda x: x[1])
cats = [l[0] for l in label_map]
if with_background:
cats.insert(0, 'background')
clsid2catid = {i: i for i in range(len(cats))}
catid2name = {i: name for i, name in enumerate(cats)}
return clsid2catid, catid2name
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path as osp
import re
import random
import shutil
__all__ = ['merge_and_create_list']
def merge_and_create_list(devkit_dir, years, output_dir):
"""
Merge VOC2007 and VOC2012 to output_dir and create following list:
1. train.txt
2. val.txt
3. test.txt
"""
os.makedirs(osp.join(output_dir, 'Annotations/'))
os.makedirs(osp.join(output_dir, 'ImageSets/Main/'))
os.makedirs(osp.join(output_dir, 'JPEGImages/'))
trainval_list = []
test_list = []
for year in years:
trainval, test = _walk_voc_dir(devkit_dir, year, output_dir)
trainval_list.extend(trainval)
test_list.extend(test)
main_dir = osp.join(output_dir, 'ImageSets/Main/')
random.shuffle(trainval_list)
with open(osp.join(main_dir, 'train.txt'), 'w') as ftrainval:
for item in trainval_list:
ftrainval.write(item + '\n')
with open(osp.join(main_dir, 'val.txt'), 'w') as fval:
with open(osp.join(main_dir, 'test.txt'), 'w') as ftest:
ct = 0
for item in test_list:
ct += 1
fval.write(item + '\n')
if ct <= 1000:
ftest.write(item + '\n')
def _get_voc_dir(devkit_dir, year, type):
return osp.join(devkit_dir, 'VOC' + year, type)
def _walk_voc_dir(devkit_dir, year, output_dir):
filelist_dir = _get_voc_dir(devkit_dir, year, 'ImageSets/Main')
annotation_dir = _get_voc_dir(devkit_dir, year, 'Annotations')
img_dir = _get_voc_dir(devkit_dir, year, 'JPEGImages')
trainval_list = []
test_list = []
added = set()
for _, _, files in os.walk(filelist_dir):
for fname in files:
img_ann_list = []
if re.match('[a-z]+_trainval\.txt', fname):
img_ann_list = trainval_list
elif re.match('[a-z]+_test\.txt', fname):
img_ann_list = test_list
else:
continue
fpath = osp.join(filelist_dir, fname)
for line in open(fpath):
name_prefix = line.strip().split()[0]
if name_prefix in added:
continue
added.add(name_prefix)
ann_path = osp.join(annotation_dir, name_prefix + '.xml')
img_path = osp.join(img_dir, name_prefix + '.jpg')
new_ann_path = osp.join(output_dir, 'Annotations/',
name_prefix + '.xml')
new_img_path = osp.join(output_dir, 'JPEGImages/',
name_prefix + '.jpg')
shutil.copy(ann_path, new_ann_path)
shutil.copy(img_path, new_img_path)
img_ann_list.append(name_prefix)
return trainval_list, test_list
tqdm
docstring_parser @ http://github.com/willthefrog/docstring_parser/tarball/master
typeguard ; python_version >= '3.4'
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import re
import sys
from argparse import ArgumentParser, RawDescriptionHelpFormatter
import yaml
from ppdet.core.workspace import get_registered_modules, load_config
from ppdet.utils.cli import ColorTTY
color_tty = ColorTTY()
MISC_CONFIG = {
"architecture": "<value>",
"max_iters": "<value>",
"train_feed": "<value>",
"eval_feed": "<value>",
"test_feed": "<value>",
"pretrain_weights": "<value>",
"save_dir": "<value>",
"weights": "<value>",
"metric": "<value>",
"log_smooth_window": 20,
"snapshot_iter": 10000,
"use_gpu": True,
}
def dump_value(value):
# XXX this is hackish, but collections.abc is not available in python 2
if hasattr(value, '__dict__') or isinstance(value, (dict, tuple, list)):
value = yaml.dump(value, default_flow_style=True)
value = value.replace('\n', '')
value = value.replace('...', '')
return "'{}'".format(value)
else:
# primitive types
return str(value)
def dump_config(module, minimal=False):
args = module.schema.values()
if minimal:
args = [arg for arg in args if not arg.has_default()]
return yaml.dump(
{
module.name: {
arg.name: arg.default if arg.has_default() else "<value>"
for arg in args
}
},
default_flow_style=False,
default_style='')
def list_modules(**kwargs):
target_category = kwargs['category']
module_schema = get_registered_modules()
module_by_category = {}
for schema in module_schema.values():
category = schema.category
if target_category is not None and schema.category != target_category:
continue
if category not in module_by_category:
module_by_category[category] = [schema]
else:
module_by_category[category].append(schema)
for cat, modules in module_by_category.items():
print("Available modules in the category '{}':".format(cat))
print("")
max_len = max([len(mod.name) for mod in modules])
for mod in modules:
print(color_tty.green(mod.name.ljust(max_len)),
mod.doc.split('\n')[0])
print("")
def help_module(**kwargs):
schema = get_registered_modules()[kwargs['module']]
doc = schema.doc is None and "Not documented" or "{}".format(schema.doc)
func_args = {arg.name: arg.doc for arg in schema.schema.values()}
max_len = max([len(k) for k in func_args.keys()])
opts = "\n".join([
"{} {}".format(color_tty.green(k.ljust(max_len)), v)
for k, v in func_args.items()
])
template = dump_config(schema)
print("{}\n\n{}\n\n{}\n\n{}\n\n{}\n\n{}\n{}\n".format(
color_tty.bold(color_tty.blue("MODULE DESCRIPTION:")),
doc,
color_tty.bold(color_tty.blue("MODULE OPTIONS:")),
opts,
color_tty.bold(color_tty.blue("CONFIGURATION TEMPLATE:")),
template,
color_tty.bold(color_tty.blue("COMMAND LINE OPTIONS:")), ))
for arg in schema.schema.values():
print("--opt {}.{}={}".format(schema.name, arg.name,
dump_value(arg.default)
if arg.has_default() else "<value>"))
def generate_config(**kwargs):
minimal = kwargs['minimal']
modules = kwargs['modules']
module_schema = get_registered_modules()
visited = []
schema = []
def walk(m):
if m in visited:
return
s = module_schema[m]
schema.append(s)
visited.append(m)
for mod in modules:
walk(mod)
    # XXX try to be smart about when to add the header: if any
    # "architecture" module is included, the header is added as well
if any([getattr(m, 'category', None) == 'architecture' for m in schema]):
# XXX for ordered printing
header = ""
for k, v in MISC_CONFIG.items():
header += yaml.dump(
{
k: v
}, default_flow_style=False, default_style='')
print(header)
for s in schema:
print(dump_config(s, minimal))
# FIXME this is pretty hackish, maybe implement a custom YAML printer?
def analyze_config(**kwargs):
config = load_config(kwargs['file'])
modules = get_registered_modules()
green = '___{}___'.format(color_tty.colors.index('green') + 31)
styled = {}
for key in config.keys():
if not config[key]: # empty schema
continue
if key not in modules and not hasattr(config[key], '__dict__'):
styled[key] = config[key]
continue
elif key in modules:
module = modules[key]
else:
type_name = type(config[key]).__name__
if type_name in modules:
module = modules[type_name].copy()
module.update({
k: v
for k, v in config[key].__dict__.items()
if k in module.schema
})
key += " ({})".format(type_name)
default = module.find_default_keys()
missing = module.find_missing_keys()
mismatch = module.find_mismatch_keys()
extra = module.find_extra_keys()
dep_missing = []
for dep in module.inject:
if isinstance(module[dep], str) and module[dep] != '<value>':
if module[dep] not in modules: # not a valid module
dep_missing.append(dep)
else:
dep_mod = modules[module[dep]]
# empty dict but mandatory
if not dep_mod and dep_mod.mandatory():
dep_missing.append(dep)
override = list(
set(module.keys()) - set(default) - set(extra) - set(dep_missing))
replacement = {}
for name in set(override + default + extra + mismatch + missing):
new_name = name
if name in missing:
value = "<missing>"
else:
value = module[name]
if name in extra:
value = dump_value(value) + " <extraneous>"
elif name in mismatch:
value = dump_value(value) + " <type mismatch>"
elif name in dep_missing:
value = dump_value(value) + " <module config missing>"
elif name in override and value != '<missing>':
mark = green
new_name = mark + name
replacement[new_name] = value
styled[key] = replacement
buffer = yaml.dump(styled, default_flow_style=False, default_style='')
buffer = (re.sub(r"<missing>", r"<missing>", buffer))
buffer = (re.sub(r"<extraneous>", r"<extraneous>", buffer))
buffer = (re.sub(r"<type mismatch>", r"<type mismatch>", buffer))
buffer = (re.sub(r"<module config missing>",
r"<module config missing>", buffer))
buffer = re.sub(r"___(\d+)___(.*?):", r"[\1m\2:", buffer)
print(buffer)
if __name__ == '__main__':
argv = sys.argv[1:]
parser = ArgumentParser(formatter_class=RawDescriptionHelpFormatter)
subparsers = parser.add_subparsers(help='Supported Commands')
list_parser = subparsers.add_parser("list", help="list available modules")
help_parser = subparsers.add_parser(
"help", help="show detail options for module")
generate_parser = subparsers.add_parser(
"generate", help="generate configuration template")
analyze_parser = subparsers.add_parser(
"analyze", help="analyze configuration file")
list_parser.set_defaults(func=list_modules)
help_parser.set_defaults(func=help_module)
generate_parser.set_defaults(func=generate_config)
analyze_parser.set_defaults(func=analyze_config)
list_group = list_parser.add_mutually_exclusive_group()
list_group.add_argument(
"-c",
"--category",
type=str,
default=None,
help="list modules for <category>")
help_parser.add_argument(
"module",
help="module to show info for",
choices=list(get_registered_modules().keys()))
generate_parser.add_argument(
"modules",
nargs='+',
help="include these module in generated configuration template",
choices=list(get_registered_modules().keys()))
generate_group = generate_parser.add_mutually_exclusive_group()
generate_group.add_argument(
"--minimal", action='store_true', help="only include required options")
generate_group.add_argument(
"--full",
action='store_false',
dest='minimal',
help="include all options")
analyze_parser.add_argument("file", help="configuration file to analyze")
if len(sys.argv) < 2:
parser.print_help()
sys.exit(1)
args = parser.parse_args(argv)
if hasattr(args, 'func'):
args.func(**vars(args))
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import multiprocessing
import paddle.fluid as fluid
from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
import ppdet.utils.checkpoint as checkpoint
from ppdet.utils.cli import ArgsParser
from ppdet.modeling.model_input import create_feed
from ppdet.data.data_feed import create_reader
from ppdet.core.workspace import load_config, merge_config, create
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
def main():
"""
Main evaluate function
"""
cfg = load_config(FLAGS.config)
if 'architecture' in cfg:
main_arch = cfg.architecture
else:
raise ValueError("'architecture' not specified in config file.")
merge_config(FLAGS.opt)
if cfg.use_gpu:
devices_num = fluid.core.get_cuda_device_count()
else:
devices_num = int(
os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
if 'eval_feed' not in cfg:
eval_feed = create(main_arch + 'EvalFeed')
else:
eval_feed = create(cfg.eval_feed)
# define executor
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
# build program
model = create(main_arch)
startup_prog = fluid.Program()
eval_prog = fluid.Program()
with fluid.program_guard(eval_prog, startup_prog):
with fluid.unique_name.guard():
pyreader, feed_vars = create_feed(eval_feed)
fetches = model.eval(feed_vars)
eval_prog = eval_prog.clone(True)
reader = create_reader(eval_feed)
pyreader.decorate_sample_list_generator(reader, place)
# compile program for multi-devices
if devices_num <= 1:
compile_program = fluid.compiler.CompiledProgram(eval_prog)
else:
build_strategy = fluid.BuildStrategy()
build_strategy.memory_optimize = False
build_strategy.enable_inplace = False
compile_program = fluid.compiler.CompiledProgram(
eval_prog).with_data_parallel(build_strategy=build_strategy)
# load model
exe.run(startup_prog)
if 'weights' in cfg:
checkpoint.load_pretrain(exe, eval_prog, cfg.weights)
extra_keys = []
if 'metric' in cfg and cfg.metric == 'COCO':
extra_keys = ['im_info', 'im_id', 'im_shape']
keys, values, cls = parse_fetches(fetches, eval_prog, extra_keys)
results = eval_run(exe, compile_program, pyreader, keys, values, cls)
# evaluation
resolution = None
if 'mask' in results[0]:
resolution = model.mask_head.resolution
eval_results(results, eval_feed, cfg.metric, resolution, FLAGS.output_file)
if __name__ == '__main__':
parser = ArgsParser()
parser.add_argument(
"-f",
"--output_file",
default=None,
type=str,
help="Evaluation file name, default to bbox.json and mask.json.")
FLAGS = parser.parse_args()
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import glob
import numpy as np
from PIL import Image
from paddle import fluid
from ppdet.core.workspace import load_config, merge_config, create
from ppdet.modeling.model_input import create_feed
from ppdet.data.data_feed import create_reader
from ppdet.utils.eval_utils import parse_fetches
from ppdet.utils.cli import ArgsParser
from ppdet.utils.visualizer import visualize_results
import ppdet.utils.checkpoint as checkpoint
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
def get_save_image_name(output_dir, image_path):
"""
Get save image name from source image path.
"""
if not os.path.exists(output_dir):
os.makedirs(output_dir)
image_name = image_path.split('/')[-1]
name, ext = os.path.splitext(image_name)
return os.path.join(output_dir, "{}".format(name)) + ext
def get_test_images(infer_dir, infer_img):
"""
Get image path list in TEST mode
"""
assert infer_img is not None or infer_dir is not None, \
"--infer_img or --infer_dir should be set"
assert infer_img is None or os.path.isfile(infer_img), \
"{} is not a file".format(infer_img)
assert infer_dir is None or os.path.isdir(infer_dir), \
"{} is not a directory".format(infer_dir)
images = []
# infer_img has a higher priority
if infer_img and os.path.isfile(infer_img):
images.append(infer_img)
return images
infer_dir = os.path.abspath(infer_dir)
assert os.path.isdir(infer_dir), \
"infer_dir {} is not a directory".format(infer_dir)
exts = ['jpg', 'jpeg', 'png', 'bmp']
exts += [ext.upper() for ext in exts]
for ext in exts:
images.extend(glob.glob('{}/*.{}'.format(infer_dir, ext)))
assert len(images) > 0, "no image found in {}".format(infer_dir)
logger.info("Found {} inference images in total.".format(len(images)))
return images
def main():
cfg = load_config(FLAGS.config)
if 'architecture' in cfg:
main_arch = cfg.architecture
else:
raise ValueError("'architecture' not specified in config file.")
merge_config(FLAGS.opt)
if 'test_feed' not in cfg:
test_feed = create(main_arch + 'TestFeed')
else:
test_feed = create(cfg.test_feed)
test_images = get_test_images(FLAGS.infer_dir, FLAGS.infer_img)
test_feed.dataset.add_images(test_images)
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
model = create(main_arch)
startup_prog = fluid.Program()
infer_prog = fluid.Program()
with fluid.program_guard(infer_prog, startup_prog):
with fluid.unique_name.guard():
_, feed_vars = create_feed(test_feed, use_pyreader=False)
test_fetches = model.test(feed_vars)
infer_prog = infer_prog.clone(True)
reader = create_reader(test_feed)
feeder = fluid.DataFeeder(place=place, feed_list=feed_vars.values())
exe.run(startup_prog)
if cfg.weights:
checkpoint.load_checkpoint(exe, infer_prog, cfg.weights)
# parse infer fetches
extra_keys = []
if cfg['metric'] == 'COCO':
extra_keys = ['im_info', 'im_id', 'im_shape']
if cfg['metric'] == 'VOC':
extra_keys = ['im_id']
keys, values, _ = parse_fetches(test_fetches, infer_prog, extra_keys)
# parse dataset category
if cfg.metric == 'COCO':
from ppdet.utils.coco_eval import bbox2out, mask2out, get_category_info
if cfg.metric == "VOC":
from ppdet.utils.voc_eval import bbox2out, get_category_info
anno_file = getattr(test_feed.dataset, 'annotation', None)
with_background = getattr(test_feed, 'with_background', True)
use_default_label = getattr(test_feed, 'use_default_label', False)
clsid2catid, catid2name = get_category_info(anno_file, with_background,
use_default_label)
imid2path = reader.imid2path
for iter_id, data in enumerate(reader()):
outs = exe.run(infer_prog,
feed=feeder.feed(data),
fetch_list=values,
return_numpy=False)
res = {
k: (np.array(v), v.recursive_sequence_lengths())
for k, v in zip(keys, outs)
}
logger.info('Infer iter {}'.format(iter_id))
bbox_results = None
mask_results = None
is_bbox_normalized = True if cfg.metric == 'VOC' else False
if 'bbox' in res:
bbox_results = bbox2out([res], clsid2catid, is_bbox_normalized)
if 'mask' in res:
mask_results = mask2out([res], clsid2catid,
model.mask_head.resolution)
# visualize result
im_ids = res['im_id'][0]
for im_id in im_ids:
image_path = imid2path[int(im_id)]
image = Image.open(image_path).convert('RGB')
image = visualize_results(image,
int(im_id), catid2name, 0.5, bbox_results,
mask_results, is_bbox_normalized)
save_name = get_save_image_name(FLAGS.output_dir, image_path)
logger.info("Detection bbox results save in {}".format(save_name))
image.save(save_name)
if __name__ == '__main__':
parser = ArgsParser()
parser.add_argument(
"--infer_dir",
type=str,
default=None,
help="Directory for images to perform inference on.")
parser.add_argument(
"--infer_img",
type=str,
default=None,
help="Image path, has higher priority over --infer_dir")
parser.add_argument(
"--output_dir",
type=str,
default="output",
help="Directory for storing the output visualization files.")
FLAGS = parser.parse_args()
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import multiprocessing
import numpy as np
def set_paddle_flags(**kwargs):
for key, value in kwargs.items():
if os.environ.get(key, None) is None:
os.environ[key] = str(value)
# NOTE(paddle-dev): All of these flags should be
# set before `import paddle`. Otherwise, it would
# not take any effect.
set_paddle_flags(
FLAGS_eager_delete_tensor_gb=0, # enable GC to save memory
)
from paddle import fluid
from ppdet.core.workspace import load_config, merge_config, create
from ppdet.data.data_feed import create_reader
from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
from ppdet.utils.stats import TrainingStats
from ppdet.utils.cli import ArgsParser
import ppdet.utils.checkpoint as checkpoint
from ppdet.modeling.model_input import create_feed
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
def main():
cfg = load_config(FLAGS.config)
if 'architecture' in cfg:
main_arch = cfg.architecture
else:
raise ValueError("'architecture' not specified in config file.")
merge_config(FLAGS.opt)
if cfg.use_gpu:
devices_num = fluid.core.get_cuda_device_count()
else:
devices_num = int(
os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
if 'train_feed' not in cfg:
train_feed = create(main_arch + 'TrainFeed')
else:
train_feed = create(cfg.train_feed)
if FLAGS.eval:
if 'eval_feed' not in cfg:
eval_feed = create(main_arch + 'EvalFeed')
else:
eval_feed = create(cfg.eval_feed)
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
model = create(main_arch)
lr_builder = create('LearningRate')
optim_builder = create('OptimizerBuilder')
# build program
startup_prog = fluid.Program()
train_prog = fluid.Program()
with fluid.program_guard(train_prog, startup_prog):
with fluid.unique_name.guard():
train_pyreader, feed_vars = create_feed(train_feed)
train_fetches = model.train(feed_vars)
loss = train_fetches['loss']
lr = lr_builder()
optimizer = optim_builder(lr)
optimizer.minimize(loss)
train_reader = create_reader(train_feed, cfg.max_iters * devices_num)
train_pyreader.decorate_sample_list_generator(train_reader, place)
# parse train fetches
train_keys, train_values, _ = parse_fetches(train_fetches)
train_values.append(lr)
if FLAGS.eval:
eval_prog = fluid.Program()
with fluid.program_guard(eval_prog, startup_prog):
with fluid.unique_name.guard():
eval_pyreader, feed_vars = create_feed(eval_feed)
fetches = model.eval(feed_vars)
eval_prog = eval_prog.clone(True)
eval_reader = create_reader(eval_feed)
eval_pyreader.decorate_sample_list_generator(eval_reader, place)
# parse train fetches
extra_keys = ['im_info', 'im_id'] if cfg.metric == 'COCO' else []
eval_keys, eval_values, eval_cls = parse_fetches(fetches, eval_prog,
extra_keys)
# compile program for multi-devices
build_strategy = fluid.BuildStrategy()
build_strategy.memory_optimize = False
build_strategy.enable_inplace = True
sync_bn = getattr(model.backbone, 'norm_type', None) == 'sync_bn'
build_strategy.sync_batch_norm = sync_bn
train_compile_program = fluid.compiler.CompiledProgram(
train_prog).with_data_parallel(
loss_name=loss.name, build_strategy=build_strategy)
if FLAGS.eval:
eval_compile_program = fluid.compiler.CompiledProgram(eval_prog)
exe.run(startup_prog)
freeze_bn = getattr(model.backbone, 'freeze_norm', False)
if FLAGS.resume_checkpoint:
checkpoint.load_checkpoint(exe, train_prog, FLAGS.resume_checkpoint)
elif cfg.pretrain_weights and freeze_bn:
checkpoint.load_and_fusebn(exe, train_prog, cfg.pretrain_weights)
elif cfg.pretrain_weights:
checkpoint.load_pretrain(exe, train_prog, cfg.pretrain_weights)
train_stats = TrainingStats(cfg.log_smooth_window, train_keys)
train_pyreader.start()
start_time = time.time()
end_time = time.time()
cfg_name = os.path.basename(FLAGS.config).split('.')[0]
save_dir = os.path.join(cfg.save_dir, cfg_name)
for it in range(cfg.max_iters):
start_time = end_time
end_time = time.time()
outs = exe.run(train_compile_program, fetch_list=train_values)
stats = {k: np.array(v).mean() for k, v in zip(train_keys, outs[:-1])}
train_stats.update(stats)
logs = train_stats.log()
strs = 'iter: {}, lr: {:.6f}, {}, time: {:.3f}'.format(
it, np.mean(outs[-1]), logs, end_time - start_time)
logger.info(strs)
if it > 0 and it % cfg.snapshot_iter == 0:
checkpoint.save(exe, train_prog, os.path.join(save_dir, str(it)))
if FLAGS.eval:
# evaluation
results = eval_run(exe, eval_compile_program, eval_pyreader,
eval_keys, eval_values, eval_cls)
resolution = None
if 'mask' in results[0]:
resolution = model.mask_head.resolution
eval_results(results, eval_feed, cfg.metric, resolution,
FLAGS.output_file)
checkpoint.save(exe, train_prog, os.path.join(save_dir, "model_final"))
train_pyreader.reset()
if __name__ == '__main__':
parser = ArgsParser()
parser.add_argument(
"-r",
"--resume_checkpoint",
default=None,
type=str,
help="Checkpoint path for resuming training.")
parser.add_argument(
"--eval",
action='store_true',
default=False,
help="Whether to perform evaluation in train")
parser.add_argument(
"-f",
"--output_file",
default=None,
type=str,
help="Evaluation file name, default to bbox.json and mask.json.")
FLAGS = parser.parse_args()
main()