提交 1394ab33 编写于 作者: S still-wait

Merge branch 'master' of https://github.com/paddlepaddle/PaddleDetection into add_solov2_new

English | [简体中文](README_cn.md)
Documentation:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
# PaddleDetection
PaddleDetection is an end-to-end object detection development kit based on PaddlePaddle, which
aims to help developers in the whole development of training models, optimizing performance and
inference speed, and deploying models. PaddleDetection provides varied object detection architectures
in modular design, and wealthy data augmentation methods, network components, loss functions, etc.
PaddleDetection supported practical projects such as industrial quality inspection, remote sensing
image object detection, and automatic inspection with its practical features such as model compression
and multi-platform deployment.
[PP-YOLO](https://arxiv.org/abs/2007.12099), which is faster and has higer performance than YOLOv4,
has been released, it reached mAP(0.5:0.95) as 45.2% on COCO test2019 dataset and 72.9 FPS on single
Test V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details.
**Now all models in PaddleDetection require PaddlePaddle version 1.8 or higher, or suitable develop version.**
<div align="center">
<img src="docs/images/000000570688.jpg" />
</div>
## Introduction
Features:
- Rich models:
PaddleDetection provides rich of models, including 100+ pre-trained models
such as object detection, instance segmentation, face detection etc. It covers
the champion models, the practical detection models for cloud and edge device.
- Production Ready:
Key operations are implemented in C++ and CUDA, together with PaddlePaddle's
highly efficient inference engine, enables easy deployment in server environments.
- Highly Flexible:
Components are designed to be modular. Model architectures, as well as data
preprocess pipelines, can be easily customized with simple configuration
changes.
- Performance Optimized:
With the help of the underlying PaddlePaddle framework, faster training and
reduced GPU memory footprint is achieved. Notably, YOLOv3 training is
much faster compared to other frameworks. Another example is Mask-RCNN
(ResNet50), we managed to fit up to 4 images per GPU (Tesla V100 16GB) during
multi-GPU training.
Supported Architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | HRNet | Res2Net |
| ------------------- | :----: | ----------------------------: | :--------: | :---: | :-------: |:------:|:-----: |
| Faster R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ |
| Mask R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Libra R-CNN | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| BlazeFace | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Faceboxes | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer much improved accuracy with negligible performance cost.
**NOTE:** ✓ for config file and pretrain model provided in [Model Zoo](docs/MODEL_ZOO.md), ✗ for not provided but is supported generally.
More models:
- EfficientDet
- FCOS
- CornerNet-Squeeze
- YOLOv4
- PP-YOLO
More Backbones:
- DarkNet
- VGG
- GCNet
- CBNet
Advanced Features:
- [x] **Synchronized Batch Norm**
- [x] **Group Norm**
- [x] **Modulated Deformable Convolution**
- [x] **Deformable PSRoI Pooling**
- [x] **Non-local and GCNet**
**NOTE:** Synchronized batch normalization can only be used on multiple GPU devices, can not be used on CPU devices or single GPU device.
The following is the relationship between COCO mAP and FPS on Tesla V100 of representative models of each architectures and backbones.
<div align="center">
<img src="docs/images/map_fps.png" width=800 />
</div>
**NOTE:**
- `CBResNet` stands for `Cascade-Faster-RCNN-CBResNet200vd-FPN`, which has highest mAP on COCO as 53.3% in PaddleDetection models
- `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which has been optimized to 20 FPS inference speed when COCO mAP as 47.8%
- The enhanced `YOLOv3-ResNet50vd-DCN` is 10.6 absolute percentage points higher than paper on COCO mAP, and inference speed is nearly 70% faster than the darknet framework
- All these models can be get in [Model Zoo](#Model-Zoo)
The following is the relationship between COCO mAP and FPS on Tesla V100 of SOTA object detecters and PP-YOLO, which is faster and has better performance than YOLOv4, and reached mAP(0.5:0.95) as 45.2% on COCO test2019 dataset and 72.9 FPS on single Test V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details.
<div align="center">
<img src="docs/images/ppyolo_map_fps.png" width=600 />
</div>
## Tutorials
### Get Started
- [Installation guide](docs/tutorials/INSTALL.md)
- [Quick start on small dataset](docs/tutorials/QUICK_STARTED.md)
- [Train/Evaluation/Inference](docs/tutorials/GETTING_STARTED.md)
- [How to train a custom dataset](docs/tutorials/Custom_DataSet.md)
- [FAQ](docs/FAQ.md)
### Advanced Tutorial
- [Guide to preprocess pipeline and dataset definition](docs/advanced_tutorials/READER.md)
- [Models technical](docs/advanced_tutorials/MODEL_TECHNICAL.md)
- [Transfer learning document](docs/advanced_tutorials/TRANSFER_LEARNING.md)
- [Parameter configuration](docs/advanced_tutorials/config_doc):
- [Introduction to the configuration workflow](docs/advanced_tutorials/config_doc/CONFIG.md)
- [Parameter configuration for RCNN model](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md)
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
- [Model compression](slim)
- [Model compression benchmark](slim)
- [Quantization](slim/quantization)
- [Model pruning](slim/prune)
- [Model distillation](slim/distillation)
- [Neural Architecture Search](slim/nas)
- [Deployment](deploy)
- [Export model for inference](docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
- [Python inference](deploy/python)
- [C++ inference](deploy/cpp)
- [Inference benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md)
## Model Zoo
- Pretrained models are available in the [PaddleDetection model zoo](docs/MODEL_ZOO.md).
- [Mobile models](configs/mobile/README.md)
- [Anchor free models](configs/anchor_free/README.md)
- [Face detection models](docs/featured_model/FACE_DETECTION_en.md)
- [Pretrained models for pedestrian detection](docs/featured_model/CONTRIB.md)
- [Pretrained models for vehicle detection](docs/featured_model/CONTRIB.md)
- [YOLOv3 enhanced model](docs/featured_model/YOLOv3_ENHANCEMENT.md): Compared to MAP of 33.0% in paper, enhanced YOLOv3 reaches the MAP of 43.6%, and inference speed is improved as well
- [PP-YOLO](configs/ppyolo/README.md): PP-YOLO reeached mAP as 45.3% on COCO dataset,and 72.9 FPS on single Tesla V100
- [Objects365 2019 Challenge champion model](docs/featured_model/champion_model/CACascadeRCNN.md)
- [Best single model of Open Images 2019-Object Detction](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
- [Practical Server-side detection method](configs/rcnn_enhance/README_en.md): Inference speed on single V100 GPU can reach 20FPS when COCO mAP is 47.8%.
- [Large-scale practical object detection models](docs/featured_model/LARGE_SCALE_DET_MODEL_en.md): Large-scale practical server-side detection pretrained models with 676 categories are provided for most application scenarios, which can be used not only for direct inference but also finetuning on other datasets.
## License
PaddleDetection is released under the [Apache 2.0 license](LICENSE).
## Updates
v0.4.0 was released at `05/2020`, add PP-YOLO, TTFNet, HTC, ACFPN, etc. And add BlaceFace face landmark detection model, add a series of optimized SSDLite models on mobile side, add data augmentations GridMask and RandomErasing, add Matrix NMS and EMA training, and improved ease of use, fix many known bugs, etc.
Please refer to [版本更新文档](docs/CHANGELOG.md) for details.
## Contributing
Contributions are highly welcomed and we would really appreciate your feedback!!
README_cn.md
\ No newline at end of file
简体中文 | [English](README.md) 简体中文 | [English](README_en.md)
文档:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io) 文档:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
# PaddleDetection # 简介
飞桨推出的PaddleDetection是端到端目标检测开发套件,旨在帮助开发者更快更好地完成检测模型的训练、精度速度优化到部署全流程。PaddleDetection以模块化的设计实现了多种主流目标检测算法,并且提供了丰富的数据增强、网络组件、损失函数等模块,集成了模型压缩和跨平台高性能部署能力。目前基于PaddleDetection已经完成落地的项目涉及工业质检、遥感图像检测、无人巡检等多个领域 PaddleDetection飞桨目标检测开发套件,旨在帮助开发者更快更好地完成检测模型的组建、训练、优化及部署等全开发流程
PaddleDetection新发布精度速度领先的[PP-YOLO](https://arxiv.org/abs/2007.12099)模型,COCO数据集精度达到45.2%,单卡Tesla V100预测速度达到72.9 FPS,详细信息见[PP-YOLO模型](configs/ppyolo/README_cn.md) PaddleDetection模块化地实现了多种主流目标检测算法,提供了丰富的数据增强策略、网络模块组件(如骨干网络)、损失函数等,并集成了模型压缩和跨平台高性能部署能力。
**目前检测库下模型均要求使用PaddlePaddle 1.8及以上版本或适当的develop版本。** 经过长时间产业实践打磨,PaddleDetection已拥有顺畅、卓越的使用体验,被工业质检、遥感图像检测、无人巡检、新零售、互联网、科研等十多个行业的开发者广泛应用。
<div align="center"> <div align="center">
<img src="docs/images/000000570688.jpg" /> <img src="docs/images/football.gif" width='800'/>
</div> </div>
### 产品动态
## 简介
- 2020.09.21-27: 【目标检测7日打卡课】手把手教你从入门到进阶,深入了解目标检测算法的前世今生。立即加入课程QQ交流群(1136406895)一起学习吧 :)
特性: - 2020.07.24: 发布**产业最实用**目标检测模型 [PP-YOLO](https://arxiv.org/abs/2007.12099) ,深入考虑产业应用对精度速度的双重面诉求,COCO数据集精度45.2%,Tesla V100预测速度72.9 FPS,详细信息见[文档](configs/ppyolo/README_cn.md)
- 2020.06.11: 发布676类大规模服务器端实用目标检测模型,适用于绝大部分使用场景,可以直接用来预测,也可以用于微调其他任务。
- 模型丰富:
### 特性
PaddleDetection提供了丰富的模型,包含目标检测、实例分割、人脸检测等100+个预训练模型,涵盖多种数据集竞赛冠军方案、适合云端/边缘端设备部署的检测方案。
- **模型丰富**: 包含**目标检测****实例分割****人脸检测****100+个预训练模型**,涵盖多种**全球竞赛冠军**方案
- 易部署: - **使用简洁**:模块化设计,解耦各个网络组件,开发者轻松搭建、试用各种检测模型及优化策略,快速得到高性能、定制化的算法。
- **端到端打通**: 从数据增强、组网、训练、压缩、部署端到端打通,并完备支持**云端**/**边缘端**多架构、多设备部署。
PaddleDetection的模型中使用的核心算子均通过C++或CUDA实现,同时基于PaddlePaddle的高性能推理引擎可以方便地部署在多种硬件平台上。 - **高性能**: 基于飞桨的高性能内核,模型训练速度及显存占用优势明显。支持FP16训练, 支持多机训练。
- 高灵活度: #### 套件结构概览
PaddleDetection通过模块化设计来解耦各个组件,基于配置文件可以轻松地搭建各种检测模型。 <table>
<tbody>
- 高性能: <tr align="center" valign="bottom">
<td>
基于PaddlePaddle框架的高性能内核,在模型训练速度、显存占用上有一定的优势。例如,YOLOv3的训练速度快于其他框架,在Tesla V100 16GB环境下,Mask-RCNN(ResNet50)可以单卡Batch Size可以达到4 (甚至到5)。 <b>Architectures</b>
</td>
<td>
支持的模型结构: <b>Backbones</b>
</td>
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | HRNet | Res2Net | <td>
|--------------------|:------:|------------------------------:|:----------:|:-----:|:---------:|:------:| :--: | <b>Components</b>
| Faster R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ | </td>
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | <td>
| Mask R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ | <b>Data Augmentation</b>
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | </td>
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | </tr>
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | <tr valign="top">
| Libra R-CNN | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | <td>
| RetinaNet | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | <ul><li><b>Two-Stage Detection</b></li>
| YOLOv3 | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | <ul>
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | <li>Faster RCNN</li>
| BlazeFace | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | <li>FPN</li>
| Faceboxes | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | <li>Cascade-RCNN</li>
<li>Libra RCNN</li>
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) 模型预测速度基本不变的情况下提高了精度。 <li>Hybrid Task RCNN</li>
<li>PSS-Det RCNN</li>
**说明:** ✓ 为[模型库](docs/MODEL_ZOO_cn.md)中提供了对应配置文件和预训练模型,✗ 为未提供参考配置,但一般都支持。 </ul>
</ul>
更多的模型: <ul><li><b>One-Stage Detection</b></li>
<ul>
- EfficientDet <li>RetinaNet</li>
- FCOS <li>YOLOv3</li>
- CornerNet-Squeeze <li>YOLOv4</li>
- YOLOv4 <li>PP-YOLO</li>
- PP-YOLO <li>SSD</li>
</ul>
更多的Backone: </ul>
<ul><li><b>Anchor Free</b></li>
- DarkNet <ul>
- VGG <li>CornerNet-Squeeze</li>
- GCNet <li>FCOS</li>
- CBNet <li>TTFNet</li>
- Hourglass </ul>
</ul>
扩展特性: <ul>
<li><b>Instance Segmentation</b></li>
- [x] **Synchronized Batch Norm** <ul>
- [x] **Group Norm** <li>Mask RCNN</li>
- [x] **Modulated Deformable Convolution** <li>SOLOv2 is coming soon</li>
- [x] **Deformable PSRoI Pooling** </ul>
- [x] **Non-local和GCNet** </ul>
<ul>
**注意:** Synchronized batch normalization 只能在多GPU环境下使用,不能在CPU环境或者单GPU环境下使用。 <li><b>Face-Detction</b></li>
<ul>
以下为选取各模型结构和骨干网络的代表模型COCO数据集精度mAP和单卡Tesla V100上预测速度(FPS)关系图。 <li>FaceBoxes</li>
<li>BlazeFace</li>
<li>BlazeFace-NAS</li>
</ul>
</ul>
</td>
<td>
<ul>
<li>ResNet(&vd)</li>
<li>ResNeXt(&vd)</li>
<li>SENet</li>
<li>Res2Net</li>
<li>HRNet</li>
<li>Hourglass</li>
<li>CBNet</li>
<li>GCNet</li>
<li>DarkNet</li>
<li>CSPDarkNet</li>
<li>VGG</li>
<li>MobileNetv1/v3</li>
<li>GhostNet</li>
<li>Efficientnet</li>
</ul>
</td>
<td>
<ul><li><b>Common</b></li>
<ul>
<li>Sync-BN</li>
<li>Group Norm</li>
<li>DCNv2</li>
<li>Non-local</li>
</ul>
</ul>
<ul><li><b>FPN</b></li>
<ul>
<li>BiFPN</li>
<li>BFP</li>
<li>HRFPN</li>
<li>ACFPN</li>
</ul>
</ul>
<ul><li><b>Loss</b></li>
<ul>
<li>Smooth-L1</li>
<li>GIoU/DIoU/CIoU</li>
<li>IoUAware</li>
</ul>
</ul>
<ul><li><b>Post-processing</b></li>
<ul>
<li>SoftNMS</li>
<li>MatrixNMS</li>
</ul>
</ul>
<ul><li><b>Speed</b></li>
<ul>
<li>FP16 training</li>
<li>Multi-machine training </li>
</ul>
</ul>
</td>
<td>
<ul>
<li>Resize</li>
<li>Flipping</li>
<li>Expand</li>
<li>Crop</li>
<li>Color Distort</li>
<li>Random Erasing</li>
<li>Mixup </li>
<li>Cutmix </li>
<li>Grid Mask</li>
<li>Auto Augment</li>
</ul>
</td>
</tr>
</td>
</tr>
</tbody>
</table>
#### 模型性能概览
各模型结构和骨干网络的代表模型在COCO数据集上精度mAP和单卡Tesla V100上预测速度(FPS)对比图。
<div align="center"> <div align="center">
<img src="docs/images/map_fps.png" width=800 /> <img src="docs/images/map_fps.png" />
</div> </div>
**说明:** **说明:**
...@@ -95,11 +180,6 @@ PaddleDetection新发布精度速度领先的[PP-YOLO](https://arxiv.org/abs/200 ...@@ -95,11 +180,6 @@ PaddleDetection新发布精度速度领先的[PP-YOLO](https://arxiv.org/abs/200
- PaddleDetection增强版`YOLOv3-ResNet50vd-DCN`在COCO数据集mAP高于原作10.6个绝对百分点,推理速度为61.3FPS,快于原作约70% - PaddleDetection增强版`YOLOv3-ResNet50vd-DCN`在COCO数据集mAP高于原作10.6个绝对百分点,推理速度为61.3FPS,快于原作约70%
- 图中模型均可在[模型库](#模型库)中获取 - 图中模型均可在[模型库](#模型库)中获取
以下为PaddleDetection发布的精度和预测速度优于YOLOv4模型的PP-YOLO与前沿目标检测算法的COCO数据集精度与单卡Tesla V100预测速度(FPS)关系图, PP-YOLO模型在[COCO](http://cocodataset.org) test2019数据集上精度达到45.2%,在单卡V100上FP32推理速度为72.9 FPS,详细信息见[PP-YOLO模型](configs/ppyolo/README_cn.md)
<div align="center">
<img src="docs/images/ppyolo_map_fps.png" width=600 />
</div>
## 文档教程 ## 文档教程
...@@ -108,51 +188,56 @@ PaddleDetection新发布精度速度领先的[PP-YOLO](https://arxiv.org/abs/200 ...@@ -108,51 +188,56 @@ PaddleDetection新发布精度速度领先的[PP-YOLO](https://arxiv.org/abs/200
- [安装说明](docs/tutorials/INSTALL_cn.md) - [安装说明](docs/tutorials/INSTALL_cn.md)
- [快速开始](docs/tutorials/QUICK_STARTED_cn.md) - [快速开始](docs/tutorials/QUICK_STARTED_cn.md)
- [训练/评估/预测流程](docs/tutorials/GETTING_STARTED_cn.md) - [训练/评估/预测流程](docs/tutorials/GETTING_STARTED_cn.md)
- [如何训练自定义数据集](docs/tutorials/Custom_DataSet.md) - [如何自定义数据集](docs/tutorials/Custom_DataSet.md)
- [常见问题汇总](docs/FAQ.md) - [常见问题汇总](docs/FAQ.md)
### 进阶教程 ### 进阶教程
- [数据预处理及数据集定义](docs/advanced_tutorials/READER.md) - 参数配置
- [搭建模型步骤](docs/advanced_tutorials/MODEL_TECHNICAL.md)
- [模型参数配置](docs/advanced_tutorials/config_doc):
- [配置模块设计和介绍](docs/advanced_tutorials/config_doc/CONFIG_cn.md) - [配置模块设计和介绍](docs/advanced_tutorials/config_doc/CONFIG_cn.md)
- [RCNN模型参数说明](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md) - [RCNN参数说明](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md)
- [迁移学习教程](docs/advanced_tutorials/TRANSFER_LEARNING_cn.md) - [YOLOv3参数说明]()
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb) - 迁移学习
- [模型压缩](slim) - [如何加载预训练](docs/advanced_tutorials/TRANSFER_LEARNING_cn.md)
- 模型压缩(基于[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim))
- [压缩benchmark](slim) - [压缩benchmark](slim)
- [量化](slim/quantization) - [量化](slim/quantization), [剪枝](slim/prune), [蒸馏](slim/distillation), [搜索](slim/nas)
- [剪枝](slim/prune) - 推理部署
- [蒸馏](slim/distillation)
- [神经网络搜索](slim/nas)
- [推理部署](deploy)
- [模型导出教程](docs/advanced_tutorials/deploy/EXPORT_MODEL.md) - [模型导出教程](docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
- [Python端推理部署](deploy/python) - [服务器端Python部署](deploy/python)
- [C++端推理部署](deploy/cpp) - [服务器端C++部署](deploy/cpp)
- [移动端部署](https://github.com/PaddlePaddle/Paddle-Lite-Demo)
- [在线Serving部署](https://github.com/PaddlePaddle/Serving)
- [推理Benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md) - [推理Benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md)
- 进阶开发
- [新增数据预处理](docs/advanced_tutorials/READER.md)
- [新增检测算法](docs/advanced_tutorials/MODEL_TECHNICAL.md)
## 模型库 ## 模型库
- [模型库](docs/MODEL_ZOO_cn.md) - 通用目标检测:
- [移动端模型](configs/mobile/README.md) - [模型库和基线](docs/MODEL_ZOO_cn.md)
- [Anchor free模型](configs/anchor_free/README.md) - [移动端模型](configs/mobile/README.md)
- [人脸检测模型](docs/featured_model/FACE_DETECTION.md) - [Anchor Free](configs/anchor_free/README.md)
- [YOLOv3增强模型](docs/featured_model/YOLOv3_ENHANCEMENT.md): COCO mAP高达43.6%,原论文精度为33.0% - [PP-YOLO模型](configs/ppyolo/README_cn.md)
- [PP-YOLO模型](configs/ppyolo/README_cn.md): COCO mAP高达45.3%,单卡Tesla V100预测速度高达72.9 FPS - [676类目标检测](docs/featured_model/LARGE_SCALE_DET_MODEL.md)
- [行人检测预训练模型](docs/featured_model/CONTRIB_cn.md) - [两阶段实用模型PSS-Det](configs/rcnn_enhance/README.md)
- [车辆检测预训练模型](docs/featured_model/CONTRIB_cn.md) - 垂类领域
- [Objects365 2019 Challenge夺冠模型](docs/featured_model/champion_model/CACascadeRCNN.md) - [人脸检测](docs/featured_model/FACE_DETECTION.md)
- [Open Images 2019-Object Detction比赛最佳单模型](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md) - [行人检测](docs/featured_model/CONTRIB_cn.md)
- [服务器端实用目标检测模型](configs/rcnn_enhance/README.md): V100上速度20FPS时,COCO mAP高达47.8%。 - [车辆检测](docs/featured_model/CONTRIB_cn.md)
- [大规模实用目标检测模型](docs/featured_model/LARGE_SCALE_DET_MODEL.md): 提供了包含676个类别的大规模服务器端实用目标检测模型,适用于绝大部分使用场景,可以直接用来预测,也可以用于微调其他任务。 - 比赛方案
- [Objects365 2019 Challenge夺冠模型](docs/featured_model/champion_model/CACascadeRCNN.md)
- [Open Images 2019-Object Detction比赛最佳单模型](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
## 版本更新
v0.4.0版本已经在`07/2020`发布,增加PP-YOLO, TTFNet, HTC, ACFPN等多个模型,新增BlazeFace人脸关键点检测模型,新增移动端SSDLite系列优化模型,新增GridMask,RandomErasing数据增强方法,新增Matrix NMS和EMA训练,提升易用性,修复已知诸多bug等,详细内容请参考[版本更新文档](docs/CHANGELOG.md)
## 许可证书 ## 许可证书
本项目的发布受[Apache 2.0 license](LICENSE)许可认证。 本项目的发布受[Apache 2.0 license](LICENSE)许可认证。
## 版本更新
v0.4.0版本已经在`07/2020`发布,增加PP-YOLO, TTFNet, HTC, ACFPN等多个模型,新增BlazeFace人脸关键点检测模型,新增移动端SSDLite系列优化模型,新增GridMask,RandomErasing数据增强方法,新增Matrix NMS和EMA训练,提升易用性,修复已知诸多bug等,详细内容请参考[版本更新文档](docs/CHANGELOG.md)
## 如何贡献代码 ## 贡献代码
我们非常欢迎你可以为PaddleDetection提供代码,也十分感谢你的反馈。 我们非常欢迎你可以为PaddleDetection提供代码,也十分感谢你的反馈。
English | [简体中文](README_cn.md)
Documentation:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
# PaddleDetection
PaddleDetection is an end-to-end object detection development kit based on PaddlePaddle, which
aims to help developers in the whole development of training models, optimizing performance and
inference speed, and deploying models. PaddleDetection provides varied object detection architectures
in modular design, and wealthy data augmentation methods, network components, loss functions, etc.
PaddleDetection supported practical projects such as industrial quality inspection, remote sensing
image object detection, and automatic inspection with its practical features such as model compression
and multi-platform deployment.
[PP-YOLO](https://arxiv.org/abs/2007.12099), which is faster and has higer performance than YOLOv4,
has been released, it reached mAP(0.5:0.95) as 45.2% on COCO test2019 dataset and 72.9 FPS on single
Test V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details.
**Now all models in PaddleDetection require PaddlePaddle version 1.8 or higher, or suitable develop version.**
<div align="center">
<img src="docs/images/000000570688.jpg" />
</div>
## Introduction
Features:
- Rich models:
PaddleDetection provides rich of models, including 100+ pre-trained models
such as object detection, instance segmentation, face detection etc. It covers
the champion models, the practical detection models for cloud and edge device.
- Production Ready:
Key operations are implemented in C++ and CUDA, together with PaddlePaddle's
highly efficient inference engine, enables easy deployment in server environments.
- Highly Flexible:
Components are designed to be modular. Model architectures, as well as data
preprocess pipelines, can be easily customized with simple configuration
changes.
- Performance Optimized:
With the help of the underlying PaddlePaddle framework, faster training and
reduced GPU memory footprint is achieved. Notably, YOLOv3 training is
much faster compared to other frameworks. Another example is Mask-RCNN
(ResNet50), we managed to fit up to 4 images per GPU (Tesla V100 16GB) during
multi-GPU training.
Supported Architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | HRNet | Res2Net |
| ------------------- | :----: | ----------------------------: | :--------: | :---: | :-------: |:------:|:-----: |
| Faster R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ |
| Mask R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Libra R-CNN | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| BlazeFace | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Faceboxes | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer much improved accuracy with negligible performance cost.
**NOTE:** ✓ for config file and pretrain model provided in [Model Zoo](docs/MODEL_ZOO.md), ✗ for not provided but is supported generally.
More models:
- EfficientDet
- FCOS
- CornerNet-Squeeze
- YOLOv4
- PP-YOLO
More Backbones:
- DarkNet
- VGG
- GCNet
- CBNet
Advanced Features:
- [x] **Synchronized Batch Norm**
- [x] **Group Norm**
- [x] **Modulated Deformable Convolution**
- [x] **Deformable PSRoI Pooling**
- [x] **Non-local and GCNet**
**NOTE:** Synchronized batch normalization can only be used on multiple GPU devices, can not be used on CPU devices or single GPU device.
The following is the relationship between COCO mAP and FPS on Tesla V100 of representative models of each architectures and backbones.
<div align="center">
<img src="docs/images/map_fps.png" width=800 />
</div>
**NOTE:**
- `CBResNet` stands for `Cascade-Faster-RCNN-CBResNet200vd-FPN`, which has highest mAP on COCO as 53.3% in PaddleDetection models
- `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which has been optimized to 20 FPS inference speed when COCO mAP as 47.8%
- The enhanced `YOLOv3-ResNet50vd-DCN` is 10.6 absolute percentage points higher than paper on COCO mAP, and inference speed is nearly 70% faster than the darknet framework
- All these models can be get in [Model Zoo](#Model-Zoo)
The following is the relationship between COCO mAP and FPS on Tesla V100 of SOTA object detecters and PP-YOLO, which is faster and has better performance than YOLOv4, and reached mAP(0.5:0.95) as 45.2% on COCO test2019 dataset and 72.9 FPS on single Test V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details.
<div align="center">
<img src="docs/images/ppyolo_map_fps.png" width=600 />
</div>
## Tutorials
### Get Started
- [Installation guide](docs/tutorials/INSTALL.md)
- [Quick start on small dataset](docs/tutorials/QUICK_STARTED.md)
- [Train/Evaluation/Inference](docs/tutorials/GETTING_STARTED.md)
- [How to train a custom dataset](docs/tutorials/Custom_DataSet.md)
- [FAQ](docs/FAQ.md)
### Advanced Tutorial
- [Guide to preprocess pipeline and dataset definition](docs/advanced_tutorials/READER.md)
- [Models technical](docs/advanced_tutorials/MODEL_TECHNICAL.md)
- [Transfer learning document](docs/advanced_tutorials/TRANSFER_LEARNING.md)
- [Parameter configuration](docs/advanced_tutorials/config_doc):
- [Introduction to the configuration workflow](docs/advanced_tutorials/config_doc/CONFIG.md)
- [Parameter configuration for RCNN model](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md)
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
- [Model compression](slim)
- [Model compression benchmark](slim)
- [Quantization](slim/quantization)
- [Model pruning](slim/prune)
- [Model distillation](slim/distillation)
- [Neural Architecture Search](slim/nas)
- [Deployment](deploy)
- [Export model for inference](docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
- [Python inference](deploy/python)
- [C++ inference](deploy/cpp)
- [Inference benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md)
## Model Zoo
- Pretrained models are available in the [PaddleDetection model zoo](docs/MODEL_ZOO.md).
- [Mobile models](configs/mobile/README.md)
- [Anchor free models](configs/anchor_free/README.md)
- [Face detection models](docs/featured_model/FACE_DETECTION_en.md)
- [Pretrained models for pedestrian detection](docs/featured_model/CONTRIB.md)
- [Pretrained models for vehicle detection](docs/featured_model/CONTRIB.md)
- [YOLOv3 enhanced model](docs/featured_model/YOLOv3_ENHANCEMENT.md): Compared to MAP of 33.0% in paper, enhanced YOLOv3 reaches the MAP of 43.6%, and inference speed is improved as well
- [PP-YOLO](configs/ppyolo/README.md): PP-YOLO reeached mAP as 45.3% on COCO dataset,and 72.9 FPS on single Tesla V100
- [Objects365 2019 Challenge champion model](docs/featured_model/champion_model/CACascadeRCNN.md)
- [Best single model of Open Images 2019-Object Detction](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
- [Practical Server-side detection method](configs/rcnn_enhance/README_en.md): Inference speed on single V100 GPU can reach 20FPS when COCO mAP is 47.8%.
- [Large-scale practical object detection models](docs/featured_model/LARGE_SCALE_DET_MODEL_en.md): Large-scale practical server-side detection pretrained models with 676 categories are provided for most application scenarios, which can be used not only for direct inference but also finetuning on other datasets.
## License
PaddleDetection is released under the [Apache 2.0 license](LICENSE).
## Updates
v0.4.0 was released at `05/2020`, add PP-YOLO, TTFNet, HTC, ACFPN, etc. And add BlaceFace face landmark detection model, add a series of optimized SSDLite models on mobile side, add data augmentations GridMask and RandomErasing, add Matrix NMS and EMA training, and improved ease of use, fix many known bugs, etc.
Please refer to [版本更新文档](docs/CHANGELOG.md) for details.
## Contributing
Contributions are highly welcomed and we would really appreciate your feedback!!
...@@ -40,11 +40,6 @@ YOLOv3Head: ...@@ -40,11 +40,6 @@ YOLOv3Head:
score_threshold: 0.01 score_threshold: 0.01
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: false label_smooth: false
......
...@@ -44,7 +44,6 @@ YOLOv3Head: ...@@ -44,7 +44,6 @@ YOLOv3Head:
drop_block: true drop_block: true
YOLOv3Loss: YOLOv3Loss:
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: false label_smooth: false
use_fine_grained_loss: true use_fine_grained_loss: true
......
...@@ -42,11 +42,6 @@ YOLOv3Head: ...@@ -42,11 +42,6 @@ YOLOv3Head:
drop_block: true drop_block: true
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: false label_smooth: false
use_fine_grained_loss: true use_fine_grained_loss: true
......
...@@ -43,11 +43,6 @@ YOLOv3Head: ...@@ -43,11 +43,6 @@ YOLOv3Head:
keep_prob: 0.94 keep_prob: 0.94
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: false label_smooth: false
use_fine_grained_loss: true use_fine_grained_loss: true
......
...@@ -41,11 +41,6 @@ YOLOv3Head: ...@@ -41,11 +41,6 @@ YOLOv3Head:
score_threshold: 0.01 score_threshold: 0.01
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: false label_smooth: false
use_fine_grained_loss: true use_fine_grained_loss: true
......
...@@ -82,6 +82,12 @@ Training PP-YOLO on 8 GPUs with following command(all commands should be run und ...@@ -82,6 +82,12 @@ Training PP-YOLO on 8 GPUs with following command(all commands should be run und
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c configs/ppyolo/ppyolo.yml --eval CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c configs/ppyolo/ppyolo.yml --eval
``` ```
optional: Run `tools/anchor_cluster.py` to get anchors suitable for your dataset, and modify the anchor setting in `configs/ppyolo/ppyolo.yml`.
``` bash
python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000
```
### 2. Evaluation ### 2. Evaluation
Evaluating PP-YOLO on COCO val2017 dataset in single GPU with following commands: Evaluating PP-YOLO on COCO val2017 dataset in single GPU with following commands:
......
...@@ -82,6 +82,10 @@ PP-YOLO从如下方面优化和提升YOLOv3模型的精度和速度: ...@@ -82,6 +82,10 @@ PP-YOLO从如下方面优化和提升YOLOv3模型的精度和速度:
```bash ```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c configs/ppyolo/ppyolo.yml --eval CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c configs/ppyolo/ppyolo.yml --eval
``` ```
可选:在训练之前使用`tools/anchor_cluster.py`得到适用于你的数据集的anchor,并修改`configs/ppyolo/ppyolo.yml`中的anchor设置
```bash
python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000
```
### 2. 评估 ### 2. 评估
......
...@@ -44,7 +44,6 @@ YOLOv3Head: ...@@ -44,7 +44,6 @@ YOLOv3Head:
drop_block: true drop_block: true
YOLOv3Loss: YOLOv3Loss:
batch_size: 24
ignore_thresh: 0.7 ignore_thresh: 0.7
scale_x_y: 1.05 scale_x_y: 1.05
label_smooth: false label_smooth: false
......
...@@ -44,7 +44,6 @@ YOLOv3Head: ...@@ -44,7 +44,6 @@ YOLOv3Head:
drop_block: true drop_block: true
YOLOv3Loss: YOLOv3Loss:
batch_size: 24
ignore_thresh: 0.7 ignore_thresh: 0.7
scale_x_y: 1.05 scale_x_y: 1.05
label_smooth: false label_smooth: false
......
...@@ -39,7 +39,6 @@ YOLOv3Head: ...@@ -39,7 +39,6 @@ YOLOv3Head:
drop_block: true drop_block: true
YOLOv3Loss: YOLOv3Loss:
batch_size: 32
ignore_thresh: 0.7 ignore_thresh: 0.7
scale_x_y: 1.05 scale_x_y: 1.05
label_smooth: false label_smooth: false
......
...@@ -47,7 +47,6 @@ YOLOv3Head: ...@@ -47,7 +47,6 @@ YOLOv3Head:
drop_block: true drop_block: true
YOLOv3Loss: YOLOv3Loss:
batch_size: 24
ignore_thresh: 0.7 ignore_thresh: 0.7
scale_x_y: 1.05 scale_x_y: 1.05
label_smooth: false label_smooth: false
......
...@@ -35,11 +35,6 @@ YOLOv3Head: ...@@ -35,11 +35,6 @@ YOLOv3Head:
score_threshold: 0.01 score_threshold: 0.01
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: true label_smooth: true
......
...@@ -36,11 +36,6 @@ YOLOv3Head: ...@@ -36,11 +36,6 @@ YOLOv3Head:
score_threshold: 0.01 score_threshold: 0.01
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: false label_smooth: false
......
...@@ -36,7 +36,6 @@ YOLOv3Head: ...@@ -36,7 +36,6 @@ YOLOv3Head:
score_threshold: 0.01 score_threshold: 0.01
YOLOv3Loss: YOLOv3Loss:
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: false label_smooth: false
iou_loss: DiouLossYolo iou_loss: DiouLossYolo
......
...@@ -36,11 +36,6 @@ YOLOv3Head: ...@@ -36,11 +36,6 @@ YOLOv3Head:
score_threshold: 0.01 score_threshold: 0.01
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: true label_smooth: true
......
...@@ -38,11 +38,6 @@ YOLOv3Head: ...@@ -38,11 +38,6 @@ YOLOv3Head:
score_threshold: 0.01 score_threshold: 0.01
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: true label_smooth: true
......
#####################################基础配置#####################################
# 检测算法使用YOLOv3,backbone使用MobileNet_v1
# 检测模型的名称
architecture: YOLOv3
# 根据硬件选择是否使用GPU
use_gpu: true
# ### max_iters为最大迭代次数,而一个iter会运行batch_size * device_num张图片。batch_size在下面 TrainReader.batch_size设置。
max_iters: 1200
# log平滑参数,平滑窗口大小,会从取历史窗口中取log_smooth_window大小的loss求平均值
log_smooth_window: 20
# 模型保存文件夹
save_dir: output
# 每隔多少迭代保存模型
snapshot_iter: 200
# ### mAP 评估方式,mAP评估方式可以选择COCO和VOC或WIDERFACE,其中VOC有11point和integral两种评估方法
# VOC数据格式只能使用VOC mAP评估方法
metric: VOC
map_type: integral
# ### pretrain_weights 可以是imagenet的预训练好的分类模型权重,也可以是在VOC或COCO数据集上的预训练的检测模型权重
# 模型配置文件和权重文件可参考[模型库](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/MODEL_ZOO.md)
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar
# 模型保存文件夹,如果开启了--eval,会在这个文件夹下保存best_model
weights: output/yolov3_mobilenet_v1_roadsign_coco_template/
# ### 根据用户数据设置类别数,注意这里不含背景类
num_classes: 4
# finetune时忽略的参数,按照正则化匹配,匹配上的参数会被忽略掉
finetune_exclude_pretrained_params: ['yolo_output']
# use_fine_grained_loss
use_fine_grained_loss: false
# 检测模型的结构
YOLOv3:
# 默认是 MobileNetv1
backbone: MobileNet
yolo_head: YOLOv3Head
# 检测模型的backbone
MobileNet:
norm_decay: 0.
conv_group_scale: 1
with_extra_blocks: false
# 检测模型的Head
YOLOv3Head:
# anchor_masks
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
# 3x3 anchors
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
# yolo_loss
yolo_loss: YOLOv3Loss
# nms 类型参数,可以设置为[MultiClassNMS, MultiClassSoftNMS, MatrixNMS], 默认使用 MultiClassNMS
nms:
# background_label,背景标签(类别)的索引,如果设置为 0 ,则忽略背景标签(类别)。如果设置为 -1 ,则考虑所有类别。默认值:0
background_label: -1
# NMS步骤后每个图像要保留的总bbox数。 -1表示在NMS步骤之后保留所有bbox。
keep_top_k: 100
# 在NMS中用于剔除检测框IOU的阈值,默认值:0.3 。
nms_threshold: 0.45
# 基于 score_threshold 的过滤检测后,根据置信度保留的最大检测次数。
nms_top_k: 1000
# 是否归一化,默认值:True 。
normalized: false
# 过滤掉低置信度分数的边界框的阈值。
score_threshold: 0.01
YOLOv3Loss:
# 这里的batch_size与训练中的batch_size(即TrainReader.batch_size)不同.
# 仅且当use_fine_grained_loss=true时,计算Loss时使用,且必须要与TrainReader.batch_size设置成一样
batch_size: 8
# 忽略样本的阈值 ignore_thresh
ignore_thresh: 0.7
# 是否使用label_smooth
label_smooth: true
LearningRate:
# ### 学习率设置 参考 https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/FAQ.md#faq%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98
# base_lr
base_lr: 0.0001
# 学习率调整策略
# 具体实现参考[API](fluid.layers.piecewise_decay)
schedulers:
# 学习率调整策略
- !PiecewiseDecay
gamma: 0.1
milestones:
# ### 参考 https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/FAQ.md#faq%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98
# ### 8/12 11/12
- 800
- 1100
# 在训练开始时,调低学习率为base_lr * start_factor,然后逐步增长到base_lr,这个过程叫学习率热身,按照以下公式更新学习率
# linear_step = end_lr - start_lr
# lr = start_lr + linear_step * (global_step / warmup_steps)
# 具体实现参考[API](fluid.layers.linear_lr_warmup)
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 100
OptimizerBuilder:
# 默认使用SGD+Momentum进行训练
# 具体实现参考[API](fluid.optimizer)
optimizer:
momentum: 0.9
type: Momentum
# 默认使用SGD+Momentum进行训练
# 具体实现参考[API](fluid.optimizer)
regularizer:
factor: 0.0005
type: L2
#####################################数据配置#####################################
# 模型训练集设置参考
# 训练、验证、测试使用的数据配置主要区别在数据路径、模型输入、数据增强参数设置
# 如果使用 yolov3_reader.yml,下面的参数设置优先级高,会覆盖yolov3_reader.yml中的参数设置。
# _READER_: 'yolov3_reader.yml'
TrainReader:
# 训练过程中模型的输入设置
# 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息
inputs_def:
fields: ['image', 'gt_bbox', 'gt_class', 'gt_score']
# num_max_boxes,每个样本的groud truth的最多保留个数,若不够用0填充。
num_max_boxes: 50
# 训练数据集路径
dataset:
# 指定数据集格式
!VOCDataSet
#dataset/xxx/
#├── annotations
#│ ├── xxx1.xml
#│ ├── xxx2.xml
#│ ├── xxx3.xml
#│ | ...
#├── images
#│ ├── xxx1.png
#│ ├── xxx2.png
#│ ├── xxx3.png
#│ | ...
#├── label_list.txt (用户自定义必须提供,且文件名称必须是label_list.txt。当使用VOC数据且use_default_label=true时,可不提供 )
#├── train.txt (训练数据集文件列表, ./images/xxx1.png ./Annotations/xxx1.xml)
#└── valid.txt (测试数据集文件列表)
# 图片文件夹相对路径,路径是相对于dataset_dir,图像路径= dataset_dir + image_dir + image_name
dataset_dir: dataset/roadsign_voc
# 标记文件名
anno_path: train.txt
# 是否包含背景类,若with_background=true,num_classes需要+1
# YOLO 系列with_background必须是false,FasterRCNN系列是true ###
with_background: false
sample_transforms:
# 读取Image图像为numpy数组
# 可以选择将图片从BGR转到RGB,可以选择对一个batch中的图片做mixup增强
- !DecodeImage
to_rgb: True
with_mixup: True
# MixupImage
- !MixupImage
alpha: 1.5
beta: 1.5
# ColorDistort
- !ColorDistort {}
# RandomExpand
- !RandomExpand
fill_value: [123.675, 116.28, 103.53]
# 随机扩充比例,默认值是4.0
ratio: 1.5
- !RandomCrop {}
- !RandomFlipImage
is_normalized: false
# 归一化坐标
- !NormalizeBox {}
# 如果 bboxes 数量小于 num_max_boxes,填充值为0的 box
- !PadBox
num_max_boxes: 50
# 坐标格式转化,从XYXY转成XYWH格式
- !BboxXYXY2XYWH {}
# 以下是对一个batch中的所有图片同时做的数据处理
batch_transforms:
# 多尺度训练时,从list中随机选择一个尺寸,对一个batch数据同时同时resize
- !RandomShape
sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
random_inter: True
# NormalizeImage
- !NormalizeImage
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
is_scale: True
is_channel_first: false
- !Permute
to_bgr: false
channel_first: True
# Gt2YoloTarget is only used when use_fine_grained_loss set as true,
# this operator will be deleted automatically if use_fine_grained_loss
# is set as false
- !Gt2YoloTarget
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
downsample_ratios: [32, 16, 8]
# 1个GPU的batch size,默认为1。需要注意:每个iter迭代会运行batch_size * device_num张图片
batch_size: 8
# 是否shuffle
shuffle: true
# mixup,-1表示不做Mixup数据增强。注意,这里是epoch为单位
mixup_epoch: 250
# 注意,在某些情况下,drop_last=false时训练过程中可能会出错,建议训练时都设置为true
drop_last: true
# 若选用多进程,设置使用多进程/线程的数目
# 开启多进程后,占用内存会成倍增加,根据内存设置###
worker_num: 4
# 共享内存bufsize。注意,缓存是以batch为单位,缓存的样本数据总量为batch_size * bufsize,所以请注意不要设置太大,请根据您的硬件设置。
bufsize: 2
# 是否使用多进程
use_process: true
EvalReader:
# 评估过程中模型的输入设置
# 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息
inputs_def:
fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult']
# num_max_boxes,每个样本的groud truth的最多保留个数,若不够用0填充。
num_max_boxes: 50
# 数据集路径
dataset:
!VOCDataSet
# 图片文件夹相对路径,路径是相对于dataset_dir,图像路径= dataset_dir + image_dir + image_name
dataset_dir: dataset/roadsign_voc
# 评估文件列表
anno_path: valid.txt
# 是否包含背景类,若with_background=true,num_classes需要+1
# YOLO 系列with_background必须是false,FasterRCNN系列是true ###
with_background: false
sample_transforms:
# 读取Image图像为numpy数组
# 可以选择将图片从BGR转到RGB,可以选择对一个batch中的图片做mixup增强
- !DecodeImage
to_rgb: True
# ResizeImage
- !ResizeImage
target_size: 608
interp: 2
# NormalizeImage
- !NormalizeImage
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
is_scale: True
is_channel_first: false
# 如果 bboxes 数量小于 num_max_boxes,填充值为0的 box
- !PadBox
num_max_boxes: 50
- !Permute
to_bgr: false
channel_first: True
# 1个GPU的batch size,默认为1。需要注意:每个iter迭代会运行batch_size * device_num张图片
batch_size: 8
# drop_empty
drop_empty: false
# 若选用多进程,设置使用多进程/线程的数目
# 开启多进程后,占用内存会成倍增加,根据内存设置###
worker_num: 4
# 共享内存bufsize。注意,缓存是以batch为单位,缓存的样本数据总量为batch_size * bufsize,所以请注意不要设置太大,请根据您的硬件设置。
bufsize: 2
TestReader:
# 预测过程中模型的输入设置
# 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息
inputs_def:
# 预测图像输入尺寸
image_shape: [3, 608, 608]
fields: ['image', 'im_size', 'im_id']
# 数据集路径
dataset:
# 预测数据
!ImageFolder
# anno_path
anno_path: dataset/roadsign_voc/label_list.txt
# 是否包含背景类,若with_background=true,num_classes需要+1
# YOLO 系列with_background必须是false,FasterRCNN系列是true ###
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: True
# ResizeImage
- !ResizeImage
# 注意与上面图像尺寸保持一致
target_size: 608
interp: 2
# NormalizeImage
- !NormalizeImage
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
is_scale: True
is_channel_first: false
# Permute
- !Permute
to_bgr: false
channel_first: True
# 1个GPU的batch size,默认为1
batch_size: 1
...@@ -37,11 +37,6 @@ YOLOv3Head: ...@@ -37,11 +37,6 @@ YOLOv3Head:
score_threshold: 0.01 score_threshold: 0.01
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: false label_smooth: false
......
...@@ -38,11 +38,6 @@ YOLOv3Head: ...@@ -38,11 +38,6 @@ YOLOv3Head:
score_threshold: 0.01 score_threshold: 0.01
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: false label_smooth: false
......
...@@ -38,11 +38,6 @@ YOLOv3Head: ...@@ -38,11 +38,6 @@ YOLOv3Head:
score_threshold: 0.01 score_threshold: 0.01
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: true label_smooth: true
......
...@@ -39,11 +39,6 @@ YOLOv3Head: ...@@ -39,11 +39,6 @@ YOLOv3Head:
score_threshold: 0.01 score_threshold: 0.01
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: false label_smooth: false
......
...@@ -21,6 +21,21 @@ ...@@ -21,6 +21,21 @@
- label_smooth - label_smooth
- grid_sensitive - grid_sensitive
目前支持YOLO系列的Anchor聚类算法
``` bash
python tools/anchor_cluster.py -c ${config} -m ${method} -s ${size}
```
主要参数配置参考下表
| 参数 | 用途 | 默认值 | 备注 |
|:------:|:------:|:------:|:------:|
| -c/--config | 模型的配置文件 | 无默认值 | 必须指定 |
| -n/--n | 聚类的簇数 | 9 | Anchor的数目 |
| -s/--size | 图片的输入尺寸 | None | 若指定,则使用指定的尺寸,如果不指定, 则尝试从配置文件中读取图片尺寸 |
| -m/--method | 使用的Anchor聚类方法 | v2 | 目前只支持yolov2/v5的聚类算法 |
| -i/--iters | kmeans聚类算法的迭代次数 | 1000 | kmeans算法收敛或者达到迭代次数后终止 |
| -gi/--gen_iters | 遗传算法的迭代次数 | 1000 | 该参数只用于yolov5的Anchor聚类算法 |
| -t/--thresh| Anchor尺度的阈值 | 0.25 | 该参数只用于yolov5的Anchor聚类算法 |
## 模型库 ## 模型库
下表中展示了当前支持的网络结构。 下表中展示了当前支持的网络结构。
......
...@@ -35,11 +35,6 @@ YOLOv4Head: ...@@ -35,11 +35,6 @@ YOLOv4Head:
scale_x_y: [1.2, 1.1, 1.05] scale_x_y: [1.2, 1.1, 1.05]
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 4
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: true label_smooth: true
downsample: [8,16,32] downsample: [8,16,32]
......
...@@ -34,11 +34,6 @@ YOLOv4Head: ...@@ -34,11 +34,6 @@ YOLOv4Head:
scale_x_y: [1.2, 1.1, 1.05] scale_x_y: [1.2, 1.1, 1.05]
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 8
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: true label_smooth: true
downsample: [8,16,32] downsample: [8,16,32]
......
...@@ -34,11 +34,6 @@ YOLOv4Head: ...@@ -34,11 +34,6 @@ YOLOv4Head:
scale_x_y: [1.2, 1.1, 1.05] scale_x_y: [1.2, 1.1, 1.05]
YOLOv3Loss: YOLOv3Loss:
# batch_size here is only used for fine grained loss, not used
# for training batch_size setting, training batch_size setting
# is in configs/yolov3_reader.yml TrainReader.batch_size, batch
# size here should be set as same value as TrainReader.batch_size
batch_size: 4
ignore_thresh: 0.7 ignore_thresh: 0.7
label_smooth: true label_smooth: true
downsample: [8,16,32] downsample: [8,16,32]
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os.path as osp
import logging
# add python path of PadleDetection to sys.path
parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3)))
if parent_path not in sys.path:
sys.path.append(parent_path)
from ppdet.utils.download import download_dataset
logging.basicConfig(level=logging.INFO)
download_path = osp.split(osp.realpath(sys.argv[0]))[0]
download_dataset(download_path, 'roadsign_voc')
speedlimit
crosswalk
trafficlight
stop
\ No newline at end of file
# PaddleDetection 预测部署 # PaddleDetection 预测部署
`PaddleDetection`目前支持使用`Python``C++`部署在`Windows``Linux` 上运行。 `PaddleDetection`目前支持:
- 使用`Python``C++`部署在`Windows``Linux` 上运行
- [在线服务化部署](./serving/README.md)
- [移动端部署](https://github.com/PaddlePaddle/Paddle-Lite-Demo)
## 模型导出 ## 模型导出
训练得到一个满足要求的模型后,如果想要将该模型接入到C++服务器端预测库或移动端预测库,需要通过`tools/export_model.py`导出该模型。 训练得到一个满足要求的模型后,如果想要将该模型接入到C++服务器端预测库或移动端预测库,需要通过`tools/export_model.py`导出该模型。
...@@ -20,4 +23,5 @@ yolov3_darknet # 模型目录 ...@@ -20,4 +23,5 @@ yolov3_darknet # 模型目录
## 预测部署 ## 预测部署
- [1. Python预测(支持 Linux 和 Windows)](https://github.com/PaddlePaddle/PaddleDetection/blob/master/deploy/python) - [1. Python预测(支持 Linux 和 Windows)](https://github.com/PaddlePaddle/PaddleDetection/blob/master/deploy/python)
- [2. C++预测(支持 Linux 和 Windows)](https://github.com/PaddlePaddle/PaddleDetection/blob/master/deploy/cpp) - [2. C++预测(支持 Linux 和 Windows)](https://github.com/PaddlePaddle/PaddleDetection/blob/master/deploy/cpp)
- [3. 移动端部署参考Paddle-Lite文档](https://paddle-lite.readthedocs.io/zh/latest/) - [3. 在线服务化部署](./serving/README.md)
- [4. 移动端部署](https://github.com/PaddlePaddle/Paddle-Lite-Demo)
# 服务端预测部署
`PaddleDetection`训练出来的模型可以使用[Serving](https://github.com/PaddlePaddle/Serving) 部署在服务端。
本教程以在路标数据集[roadsign_voc](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar) 使用`configs/yolov3_mobilenet_v1_roadsign.yml`算法训练的模型进行部署。
预训练模型权重文件为[yolov3_mobilenet_v1_roadsign.pdparams](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_roadsign.pdparams)
## 1. 首先验证模型
```
python tools/infer.py -c configs/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_roadsign.pdparams --infer_img=demo/road554.png
```
## 2. 安装 paddle serving
```
# 安装 paddle-serving-client
pip install paddle-serving-client -i https://mirror.baidu.com/pypi/simple
# 安装 paddle-serving-server
pip install paddle-serving-server -i https://mirror.baidu.com/pypi/simple
# 安装 paddle-serving-server-gpu
pip install paddle-serving-server-gpu -i https://mirror.baidu.com/pypi/simple
```
## 3. 导出模型
PaddleDetection在训练过程包括网络的前向和优化器相关参数,而在部署过程中,我们只需要前向参数,具体参考:[导出模型](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
```
python tools/export_serving_model.py -c configs/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_roadsign.pdparams --output_dir=./inference_model
```
以上命令会在./inference_model文件夹下生成一个`yolov3_mobilenet_v1_roadsign`文件夹:
```
inference_model
│ ├── yolov3_mobilenet_v1_roadsign
│ │ ├── infer_cfg.yml
│ │ ├── serving_client
│ │ │ ├── serving_client_conf.prototxt
│ │ │ ├── serving_client_conf.stream.prototxt
│ │ ├── serving_server
│ │ │ ├── conv1_bn_mean
│ │ │ ├── conv1_bn_offset
│ │ │ ├── conv1_bn_scale
│ │ │ ├── ...
```
`serving_client`文件夹下`serving_client_conf.prototxt`详细说明了模型输入输出信息
`serving_client_conf.prototxt`文件内容为:
```
feed_var {
name: "image"
alias_name: "image"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 608
shape: 608
}
feed_var {
name: "im_size"
alias_name: "im_size"
is_lod_tensor: false
feed_type: 2
shape: 2
}
fetch_var {
name: "multiclass_nms_0.tmp_0"
alias_name: "multiclass_nms_0.tmp_0"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
```
## 4. 启动PaddleServing服务
```
cd inference_model/yolov3_mobilenet_v1_roadsign/
# GPU
python -m paddle_serving_server_gpu.serve --model serving_server --port 9393 --gpu_ids 0
# CPU
python -m paddle_serving_server.serve --model serving_server --port 9393
```
## 5. 测试部署的服务
准备`label_list.txt`文件
```
# 进入到导出模型文件夹
cd inference_model/yolov3_mobilenet_v1_roadsign/
# 将数据集对应的label_list.txt文件拷贝到当前文件夹下
cp ../../dataset/roadsign_voc/label_list.txt .
```
设置`prototxt`文件路径为`serving_client/serving_client_conf.prototxt`
设置`fetch``fetch=["multiclass_nms_0.tmp_0"])`
测试
```
# 进入目录
cd inference_model/yolov3_mobilenet_v1_roadsign/
# 测试代码 test_client.py 会自动创建output文件夹,并在output下生成`bbox.json`和`road554.png`两个文件
python ../../deploy/serving/test_client.py ../../demo/road554.png
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import numpy as np
from paddle_serving_client import Client
from paddle_serving_app.reader import *
import cv2
preprocess = Sequential([
File2Image(), BGR2RGB(), Resize(
(608, 608), interpolation=cv2.INTER_LINEAR), Div(255.0), Transpose(
(2, 0, 1))
])
postprocess = RCNNPostprocess("label_list.txt", "output", [608, 608])
client = Client()
client.load_client_config("serving_client/serving_client_conf.prototxt")
client.connect(['127.0.0.1:9393'])
im = preprocess(sys.argv[1])
fetch_map = client.predict(
feed={
"image": im,
"im_size": np.array(list(im.shape[1:])),
},
fetch=["multiclass_nms_0.tmp_0"])
fetch_map["image"] = sys.argv[1]
postprocess(fetch_map)
...@@ -18,6 +18,10 @@ ...@@ -18,6 +18,10 @@
batch size可以达到每GPU 4 (Tesla V100 16GB)。 batch size可以达到每GPU 4 (Tesla V100 16GB)。
**Q:** 哪些参数会影响内存使用量? </br>
**A:** 会影响内存使用量的参数有:`是否使用多进程use_process、 batch_size、reader中的bufsize、reader中的memsize、数据预处理中的RandomExpand ratio参数、以及图像本身大小`等。
**Q:** 如何修改数据预处理? </br> **Q:** 如何修改数据预处理? </br>
**A:** 可在配置文件中设置 `sample_transform`。注意需要在配置文件中加入**完整预处理** **A:** 可在配置文件中设置 `sample_transform`。注意需要在配置文件中加入**完整预处理**
例如RCNN模型中`DecodeImage`, `NormalizeImage` and `Permute` 例如RCNN模型中`DecodeImage`, `NormalizeImage` and `Permute`
......
```yml
#####################################基础配置#####################################
# 检测算法使用YOLOv3,backbone使用MobileNet_v1
# 检测模型的名称
architecture: YOLOv3
# 根据硬件选择是否使用GPU
use_gpu: true
# ### max_iters为最大迭代次数,而一个iter会运行batch_size * device_num张图片。batch_size在下面 TrainReader.batch_size设置。
max_iters: 1200
# log平滑参数,平滑窗口大小,会从取历史窗口中取log_smooth_window大小的loss求平均值
log_smooth_window: 20
# 模型保存文件夹
save_dir: output
# 每隔多少迭代保存模型
snapshot_iter: 200
# ### mAP 评估方式,mAP评估方式可以选择COCO和VOC或WIDERFACE,其中VOC有11point和integral两种评估方法
metric: COCO
# ### pretrain_weights 可以是imagenet的预训练好的分类模型权重,也可以是在VOC或COCO数据集上的预训练的检测模型权重
# 模型配置文件和权重文件可参考[模型库](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/MODEL_ZOO.md)
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar
# 模型保存文件夹,如果开启了--eval,会在这个文件夹下保存best_model
weights: output/yolov3_mobilenet_v1_roadsign_coco_template/
# ### 根据用户数据设置类别数,注意这里不含背景类
num_classes: 4
# finetune时忽略的参数,按照正则化匹配,匹配上的参数会被忽略掉
finetune_exclude_pretrained_params: ['yolo_output']
# use_fine_grained_loss
use_fine_grained_loss: false
# 检测模型的结构
YOLOv3:
# 默认是 MobileNetv1
backbone: MobileNet
yolo_head: YOLOv3Head
# 检测模型的backbone
MobileNet:
norm_decay: 0.
conv_group_scale: 1
with_extra_blocks: false
# 检测模型的Head
YOLOv3Head:
# anchor_masks
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
# 3x3 anchors
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
# yolo_loss
yolo_loss: YOLOv3Loss
# nms 类型参数,可以设置为[MultiClassNMS, MultiClassSoftNMS, MatrixNMS], 默认使用 MultiClassNMS
nms:
# background_label,背景标签(类别)的索引,如果设置为 0 ,则忽略背景标签(类别)。如果设置为 -1 ,则考虑所有类别。默认值:0
background_label: -1
# NMS步骤后每个图像要保留的总bbox数。 -1表示在NMS步骤之后保留所有bbox。
keep_top_k: 100
# 在NMS中用于剔除检测框IOU的阈值,默认值:0.3 。
nms_threshold: 0.45
# 基于 score_threshold 的过滤检测后,根据置信度保留的最大检测次数。
nms_top_k: 1000
# 是否归一化,默认值:True 。
normalized: false
# 过滤掉低置信度分数的边界框的阈值。
score_threshold: 0.01
YOLOv3Loss:
# 这里的batch_size与训练中的batch_size(即TrainReader.batch_size)不同.
# 仅且当use_fine_grained_loss=true时,计算Loss时使用,且必须要与TrainReader.batch_size设置成一样
batch_size: 8
# 忽略样本的阈值 ignore_thresh
ignore_thresh: 0.7
# 是否使用label_smooth
label_smooth: true
LearningRate:
# ### 学习率设置 参考 https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/FAQ.md#faq%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98
# base_lr
base_lr: 0.0001
# 学习率调整策略
# 具体实现参考[API](fluid.layers.piecewise_decay)
schedulers:
# 学习率调整策略
- !PiecewiseDecay
gamma: 0.1
milestones:
# ### 参考 https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/FAQ.md#faq%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98
# ### 8/12 11/12
- 800
- 1100
# 在训练开始时,调低学习率为base_lr * start_factor,然后逐步增长到base_lr,这个过程叫学习率热身,按照以下公式更新学习率
# linear_step = end_lr - start_lr
# lr = start_lr + linear_step * (global_step / warmup_steps)
# 具体实现参考[API](fluid.layers.linear_lr_warmup)
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 100
OptimizerBuilder:
# 默认使用SGD+Momentum进行训练
# 具体实现参考[API](fluid.optimizer)
optimizer:
momentum: 0.9
type: Momentum
# 默认使用SGD+Momentum进行训练
# 具体实现参考[API](fluid.optimizer)
regularizer:
factor: 0.0005
type: L2
#####################################数据配置#####################################
# 模型训练集设置参考
# 训练、验证、测试使用的数据配置主要区别在数据路径、模型输入、数据增强参数设置
# 如果使用 yolov3_reader.yml,下面的参数设置优先级高,会覆盖yolov3_reader.yml中的参数设置。
# _READER_: 'yolov3_reader.yml'
TrainReader:
# 训练过程中模型的输入设置
# 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息
inputs_def:
fields: ['image', 'gt_bbox', 'gt_class', 'gt_score']
# num_max_boxes,每个样本的groud truth的最多保留个数,若不够用0填充。
num_max_boxes: 50
# 训练数据集路径
dataset:
# 指定数据集格式
!COCODataSet
# 图片文件夹相对路径,路径是相对于dataset_dir,图像路径= dataset_dir + image_dir + image_name
image_dir: train2017
# anno_path,路径是相对于dataset_dir
anno_path: annotations/instances_train2017.json
# 数据集相对路径,路径是相对于PaddleDetection
dataset_dir: dataset/coco
# 是否包含背景类,若with_background=true,num_classes需要+1
# YOLO 系列with_background必须是false,FasterRCNN系列是true ###
with_background: false
sample_transforms:
# 读取Image图像为numpy数组
# 可以选择将图片从BGR转到RGB,可以选择对一个batch中的图片做mixup增强
- !DecodeImage
to_rgb: True
with_mixup: True
# MixupImage
- !MixupImage
alpha: 1.5
beta: 1.5
# ColorDistort
- !ColorDistort {}
# RandomExpand
- !RandomExpand
fill_value: [123.675, 116.28, 103.53]
# 随机扩充比例,默认值是4.0
ratio: 1.5
- !RandomCrop {}
- !RandomFlipImage
is_normalized: false
# 归一化坐标
- !NormalizeBox {}
# 如果 bboxes 数量小于 num_max_boxes,填充值为0的 box
- !PadBox
num_max_boxes: 50
# 坐标格式转化,从XYXY转成XYWH格式
- !BboxXYXY2XYWH {}
# 以下是对一个batch中的所有图片同时做的数据处理
batch_transforms:
# 多尺度训练时,从list中随机选择一个尺寸,对一个batch数据同时同时resize
- !RandomShape
sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
random_inter: True
# NormalizeImage
- !NormalizeImage
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
is_scale: True
is_channel_first: false
- !Permute
to_bgr: false
channel_first: True
# Gt2YoloTarget is only used when use_fine_grained_loss set as true,
# this operator will be deleted automatically if use_fine_grained_loss
# is set as false
- !Gt2YoloTarget
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
downsample_ratios: [32, 16, 8]
# 1个GPU的batch size,默认为1。需要注意:每个iter迭代会运行batch_size * device_num张图片
batch_size: 8
# 是否shuffle
shuffle: true
# mixup,-1表示不做Mixup数据增强。注意,这里是epoch为单位
mixup_epoch: 250
# 注意,在某些情况下,drop_last=false时训练过程中可能会出错,建议训练时都设置为true
drop_last: true
# 若选用多进程,设置使用多进程/线程的数目
# 开启多进程后,占用内存会成倍增加,根据内存设置###
worker_num: 8
# 共享内存bufsize。注意,缓存是以batch为单位,缓存的样本数据总量为batch_size * bufsize,所以请注意不要设置太大,请根据您的硬件设置。
bufsize: 16
# 是否使用多进程
use_process: true
EvalReader:
# 评估过程中模型的输入设置
# 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息
inputs_def:
fields: ['image', 'im_size', 'im_id']
# num_max_boxes,每个样本的groud truth的最多保留个数,若不够用0填充。
num_max_boxes: 50
# 数据集路径
dataset:
!COCODataSet
# 图片文件夹相对路径,路径是相对于dataset_dir,图像路径= dataset_dir + image_dir + image_name
image_dir: val2017
# anno_path,路径是相对于dataset_dir
anno_path: annotations/instances_val2017.json
# 数据集相对路径,路径是相对于PaddleDetection
dataset_dir: dataset/coco
# 是否包含背景类,若with_background=true,num_classes需要+1
# YOLO 系列with_background必须是false,FasterRCNN系列是true ###
with_background: false
sample_transforms:
# 读取Image图像为numpy数组
# 可以选择将图片从BGR转到RGB,可以选择对一个batch中的图片做mixup增强
- !DecodeImage
to_rgb: True
# ResizeImage
- !ResizeImage
target_size: 608
interp: 2
# NormalizeImage
- !NormalizeImage
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
is_scale: True
is_channel_first: false
# 如果 bboxes 数量小于 num_max_boxes,填充值为0的 box
- !PadBox
num_max_boxes: 50
- !Permute
to_bgr: false
channel_first: True
# 1个GPU的batch size,默认为1。需要注意:每个iter迭代会运行batch_size * device_num张图片
batch_size: 8
# drop_empty
drop_empty: false
# 若选用多进程,设置使用多进程/线程的数目
# 开启多进程后,占用内存会成倍增加,根据内存设置###
worker_num: 8
# 共享内存bufsize。注意,缓存是以batch为单位,缓存的样本数据总量为batch_size * bufsize,所以请注意不要设置太大,请根据您的硬件设置。
bufsize: 16
TestReader:
# 预测过程中模型的输入设置
# 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息
inputs_def:
# 预测图像输入尺寸
image_shape: [3, 608, 608]
fields: ['image', 'im_size', 'im_id']
# 数据集路径
dataset:
!ImageFolder
# anno_path,路径是相对于dataset_dir
anno_path: annotations/instances_val2017.json
# 是否包含背景类,若with_background=true,num_classes需要+1
# YOLO 系列with_background必须是false,FasterRCNN系列是true ###
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: True
# ResizeImage
- !ResizeImage
target_size: 608
interp: 2
# NormalizeImage
- !NormalizeImage
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
is_scale: True
is_channel_first: false
# Permute
- !Permute
to_bgr: false
channel_first: True
# 1个GPU的batch size,默认为1
batch_size: 1
```
因为 它太大了无法显示 image diff 。你可以改为 查看blob
...@@ -6,8 +6,9 @@ ...@@ -6,8 +6,9 @@
- [将数据集转换为VOC格式](#方式二将数据集转换为VOC格式) - [将数据集转换为VOC格式](#方式二将数据集转换为VOC格式)
- [添加新数据源](#方式三添加新数据源) - [添加新数据源](#方式三添加新数据源)
- [2.选择模型](#2选择模型) - [2.选择模型](#2选择模型)
- [3.修改参数配置](#3修改参数配置) - [3.生成Anchor](#3生成Anchor)
- [4.开始训练与部署](#4开始训练与部署) - [4.修改参数配置](#4修改参数配置)
- [5.开始训练与部署](#5开始训练与部署)
- [附:一个自定义数据集demo](#附一个自定义数据集demo) - [附:一个自定义数据集demo](#附一个自定义数据集demo)
## 1.准备数据 ## 1.准备数据
...@@ -97,8 +98,23 @@ PaddleDetection中提供了丰富的模型库,具体可在[模型库](../MODEL ...@@ -97,8 +98,23 @@ PaddleDetection中提供了丰富的模型库,具体可在[模型库](../MODEL
同时也可以尝试PaddleDetection中开发的[YOLOv3增强模型](../featured_model/YOLOv3_ENHANCEMENT.md)[YOLOv4模型](../featured_model/YOLO_V4.md)[Anchor Free模型](../featured_model/ANCHOR_FREE_DETECTION.md)等。 同时也可以尝试PaddleDetection中开发的[YOLOv3增强模型](../featured_model/YOLOv3_ENHANCEMENT.md)[YOLOv4模型](../featured_model/YOLO_V4.md)[Anchor Free模型](../featured_model/ANCHOR_FREE_DETECTION.md)等。
## 3.生成Anchor
## 3.修改参数配置 在yolo系列模型中,可以运行`tools/anchor_cluster.py`来得到适用于你的数据集Anchor,使用方法如下:
``` bash
python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000
```
目前`tools/anchor_cluster.py`支持的主要参数配置如下表所示:
| 参数 | 用途 | 默认值 | 备注 |
|:------:|:------:|:------:|:------:|
| -c/--config | 模型的配置文件 | 无默认值 | 必须指定 |
| -n/--n | 聚类的簇数 | 9 | Anchor的数目 |
| -s/--size | 图片的输入尺寸 | None | 若指定,则使用指定的尺寸,如果不指定, 则尝试从配置文件中读取图片尺寸 |
| -m/--method | 使用的Anchor聚类方法 | v2 | 目前只支持yolov2/v5的聚类算法 |
| -i/--iters | kmeans聚类算法的迭代次数 | 1000 | kmeans算法收敛或者达到迭代次数后终止 |
| -gi/--gen_iters | 遗传算法的迭代次数 | 1000 | 该参数只用于yolov5的Anchor聚类算法 |
| -t/--thresh| Anchor尺度的阈值 | 0.25 | 该参数只用于yolov5的Anchor聚类算法 |
## 4.修改参数配置
选择好模型后,需要在`configs`目录中找到对应的配置文件,为了适配在自定义数据集上训练,需要对参数配置做一些修改: 选择好模型后,需要在`configs`目录中找到对应的配置文件,为了适配在自定义数据集上训练,需要对参数配置做一些修改:
...@@ -133,7 +149,7 @@ PaddleDetection中提供了丰富的模型库,具体可在[模型库](../MODEL ...@@ -133,7 +149,7 @@ PaddleDetection中提供了丰富的模型库,具体可在[模型库](../MODEL
- 预训练模型配置:通过在yaml配置文件中的`pretrain_weights: path/to/weights`参数可以配置路径,可以是链接或权重文件路径。可直接沿用配置文件中给出的在ImageNet数据集上的预训练模型。同时我们支持训练在COCO或Obj365数据集上的模型权重作为预训练模型,做迁移学习,详情可参考[迁移学习文档](../advanced_tutorials/TRANSFER_LEARNING_cn.md) - 预训练模型配置:通过在yaml配置文件中的`pretrain_weights: path/to/weights`参数可以配置路径,可以是链接或权重文件路径。可直接沿用配置文件中给出的在ImageNet数据集上的预训练模型。同时我们支持训练在COCO或Obj365数据集上的模型权重作为预训练模型,做迁移学习,详情可参考[迁移学习文档](../advanced_tutorials/TRANSFER_LEARNING_cn.md)
## 4.开始训练与部署 ## 5.开始训练与部署
- 参数配置完成后,就可以开始训练模型了,具体可参考[训练/评估/预测](GETTING_STARTED_cn.md)入门文档。 - 参数配置完成后,就可以开始训练模型了,具体可参考[训练/评估/预测](GETTING_STARTED_cn.md)入门文档。
- 训练测试完成后,根据需要可以进行模型部署:首先需要导出可预测的模型,可参考[导出模型教程](../advanced_tutorials/deploy/EXPORT_MODEL.md);导出模型后就可以进行[C++预测部署](../advanced_tutorials/deploy/DEPLOY_CPP.md)或者[python端预测部署](../advanced_tutorials/deploy/DEPLOY_PY.md) - 训练测试完成后,根据需要可以进行模型部署:首先需要导出可预测的模型,可参考[导出模型教程](../advanced_tutorials/deploy/EXPORT_MODEL.md);导出模型后就可以进行[C++预测部署](../advanced_tutorials/deploy/DEPLOY_CPP.md)或者[python端预测部署](../advanced_tutorials/deploy/DEPLOY_PY.md)
......
# 如何准备训练数据
## 目录
- [目标检测数据说明](#目标检测数据说明)
- [准备训练数据](#准备训练数据)
- [VOC数据数据](#VOC数据数据)
- [VOC数据集下载](#VOC数据集下载)
- [VOC数据标注文件介绍](#VOC数据标注文件介绍)
- [COCO数据数据](#COCO数据数据)
- [COCO数据集下载](#COCO数据下载)
- [COCO数据标注文件介绍](#COCO数据标注文件介绍)
- [用户数据](#用户数据)
- [用户数据转成VOC数据](#用户数据转成VOC数据)
- [用户数据转成COCO数据](#用户数据转成COCO数据)
- [用户数据自定义reader](#用户数据自定义reader)
- [用户数据数据转换示例](#用户数据数据转换示例)
### 目标检测数据说明
目标检测的数据比分类复杂,一张图像中,需要标记出各个目标区域的位置和类别。
一般的目标区域位置用一个矩形框来表示,一般用以下3种方式表达:
| 表达方式 | 说明 |
| :----------------: | :--------------------------------: |
| x1,y1,x2,y2 | (x1,y1)为左上角坐标,(x2,y2)为右下角坐标 |
| x,y,w,h | (x,y)为左上角坐标,w为目标区域宽度,h为目标区域高度 |
| xc,yc,w,h | (xc,yc)为目标区域中心坐标,w为目标区域宽度,h为目标区域高度 |
常见的目标检测数据集如Pascal VOC和COCO,采用的是第一种 `x1,y1,x2,y2` 表示物体的bounding box.
### 准备训练数据
PaddleDetection默认支持[COCO](http://cocodataset.org)[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/)[WIDER-FACE](http://shuoyang1213.me/WIDERFACE/) 数据源。
同时还支持自定义数据源,包括:
(1) 自定义数据数据转换成VOC数据;
(2) 自定义数据数据转换成COCO数据;
(3) 自定义新的数据源,增加自定义的reader。
首先进入到`PaddleDetection`根目录下
```
cd PaddleDetection/
ppdet_root=$(pwd)
```
#### VOC数据数据
VOC数据是[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) 比赛使用的数据。Pascal VOC比赛不仅包含图像分类分类任务,还包含图像目标检测、图像分割等任务,其标注文件中包含多个任务的标注内容。
VOC数据集指的是Pascal VOC比赛使用的数据。用户自定义的VOC数据,xml文件中的非必须字段,请根据实际情况选择是否标注或是否使用默认值。
##### VOC数据集下载
- 通过代码自动化下载VOC数据集
```
# 执行代码自动化下载VOC数据集
python dataset/voc/download_voc.py
```
代码执行完成后VOC数据集文件组织结构为:
```
>>cd dataset/voc/
>>tree
├── create_list.py
├── download_voc.py
├── generic_det_label_list.txt
├── generic_det_label_list_zh.txt
├── label_list.txt
├── VOCdevkit/VOC2007
│ ├── annotations
│ ├── 001789.xml
│ | ...
│ ├── JPEGImages
│ ├── 001789.jpg
│ | ...
│ ├── ImageSets
│ | ...
├── VOCdevkit/VOC2012
│ ├── Annotations
│ ├── 2011_003876.xml
│ | ...
│ ├── JPEGImages
│ ├── 2011_003876.jpg
│ | ...
│ ├── ImageSets
│ | ...
| ...
```
各个文件说明
```
# label_list.txt 是类别名称列表,文件名必须是 label_list.txt。若使用VOC数据集,config文件中use_default_label为true时不需要这个文件
>>cat label_list.txt
aeroplane
bicycle
...
# trainval.txt 是训练数据集文件列表
>>cat trainval.txt
VOCdevkit/VOC2007/JPEGImages/007276.jpg VOCdevkit/VOC2007/Annotations/007276.xml
VOCdevkit/VOC2012/JPEGImages/2011_002612.jpg VOCdevkit/VOC2012/Annotations/2011_002612.xml
...
# test.txt 是测试数据集文件列表
>>cat test.txt
VOCdevkit/VOC2007/JPEGImages/000001.jpg VOCdevkit/VOC2007/Annotations/000001.xml
...
# label_list.txt voc 类别名称列表
>>cat label_list.txt
aeroplane
bicycle
...
```
- 已下载VOC数据集
按照如上数据文件组织结构组织文件即可。
##### VOC数据标注文件介绍
VOC数据是每个图像文件对应一个同名的xml文件,xml文件中标记物体框的坐标和类别等信息。例如图像`2007_002055.jpg`
![](../images/2007_002055.jpg)
图片对应的xml文件内包含对应图片的基本信息,比如文件名、来源、图像尺寸以及图像中包含的物体区域信息和类别信息等。
xml文件中包含以下字段:
- filename,表示图像名称。
- size,表示图像尺寸。包括:图像宽度、图像高度、图像深度。
```
<size>
<width>500</width>
<height>375</height>
<depth>3</depth>
</size>
```
- object字段,表示每个物体。包括:
| 标签 | 说明 |
| :--------: | :-----------: |
| name | 物体类别名称 |
| pose | 关于目标物体姿态描述(非必须字段) |
| truncated | 如果物体的遮挡超过15-20%并且位于边界框之外,请标记为`truncated`(非必须字段) |
| difficult | 难以识别的物体标记为`difficult`(非必须字段) |
| bndbox子标签 | (xmin,ymin) 左上角坐标,(xmax,ymax) 右下角坐标, |
#### COCO数据
COCO数据是[COCO](http://cocodataset.org) 比赛使用的数据。同样的,COCO比赛数也包含多个比赛任务,其标注文件中包含多个任务的标注内容。
COCO数据集指的是COCO比赛使用的数据。用户自定义的COCO数据,json文件中的一些字段,请根据实际情况选择是否标注或是否使用默认值。
##### COCO数据下载
- 通过代码自动化下载COCO数据集
```
# 执行代码自动化下载COCO数据集
python dataset/voc/download_coco.py
```
代码执行完成后COCO数据集文件组织结构为:
```
>>cd dataset/coco/
>>tree
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ | ...
| ...
```
- 已下载COCO数据集
按照如上数据文件组织结构组织文件即可。
##### COCO数据标注介绍
COCO数据标注是将所有训练图像的标注都存放到一个json文件中。数据以字典嵌套的形式存放。
json文件中包含以下key:
- info,表示标注文件info。
- licenses,表示标注文件licenses。
- images,表示标注文件中图像信息列表,每个元素是一张图像的信息。如下为其中一张图像的信息:
```
{
'license': 3, # license
'file_name': '000000391895.jpg', # file_name
# coco_url
'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg',
'height': 360, # image height
'width': 640, # image width
'date_captured': '2013-11-14 11:18:45', # date_captured
# flickr_url
'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg',
'id': 391895 # image id
}
```
- annotations,表示标注文件中目标物体的标注信息列表,每个元素是一个目标物体的标注信息。如下为其中一个目标物体的标注信息:
```
{
'segmentation': # 物体的分割标注
'area': 2765.1486500000005, # 物体的区域面积
'iscrowd': 0, # iscrowd
'image_id': 558840, # image id
'bbox': [199.84, 200.46, 77.71, 70.88], # bbox
'category_id': 58, # category_id
'id': 156 # image id
}
```
```
# 查看COCO标注文件
import json
coco_anno = json.load(open('./annotations/instances_train2017.json'))
# coco_anno.keys
print('\nkeys:', coco_anno.keys())
# 查看类别信息
print('\n物体类别:', coco_anno['categories'])
# 查看一共多少张图
print('\n图像数量:', len(coco_anno['images']))
# 查看一共多少个目标物体
print('\n标注物体数量:', len(coco_anno['annotations']))
# 查看一条目标物体标注信息
print('\n查看一条目标物体标注信息:', coco_anno['annotations'][0])
```
COCO数据准备如下。
`dataset/coco/`最初文件组织结构
```
>>cd dataset/coco/
>>tree
├── download_coco.py
```
#### 用户数据
对于用户数据有3种处理方法:
(1) 将用户数据转成VOC数据(根据需要仅包含物体检测所必须的标签即可)
(2) 将用户数据转成COCO数据(根据需要仅包含物体检测所必须的标签即可)
(3) 自定义一个用户数据的reader(较复杂数据,需要自定义reader)
##### 用户数据转成VOC数据
用户数据集转成VOC数据后目录结构如下(注意数据集中路径名、文件名尽量不要使用中文,避免中文编码问题导致出错):
```
dataset/xxx/
├── annotations
│ ├── xxx1.xml
│ ├── xxx2.xml
│ ├── xxx3.xml
│ | ...
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
├── label_list.txt (必须提供,且文件名称必须是label_list.txt )
├── train.txt (训练数据集文件列表, ./images/xxx1.jpg ./annotations/xxx1.xml)
└── valid.txt (测试数据集文件列表)
```
各个文件说明
```
# label_list.txt 是类别名称列表,改文件名必须是这个
>>cat label_list.txt
classname1
classname2
...
# train.txt 是训练数据文件列表
>>cat train.txt
./images/xxx1.jpg ./annotations/xxx1.xml
./images/xxx2.jpg ./annotations/xxx2.xml
...
# valid.txt 是验证数据文件列表
>>cat valid.txt
./images/xxx3.jpg ./annotations/xxx3.xml
...
```
##### 用户数据转成COCO
`./tools/`中提供了`x2coco.py`用于将VOC数据集、labelme标注的数据集或cityscape数据集转换为COCO数据,例如:
(1)labelmes数据转换为COCO数据:
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
(2)voc数据转换为COCO数据:
```bash
python tools/x2coco.py \
--dataset_type voc \
--voc_anno_dir path/to/VOCdevkit/VOC2007/Annotations/ \
--voc_anno_list path/to/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt \
--voc_label_list dataset/voc/label_list.txt \
--voc_out_name voc_train.json
```
用户数据集转成COCO数据后目录结构如下(注意数据集中路径名、文件名尽量不要使用中文,避免中文编码问题导致出错):
```
dataset/xxx/
├── annotations
│ ├── train.json # coco数据的标注文件
│ ├── valid.json # coco数据的标注文件
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```
##### 用户数据自定义reader
如果数据集有新的数据需要添加进PaddleDetection中,您可参考数据处理文档中的[添加新数据源](../advanced_tutorials/READER.md#添加新数据源)文档部分,开发相应代码完成新的数据源支持,同时数据处理具体代码解析等可阅读[数据处理文档](../advanced_tutorials/READER.md)
#### 用户数据数据转换示例
[Kaggle数据集](https://www.kaggle.com/andrewmvd/road-sign-detection) 比赛数据为例,说明如何准备自定义数据。
Kaggle上的 [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) 比赛数据包含877张图像,数据类别4类:crosswalk,speedlimit,stop,trafficlight。
可从Kaggle上下载,也可以从[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/roadsign.zip) 下载。
路标数据集示例图:
![](../images/road554.png)
```
# 下载解压数据
>>cd $(ppdet_root)/dataset
# 下载kaggle数据集并解压,当前文件组织结构如下
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
```
将数据划分为训练集和测试集
```
# 生成 label_list.txt 文件
>>echo "speedlimit\ncrosswalk\ntrafficlight\nstop" > label_list.txt
# 生成 train.txt、valid.txt和test.txt列表文件
>>ls images/*.png | shuf > all_image_list.txt
>>awk -F"/" '{print $2}' all_image_list.txt | awk -F".png" '{print $1}' | awk -F"\t" '{print "images/"$1".png annotations/"$1".xml"}' > all_list.txt
# 训练集、验证集、测试集比例分别约80%、10%、10%。
>>head -n 88 all_list.txt > test.txt
>>head -n 176 all_list.txt | tail -n 88 > valid.txt
>>tail -n 701 all_list.txt > train.txt
# 删除不用文件
>>rm -rf all_image_list.txt all_list.txt
最终数据集文件组织结构为:
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
├── label_list.txt
├── test.txt
├── train.txt
└── valid.txt
# label_list.txt 是类别名称列表,文件名必须是 label_list.txt
>>cat label_list.txt
crosswalk
speedlimit
stop
trafficlight
# train.txt 是训练数据集文件列表,每一行是一张图像路径和对应标注文件路径,以空格分开。注意这里的路径是数据集文件夹内的相对路径。
>>cat train.txt
./images/road839.png ./annotations/road839.xml
./images/road363.png ./annotations/road363.xml
...
# valid.txt 是验证数据集文件列表,每一行是一张图像路径和对应标注文件路径,以空格分开。注意这里的路径是数据集文件夹内的相对路径。
>>cat valid.txt
./images/road218.png ./annotations/road218.xml
./images/road681.png ./annotations/road681.xml
```
也可以下载准备好的数据[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.zip) ,解压到`dataset/roadsign_voc/`文件夹下即可。
准备好数据后,一般的我们要对数据有所了解,比如图像量,图像尺寸,每一类目标区域个数,目标区域大小等。如有必要,还要对数据进行清洗。
roadsign数据集统计:
| 数据 | 图片数量 |
| :--------: | :-----------: |
| train | 701 |
| valid | 176 |
**说明:**(1)用户数据,建议在训练前仔细检查数据,避免因数据标注格式错误或图像数据不完整造成训练过程中的crash
(2)如果图像尺寸太大的话,在不限制读入数据尺寸情况下,占用内存较多,会造成内存/显存溢出,请合理设置batch_size,可从小到大尝试
[English](QUICK_STARTED.md) | 简体中文 [English](QUICK_STARTED.md) | 简体中文
# 快速开始 # 快速开始
为了使得用户能够在很短时间内快速产出模型,掌握PaddleDetection的使用方式,这篇教程通过一个预训练检测模型对小数据集进行finetune。在较短时间内即可产出一个效果不错的模型。实际业务中,建议用户根据需要选择合适模型配置文件进行适配。
为了使得用户能够在很短的时间内快速产出模型,掌握PaddleDetection的使用方式,这篇教程通过一个预训练检测模型对小数据集进行finetune。在P40上单卡大约20min即可产出一个效果不错的模型。
- **注:在开始前,如果有GPU设备,指定GPU设备号。**
```bash ## 一、快速体验
export CUDA_VISIBLE_DEVICES=0 ```
# 用PP-YOLO算法在COCO数据集上预训练模型预测一张图片
python tools/infer.py -c configs/ppyolo/ppyolo.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams --infer_img=demo/000000014439.jpg
``` ```
结果如下图:
## 数据准备 ![](../images/000000014439.jpg)
数据集参考[Kaggle数据集](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection),其中训练数据集240张图片,测试数据集60张图片,数据类别为3类:苹果,橘子,香蕉。[下载链接](https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar)。数据下载后分别解压即可, 数据准备脚本位于[download_fruit.py](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dataset/fruit/download_fruit.py)。下载数据方式如下:
```bash ## 二、准备数据
python dataset/fruit/download_fruit.py 数据集参考[Kaggle数据集](https://www.kaggle.com/andrewmvd/road-sign-detection) ,包含877张图像,数据类别4类:crosswalk,speedlimit,stop,trafficlight。
将数据划分为训练集701张图和测试集176张图,[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
```
# 注意:可跳过这步下载,后面训练会自动下载
python dataset/roadsign_voc/download_roadsign_voc.py
``` ```
## 开始训练 ## 三、训练、评估、预测
### 1、训练
```
# 边训练边测试 CPU需要约1小时(use_gpu=false),1080Ti GPU需要约5分钟。
# -c 参数表示指定使用哪个配置文件
# -o 参数表示指定配置文件种的全局变量(覆盖配置文件种的设置),这里设置使用gpu,
# --eval 参数表示边训练边评估,会自动保存一个评估结果最的名为best_model.pdmodel的模型
训练命令如下:
```bash python tools/train.py -c configs/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true
python -u tools/train.py -c configs/yolov3_mobilenet_v1_fruit.yml --eval
``` ```
训练使用`yolov3_mobilenet_v1`基于COCO数据集训练好的模型进行finetune 如果想通过VisualDL实时观察loss变化去去曲线,在训练命令种添加--use_vdl=true,以及通过--vdl_log_dir设置日志保存路径
**但注意VisualDL需Python>=3.5**
如果想通过VisualDL实时观察loss和精度值,启动命令添加`--use_vdl=True`,以及通过`--vdl_log_dir`设置日志保存路径,但注意**VisualDL需Python>=3.5** 首先安装[VisualDL](https://github.com/PaddlePaddle/VisualDL)
```
python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple
```
```bash ```
python -u tools/train.py -c configs/yolov3_mobilenet_v1_fruit.yml \ python -u tools/train.py -c configs/yolov3_mobilenet_v1_roadsign.yml \
--use_vdl=True \ --use_vdl=true \
--vdl_log_dir=vdl_fruit_dir/scalar \ --vdl_log_dir=vdl_dir/scalar \
--eval --eval
``` ```
通过visualdl命令实时查看变化曲线:
通过`visualdl`命令实时查看变化曲线: ```
visualdl --logdir vdl_dir/scalar/ --host <host_IP> --port <port_num>
```bash
visualdl --logdir vdl_fruit_dir/scalar/ --host <host_IP> --port <port_num>
``` ```
VisualDL结果显示如下: ### 2、评估
```
![](../images/visualdl_fruit.jpg) # 评估 默认使用训练过程中保存的best_model
# -c 参数表示指定使用哪个配置文件
训练模型[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_fruit.tar) # -o 参数表示指定配置文件种的全局变量(覆盖配置文件种的设置)
python tools/eval.py -c configs/yolov3_mobilenet_v1_roadsign.yml-o use_gpu=true
## 评估预测
评估命令如下:
```bash
python -u tools/eval.py -c configs/yolov3_mobilenet_v1_fruit.yml
``` ```
预测命令如下
```bash ### 3、预测
python -u tools/infer.py -c configs/yolov3_mobilenet_v1_fruit.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_fruit.tar \
--infer_img=demo/orange_71.jpg
``` ```
# -c 参数表示指定使用哪个配置文件
# -o 参数表示指定配置文件种的全局变量(覆盖配置文件种的设置)
# --infer_img 参数指定预测图像路径
# 预测结束后会在output文件夹中生成一张画有预测结果的同名图像
预测图片如下:
![](../../demo/orange_71.jpg) python tools/infer.py -c configs/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true --infer_img=demo/road554.png
```
![](../images/orange_71_detection.jpg)
结果如下图:
更多训练及评估流程,请参考[入门使用文档](GETTING_STARTED_cn.md) ![](../images/road554.png)
...@@ -97,6 +97,15 @@ def load_config(file_path): ...@@ -97,6 +97,15 @@ def load_config(file_path):
del cfg[READER_KEY] del cfg[READER_KEY]
merge_config(cfg) merge_config(cfg)
# NOTE: training batch size defined only in TrainReader, sychornized
# batch size config to global, models can get batch size config
# from global config when building model.
# batch size in evaluation or inference can also be added here
if 'TrainReader' in global_config:
global_config['train_batch_size'] = global_config['TrainReader'][
'batch_size']
return global_config return global_config
......
...@@ -41,7 +41,7 @@ class DataSet(object): ...@@ -41,7 +41,7 @@ class DataSet(object):
anno_path=None, anno_path=None,
sample_num=-1, sample_num=-1,
with_background=True, with_background=True,
use_default_label=None, use_default_label=False,
**kwargs): **kwargs):
super(DataSet, self).__init__() super(DataSet, self).__init__()
self.anno_path = anno_path self.anno_path = anno_path
...@@ -117,7 +117,7 @@ class ImageFolder(DataSet): ...@@ -117,7 +117,7 @@ class ImageFolder(DataSet):
anno_path=None, anno_path=None,
sample_num=-1, sample_num=-1,
with_background=True, with_background=True,
use_default_label=None, use_default_label=False,
**kwargs): **kwargs):
super(ImageFolder, self).__init__(dataset_dir, image_dir, anno_path, super(ImageFolder, self).__init__(dataset_dir, image_dir, anno_path,
sample_num, with_background, sample_num, with_background,
......
...@@ -51,7 +51,7 @@ class VOCDataSet(DataSet): ...@@ -51,7 +51,7 @@ class VOCDataSet(DataSet):
image_dir=None, image_dir=None,
anno_path=None, anno_path=None,
sample_num=-1, sample_num=-1,
use_default_label=True, use_default_label=False,
with_background=True, with_background=True,
label_list='label_list.txt'): label_list='label_list.txt'):
super(VOCDataSet, self).__init__( super(VOCDataSet, self).__init__(
......
...@@ -32,17 +32,17 @@ class YOLOv3Loss(object): ...@@ -32,17 +32,17 @@ class YOLOv3Loss(object):
Combined loss for YOLOv3 network Combined loss for YOLOv3 network
Args: Args:
batch_size (int): training batch size train_batch_size (int): training batch size
ignore_thresh (float): threshold to ignore confidence loss ignore_thresh (float): threshold to ignore confidence loss
label_smooth (bool): whether to use label smoothing label_smooth (bool): whether to use label smoothing
use_fine_grained_loss (bool): whether use fine grained YOLOv3 loss use_fine_grained_loss (bool): whether use fine grained YOLOv3 loss
instead of fluid.layers.yolov3_loss instead of fluid.layers.yolov3_loss
""" """
__inject__ = ['iou_loss', 'iou_aware_loss'] __inject__ = ['iou_loss', 'iou_aware_loss']
__shared__ = ['use_fine_grained_loss'] __shared__ = ['use_fine_grained_loss', 'train_batch_size']
def __init__(self, def __init__(self,
batch_size=8, train_batch_size=8,
ignore_thresh=0.7, ignore_thresh=0.7,
label_smooth=True, label_smooth=True,
use_fine_grained_loss=False, use_fine_grained_loss=False,
...@@ -51,7 +51,7 @@ class YOLOv3Loss(object): ...@@ -51,7 +51,7 @@ class YOLOv3Loss(object):
downsample=[32, 16, 8], downsample=[32, 16, 8],
scale_x_y=1., scale_x_y=1.,
match_score=False): match_score=False):
self._batch_size = batch_size self._train_batch_size = train_batch_size
self._ignore_thresh = ignore_thresh self._ignore_thresh = ignore_thresh
self._label_smooth = label_smooth self._label_smooth = label_smooth
self._use_fine_grained_loss = use_fine_grained_loss self._use_fine_grained_loss = use_fine_grained_loss
...@@ -65,7 +65,7 @@ class YOLOv3Loss(object): ...@@ -65,7 +65,7 @@ class YOLOv3Loss(object):
anchor_masks, mask_anchors, num_classes, prefix_name): anchor_masks, mask_anchors, num_classes, prefix_name):
if self._use_fine_grained_loss: if self._use_fine_grained_loss:
return self._get_fine_grained_loss( return self._get_fine_grained_loss(
outputs, targets, gt_box, self._batch_size, num_classes, outputs, targets, gt_box, self._train_batch_size, num_classes,
mask_anchors, self._ignore_thresh) mask_anchors, self._ignore_thresh)
else: else:
losses = [] losses = []
...@@ -95,7 +95,7 @@ class YOLOv3Loss(object): ...@@ -95,7 +95,7 @@ class YOLOv3Loss(object):
outputs, outputs,
targets, targets,
gt_box, gt_box,
batch_size, train_batch_size,
num_classes, num_classes,
mask_anchors, mask_anchors,
ignore_thresh, ignore_thresh,
...@@ -108,7 +108,7 @@ class YOLOv3Loss(object): ...@@ -108,7 +108,7 @@ class YOLOv3Loss(object):
targets ([Variables]): List of Variables, The targets for yolo targets ([Variables]): List of Variables, The targets for yolo
loss calculatation. loss calculatation.
gt_box (Variable): The ground-truth boudding boxes. gt_box (Variable): The ground-truth boudding boxes.
batch_size (int): The training batch size train_batch_size (int): The training batch size
num_classes (int): class num of dataset num_classes (int): class num of dataset
mask_anchors ([[float]]): list of anchors in each output layer mask_anchors ([[float]]): list of anchors in each output layer
ignore_thresh (float): prediction bbox overlap any gt_box greater ignore_thresh (float): prediction bbox overlap any gt_box greater
...@@ -171,7 +171,7 @@ class YOLOv3Loss(object): ...@@ -171,7 +171,7 @@ class YOLOv3Loss(object):
loss_h = fluid.layers.reduce_sum(loss_h, dim=[1, 2, 3]) loss_h = fluid.layers.reduce_sum(loss_h, dim=[1, 2, 3])
if self._iou_loss is not None: if self._iou_loss is not None:
loss_iou = self._iou_loss(x, y, w, h, tx, ty, tw, th, anchors, loss_iou = self._iou_loss(x, y, w, h, tx, ty, tw, th, anchors,
downsample, self._batch_size, downsample, self._train_batch_size,
scale_x_y) scale_x_y)
loss_iou = loss_iou * tscale_tobj loss_iou = loss_iou * tscale_tobj
loss_iou = fluid.layers.reduce_sum(loss_iou, dim=[1, 2, 3]) loss_iou = fluid.layers.reduce_sum(loss_iou, dim=[1, 2, 3])
...@@ -180,14 +180,14 @@ class YOLOv3Loss(object): ...@@ -180,14 +180,14 @@ class YOLOv3Loss(object):
if self._iou_aware_loss is not None: if self._iou_aware_loss is not None:
loss_iou_aware = self._iou_aware_loss( loss_iou_aware = self._iou_aware_loss(
ioup, x, y, w, h, tx, ty, tw, th, anchors, downsample, ioup, x, y, w, h, tx, ty, tw, th, anchors, downsample,
self._batch_size, scale_x_y) self._train_batch_size, scale_x_y)
loss_iou_aware = loss_iou_aware * tobj loss_iou_aware = loss_iou_aware * tobj
loss_iou_aware = fluid.layers.reduce_sum( loss_iou_aware = fluid.layers.reduce_sum(
loss_iou_aware, dim=[1, 2, 3]) loss_iou_aware, dim=[1, 2, 3])
loss_iou_awares.append(fluid.layers.reduce_mean(loss_iou_aware)) loss_iou_awares.append(fluid.layers.reduce_mean(loss_iou_aware))
loss_obj_pos, loss_obj_neg = self._calc_obj_loss( loss_obj_pos, loss_obj_neg = self._calc_obj_loss(
output, obj, tobj, gt_box, self._batch_size, anchors, output, obj, tobj, gt_box, self._train_batch_size, anchors,
num_classes, downsample, self._ignore_thresh, scale_x_y) num_classes, downsample, self._ignore_thresh, scale_x_y)
loss_cls = fluid.layers.sigmoid_cross_entropy_with_logits(cls, tcls) loss_cls = fluid.layers.sigmoid_cross_entropy_with_logits(cls, tcls)
......
...@@ -78,6 +78,12 @@ DATASETS = { ...@@ -78,6 +78,12 @@ DATASETS = {
'https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit.tar', 'https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit.tar',
'baa8806617a54ccf3685fa7153388ae6', ), ], 'baa8806617a54ccf3685fa7153388ae6', ), ],
['Annotations', 'JPEGImages']), ['Annotations', 'JPEGImages']),
'roadsign_voc': ([(
'https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar',
'8d629c0f880dd8b48de9aeff44bf1f3e', ), ], ['annotations', 'images']),
'roadsign_coco': ([(
'https://paddlemodels.bj.bcebos.com/object_detection/roadsign_coco.tar',
'49ce5a9b5ad0d6266163cd01de4b018e', ), ], ['annotations', 'images']),
'objects365': (), 'objects365': (),
} }
...@@ -117,7 +123,7 @@ def get_dataset_path(path, annotation, image_dir): ...@@ -117,7 +123,7 @@ def get_dataset_path(path, annotation, image_dir):
"https://www.objects365.org/download.html".format(name)) "https://www.objects365.org/download.html".format(name))
data_dir = osp.join(DATASET_HOME, name) data_dir = osp.join(DATASET_HOME, name)
# For voc, only check dir VOCdevkit/VOC2012, VOCdevkit/VOC2007 # For voc, only check dir VOCdevkit/VOC2012, VOCdevkit/VOC2007
if name == 'voc' or name == 'fruit': if name == 'voc' or name == 'fruit' or name == 'roadsign_voc':
exists = True exists = True
for sub_dir in dataset[1]: for sub_dir in dataset[1]:
check_dir = osp.join(data_dir, sub_dir) check_dir = osp.join(data_dir, sub_dir)
...@@ -129,7 +135,7 @@ def get_dataset_path(path, annotation, image_dir): ...@@ -129,7 +135,7 @@ def get_dataset_path(path, annotation, image_dir):
return data_dir return data_dir
# voc exist is checked above, voc is not exist here # voc exist is checked above, voc is not exist here
check_exist = name != 'voc' and name != 'fruit' check_exist = name != 'voc' and name != 'fruit' and name != 'roadsign_voc'
for url, md5sum in dataset[0]: for url, md5sum in dataset[0]:
get_path(url, data_dir, md5sum, check_exist) get_path(url, data_dir, md5sum, check_exist)
...@@ -139,10 +145,11 @@ def get_dataset_path(path, annotation, image_dir): ...@@ -139,10 +145,11 @@ def get_dataset_path(path, annotation, image_dir):
return data_dir return data_dir
# not match any dataset in DATASETS # not match any dataset in DATASETS
raise ValueError("Dataset {} is not valid and cannot parse dataset type " raise ValueError(
"'{}' for automaticly downloading, which only supports " "Dataset {} is not valid and cannot parse dataset type "
"'voc' , 'coco', 'wider_face' and 'fruit' currently". "'{}' for automaticly downloading, which only supports "
format(path, osp.split(path)[-1])) "'voc' , 'coco', 'wider_face', 'fruit' and 'roadsign_voc' currently".
format(path, osp.split(path)[-1]))
def create_voc_list(data_dir, devkit_subdir='VOCdevkit'): def create_voc_list(data_dir, devkit_subdir='VOCdevkit'):
...@@ -232,13 +239,19 @@ def _dataset_exists(path, annotation, image_dir): ...@@ -232,13 +239,19 @@ def _dataset_exists(path, annotation, image_dir):
if annotation: if annotation:
annotation_path = osp.join(path, annotation) annotation_path = osp.join(path, annotation)
if not osp.exists(annotation_path):
logger.error("Config dataset_dir {} is not exits!".format(path))
if not osp.isfile(annotation_path): if not osp.isfile(annotation_path):
logger.debug("Config annotation {} is not a " logger.warning("Config annotation {} is not a "
"file, dataset config is not " "file, dataset config is not "
"valid".format(annotation_path)) "valid".format(annotation_path))
return False return False
if image_dir: if image_dir:
image_path = osp.join(path, image_dir) image_path = osp.join(path, image_dir)
if not osp.exists(image_path):
logger.warning("Config dataset_dir {} is not exits!".format(path))
if not osp.isdir(image_path): if not osp.isdir(image_path):
logger.warning("Config image_dir {} is not a " logger.warning("Config image_dir {} is not a "
"directory, dataset config is not " "directory, dataset config is not "
......
...@@ -107,8 +107,8 @@ def bbox_eval(results, ...@@ -107,8 +107,8 @@ def bbox_eval(results,
logger.info("Accumulating evaluatation results...") logger.info("Accumulating evaluatation results...")
detection_map.accumulate() detection_map.accumulate()
map_stat = 100. * detection_map.get_map() map_stat = 100. * detection_map.get_map()
logger.info("mAP({:.2f}, {}) = {:.2f}".format(overlap_thresh, map_type, logger.info("mAP({:.2f}, {}) = {:.2f}%".format(overlap_thresh, map_type,
map_stat)) map_stat))
return map_stat return map_stat
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
# add python path of PadleDetection to sys.path
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
if parent_path not in sys.path:
sys.path.append(parent_path)
from scipy.cluster.vq import kmeans
import random
import numpy as np
from tqdm import tqdm
from ppdet.utils.cli import ArgsParser
from ppdet.utils.check import check_gpu, check_version, check_config
from ppdet.core.workspace import load_config, merge_config, create
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
class BaseAnchorCluster(object):
def __init__(self, n, cache_path, cache, verbose=True):
"""
Base Anchor Cluster
Args:
n (int): number of clusters
cache_path (str): cache directory path
cache (bool): whether using cache
verbose (bool): whether print results
"""
super(BaseAnchorCluster, self).__init__()
self.n = n
self.cache_path = cache_path
self.cache = cache
self.verbose = verbose
def print_result(self, centers):
raise NotImplementedError('%s.print_result is not available' %
self.__class__.__name__)
def get_whs(self):
whs_cache_path = os.path.join(self.cache_path, 'whs.npy')
shapes_cache_path = os.path.join(self.cache_path, 'shapes.npy')
if self.cache and os.path.exists(whs_cache_path) and os.path.exists(
shapes_cache_path):
self.whs = np.load(whs_cache_path)
self.shapes = np.load(shapes_cache_path)
return self.whs, self.shapes
whs = np.zeros((0, 2))
shapes = np.zeros((0, 2))
roidbs = self.dataset.get_roidb()
for rec in tqdm(roidbs):
h, w = rec['h'], rec['w']
bbox = rec['gt_bbox']
wh = bbox[:, 2:4] - bbox[:, 0:2] + 1
wh = wh / np.array([[w, h]])
shape = np.ones_like(wh) * np.array([[w, h]])
whs = np.vstack((whs, wh))
shapes = np.vstack((shapes, shape))
if self.cache:
os.makedirs(self.cache_path, exist_ok=True)
np.save(whs_cache_path, whs)
np.save(shapes_cache_path, shapes)
self.whs = whs
self.shapes = shapes
return self.whs, self.shapes
def calc_anchors(self):
raise NotImplementedError('%s.calc_anchors is not available' %
self.__class__.__name__)
def __call__(self):
self.get_whs()
centers = self.calc_anchors()
if self.verbose:
self.print_result(centers)
return centers
class YOLOv2AnchorCluster(BaseAnchorCluster):
def __init__(self,
n,
dataset,
size,
cache_path,
cache,
iters=1000,
verbose=True):
super(YOLOv2AnchorCluster, self).__init__(
n, cache_path, cache, verbose=verbose)
"""
YOLOv2 Anchor Cluster
Reference:
https://github.com/AlexeyAB/darknet/blob/master/scripts/gen_anchors.py
Args:
n (int): number of clusters
dataset (DataSet): DataSet instance, VOC or COCO
size (list): [w, h]
cache_path (str): cache directory path
cache (bool): whether using cache
iters (int): kmeans algorithm iters
verbose (bool): whether print results
"""
self.dataset = dataset
self.size = size
self.iters = iters
def print_result(self, centers):
logger.info('%d anchor cluster result: [w, h]' % self.n)
for w, h in centers:
logger.info('[%d, %d]' % (round(w), round(h)))
def metric(self, whs, centers):
wh1 = whs[:, None]
wh2 = centers[None]
inter = np.minimum(wh1, wh2).prod(2)
return inter / (wh1.prod(2) + wh2.prod(2) - inter)
def kmeans_expectation(self, whs, centers, assignments):
dist = self.metric(whs, centers)
new_assignments = dist.argmax(1)
converged = (new_assignments == assignments).all()
return converged, new_assignments
def kmeans_maximizations(self, whs, centers, assignments):
new_centers = np.zeros_like(centers)
for i in range(centers.shape[0]):
mask = (assignments == i)
if mask.sum():
new_centers[i, :] = whs[mask].mean(0)
return new_centers
def calc_anchors(self):
self.whs = self.whs * np.array([self.size])
# random select k centers
whs, n, iters = self.whs, self.n, self.iters
logger.info('Running kmeans for %d anchors on %d points...' %
(n, len(whs)))
idx = np.random.choice(whs.shape[0], size=n, replace=False)
centers = whs[idx]
assignments = np.zeros(whs.shape[0:1]) * -1
# kmeans
if n == 1:
return self.kmeans_maximizations(whs, centers, assignments)
pbar = tqdm(range(iters), desc='Cluster anchors with k-means algorithm')
for _ in pbar:
# E step
converged, assignments = self.kmeans_expectation(whs, centers,
assignments)
if converged:
break
# M step
centers = self.kmeans_maximizations(whs, centers, assignments)
ious = self.metric(whs, centers)
pbar.desc = 'avg_iou: %.4f' % (ious.max(1).mean())
centers = sorted(centers, key=lambda x: x[0] * x[1])
return centers
class YOLOv5AnchorCluster(BaseAnchorCluster):
def __init__(self,
n,
dataset,
size,
cache_path,
cache,
iters=300,
gen_iters=1000,
thresh=0.25,
verbose=True):
super(YOLOv5AnchorCluster, self).__init__(
n, cache_path, cache, verbose=verbose)
"""
YOLOv5 Anchor Cluster
Reference:
https://github.com/ultralytics/yolov5/blob/master/utils/general.py
Args:
n (int): number of clusters
dataset (DataSet): DataSet instance, VOC or COCO
size (list): [w, h]
cache_path (str): cache directory path
cache (bool): whether using cache
iters (int): iters of kmeans algorithm
gen_iters (int): iters of genetic algorithm
threshold (float): anchor scale threshold
verbose (bool): whether print results
"""
self.dataset = dataset
self.size = size
self.iters = iters
self.gen_iters = gen_iters
self.thresh = thresh
def print_result(self, centers):
whs = self.whs
centers = centers[np.argsort(centers.prod(1))]
x, best = self.metric(whs, centers)
bpr, aat = (
best > self.thresh).mean(), (x > self.thresh).mean() * self.n
logger.info(
'thresh=%.2f: %.4f best possible recall, %.2f anchors past thr' %
(self.thresh, bpr, aat))
logger.info(
'n=%g, img_size=%s, metric_all=%.3f/%.3f-mean/best, past_thresh=%.3f-mean: '
% (self.n, self.size, x.mean(), best.mean(),
x[x > self.thresh].mean()))
logger.info('%d anchor cluster result: [w, h]' % self.n)
for w, h in centers:
logger.info('[%d, %d]' % (round(w), round(h)))
def metric(self, whs, centers):
r = whs[:, None] / centers[None]
x = np.minimum(r, 1. / r).min(2)
return x, x.max(1)
def fitness(self, whs, centers):
_, best = self.metric(whs, centers)
return (best * (best > self.thresh)).mean()
def calc_anchors(self):
self.whs = self.whs * self.shapes / self.shapes.max(
1, keepdims=True) * np.array([self.size])
wh0 = self.whs
i = (wh0 < 3.0).any(1).sum()
if i:
logger.warn('Extremely small objects found. %d of %d'
'labels are < 3 pixels in width or height' %
(i, len(wh0)))
wh = wh0[(wh0 >= 2.0).any(1)]
logger.info('Running kmeans for %g anchors on %g points...' %
(self.n, len(wh)))
s = wh.std(0)
centers, dist = kmeans(wh / s, self.n, iter=self.iters)
centers *= s
f, sh, mp, s = self.fitness(wh, centers), centers.shape, 0.9, 0.1
pbar = tqdm(
range(self.gen_iters),
desc='Evolving anchors with Genetic Algorithm')
for _ in pbar:
v = np.ones(sh)
while (v == 1).all():
v = ((np.random.random(sh) < mp) * np.random.random() *
np.random.randn(*sh) * s + 1).clip(0.3, 3.0)
new_centers = (centers.copy() * v).clip(min=2.0)
new_f = self.fitness(wh, new_centers)
if new_f > f:
f, centers = new_f, new_centers.copy()
pbar.desc = 'Evolving anchors with Genetic Algorithm: fitness = %.4f' % f
return centers
def main():
parser = ArgsParser()
parser.add_argument(
'--n', '-n', default=9, type=int, help='num of clusters')
parser.add_argument(
'--iters',
'-i',
default=1000,
type=int,
help='num of iterations for kmeans')
parser.add_argument(
'--gen_iters',
'-gi',
default=1000,
type=int,
help='num of iterations for genetic algorithm')
parser.add_argument(
'--thresh',
'-t',
default=0.25,
type=float,
help='anchor scale threshold')
parser.add_argument(
'--verbose', '-v', default=True, type=bool, help='whether print result')
parser.add_argument(
'--size',
'-s',
default=None,
type=str,
help='image size: w,h, using comma as delimiter')
parser.add_argument(
'--method',
'-m',
default='v2',
type=str,
help='cluster method, [v2, v5] are supported now')
parser.add_argument(
'--cache_path', default='cache', type=str, help='cache path')
parser.add_argument(
'--cache', action='store_true', help='whether use cache')
FLAGS = parser.parse_args()
cfg = load_config(FLAGS.config)
merge_config(FLAGS.opt)
check_config(cfg)
# check if set use_gpu=True in paddlepaddle cpu version
check_gpu(cfg.use_gpu)
# check if paddlepaddle version is satisfied
check_version()
# get dataset
dataset = cfg['TrainReader']['dataset']
if FLAGS.size:
if ',' in FLAGS.size:
size = list(map(int, FLAGS.size.split(',')))
assert len(size) == 2, "the format of size is incorrect"
else:
size = int(FLAGS.size)
size = [size, size]
elif 'image_shape' in cfg['TrainReader']['inputs_def']:
size = cfg['TrainReader']['inputs_def']['image_shape'][1:]
else:
raise ValueError('size is not specified')
if FLAGS.method == 'v2':
cluster = YOLOv2AnchorCluster(FLAGS.n, dataset, size, FLAGS.cache_path,
FLAGS.cache, FLAGS.iters, FLAGS.verbose)
elif FLAGS.method == 'v5':
cluster = YOLOv5AnchorCluster(FLAGS.n, dataset, size, FLAGS.cache_path,
FLAGS.cache, FLAGS.iters, FLAGS.gen_iters,
FLAGS.thresh, FLAGS.verbose)
else:
raise ValueError('cluster method: %s is not supported' % FLAGS.method)
anchors = cluster()
if __name__ == "__main__":
main()
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册