diff --git a/README.md b/README.md deleted file mode 100644 index 755016871e8328a8164ba628d8c2bb63eaa140f8..0000000000000000000000000000000000000000 --- a/README.md +++ /dev/null @@ -1,176 +0,0 @@ -English | [简体中文](README_cn.md) - -Documentation:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io) - -# PaddleDetection - -PaddleDetection is an end-to-end object detection development kit based on PaddlePaddle, which -aims to help developers in the whole development of training models, optimizing performance and -inference speed, and deploying models. PaddleDetection provides varied object detection architectures -in modular design, and wealthy data augmentation methods, network components, loss functions, etc. -PaddleDetection supported practical projects such as industrial quality inspection, remote sensing -image object detection, and automatic inspection with its practical features such as model compression -and multi-platform deployment. - -[PP-YOLO](https://arxiv.org/abs/2007.12099), which is faster and has higer performance than YOLOv4, -has been released, it reached mAP(0.5:0.95) as 45.2% on COCO test2019 dataset and 72.9 FPS on single -Test V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details. - -**Now all models in PaddleDetection require PaddlePaddle version 1.8 or higher, or suitable develop version.** - -
- -
- - -## Introduction - -Features: - -- Rich models: - - PaddleDetection provides rich of models, including 100+ pre-trained models -such as object detection, instance segmentation, face detection etc. It covers -the champion models, the practical detection models for cloud and edge device. - -- Production Ready: - - Key operations are implemented in C++ and CUDA, together with PaddlePaddle's -highly efficient inference engine, enables easy deployment in server environments. - -- Highly Flexible: - - Components are designed to be modular. Model architectures, as well as data -preprocess pipelines, can be easily customized with simple configuration -changes. - -- Performance Optimized: - - With the help of the underlying PaddlePaddle framework, faster training and -reduced GPU memory footprint is achieved. Notably, YOLOv3 training is -much faster compared to other frameworks. Another example is Mask-RCNN -(ResNet50), we managed to fit up to 4 images per GPU (Tesla V100 16GB) during -multi-GPU training. - -Supported Architectures: - -| | ResNet | ResNet-vd [1](#vd) | ResNeXt-vd | SENet | MobileNet | HRNet | Res2Net | -| ------------------- | :----: | ----------------------------: | :--------: | :---: | :-------: |:------:|:-----: | -| Faster R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ | -| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | -| Mask R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ | -| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | -| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | -| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | -| Libra R-CNN | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| YOLOv3 | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | -| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | -| BlazeFace | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| Faceboxes | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | - -[1] [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer much improved accuracy with negligible performance cost. - -**NOTE:** ✓ for config file and pretrain model provided in [Model Zoo](docs/MODEL_ZOO.md), ✗ for not provided but is supported generally. - -More models: - -- EfficientDet -- FCOS -- CornerNet-Squeeze -- YOLOv4 -- PP-YOLO - -More Backbones: - -- DarkNet -- VGG -- GCNet -- CBNet - -Advanced Features: - -- [x] **Synchronized Batch Norm** -- [x] **Group Norm** -- [x] **Modulated Deformable Convolution** -- [x] **Deformable PSRoI Pooling** -- [x] **Non-local and GCNet** - -**NOTE:** Synchronized batch normalization can only be used on multiple GPU devices, can not be used on CPU devices or single GPU device. - -The following is the relationship between COCO mAP and FPS on Tesla V100 of representative models of each architectures and backbones. - -
- -
- -**NOTE:** -- `CBResNet` stands for `Cascade-Faster-RCNN-CBResNet200vd-FPN`, which has highest mAP on COCO as 53.3% in PaddleDetection models -- `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which has been optimized to 20 FPS inference speed when COCO mAP as 47.8% -- The enhanced `YOLOv3-ResNet50vd-DCN` is 10.6 absolute percentage points higher than paper on COCO mAP, and inference speed is nearly 70% faster than the darknet framework -- All these models can be get in [Model Zoo](#Model-Zoo) - -The following is the relationship between COCO mAP and FPS on Tesla V100 of SOTA object detecters and PP-YOLO, which is faster and has better performance than YOLOv4, and reached mAP(0.5:0.95) as 45.2% on COCO test2019 dataset and 72.9 FPS on single Test V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details. - -
- -
- -## Tutorials - - -### Get Started - -- [Installation guide](docs/tutorials/INSTALL.md) -- [Quick start on small dataset](docs/tutorials/QUICK_STARTED.md) -- [Train/Evaluation/Inference](docs/tutorials/GETTING_STARTED.md) -- [How to train a custom dataset](docs/tutorials/Custom_DataSet.md) -- [FAQ](docs/FAQ.md) - -### Advanced Tutorial - -- [Guide to preprocess pipeline and dataset definition](docs/advanced_tutorials/READER.md) -- [Models technical](docs/advanced_tutorials/MODEL_TECHNICAL.md) -- [Transfer learning document](docs/advanced_tutorials/TRANSFER_LEARNING.md) -- [Parameter configuration](docs/advanced_tutorials/config_doc): - - [Introduction to the configuration workflow](docs/advanced_tutorials/config_doc/CONFIG.md) - - [Parameter configuration for RCNN model](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md) -- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb) -- [Model compression](slim) - - [Model compression benchmark](slim) - - [Quantization](slim/quantization) - - [Model pruning](slim/prune) - - [Model distillation](slim/distillation) - - [Neural Architecture Search](slim/nas) -- [Deployment](deploy) - - [Export model for inference](docs/advanced_tutorials/deploy/EXPORT_MODEL.md) - - [Python inference](deploy/python) - - [C++ inference](deploy/cpp) - - [Inference benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md) - -## Model Zoo - -- Pretrained models are available in the [PaddleDetection model zoo](docs/MODEL_ZOO.md). -- [Mobile models](configs/mobile/README.md) -- [Anchor free models](configs/anchor_free/README.md) -- [Face detection models](docs/featured_model/FACE_DETECTION_en.md) -- [Pretrained models for pedestrian detection](docs/featured_model/CONTRIB.md) -- [Pretrained models for vehicle detection](docs/featured_model/CONTRIB.md) -- [YOLOv3 enhanced model](docs/featured_model/YOLOv3_ENHANCEMENT.md): Compared to MAP of 33.0% in paper, enhanced YOLOv3 reaches the MAP of 43.6%, and inference speed is improved as well -- [PP-YOLO](configs/ppyolo/README.md): PP-YOLO reeached mAP as 45.3% on COCO dataset,and 72.9 FPS on single Tesla V100 -- [Objects365 2019 Challenge champion model](docs/featured_model/champion_model/CACascadeRCNN.md) -- [Best single model of Open Images 2019-Object Detction](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md) -- [Practical Server-side detection method](configs/rcnn_enhance/README_en.md): Inference speed on single V100 GPU can reach 20FPS when COCO mAP is 47.8%. -- [Large-scale practical object detection models](docs/featured_model/LARGE_SCALE_DET_MODEL_en.md): Large-scale practical server-side detection pretrained models with 676 categories are provided for most application scenarios, which can be used not only for direct inference but also finetuning on other datasets. - - -## License -PaddleDetection is released under the [Apache 2.0 license](LICENSE). - -## Updates -v0.4.0 was released at `05/2020`, add PP-YOLO, TTFNet, HTC, ACFPN, etc. And add BlaceFace face landmark detection model, add a series of optimized SSDLite models on mobile side, add data augmentations GridMask and RandomErasing, add Matrix NMS and EMA training, and improved ease of use, fix many known bugs, etc. -Please refer to [版本更新文档](docs/CHANGELOG.md) for details. - -## Contributing - -Contributions are highly welcomed and we would really appreciate your feedback!! 
diff --git a/README.md b/README.md new file mode 120000 index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5 --- /dev/null +++ b/README.md @@ -0,0 +1 @@ +README_cn.md \ No newline at end of file diff --git a/README_cn.md b/README_cn.md index 630e81fe3dbf4cf361d25e5c4d63ee466acbe732..8b3ee850e1aebb21c3b72f7d16f44351080f5c1f 100644 --- a/README_cn.md +++ b/README_cn.md @@ -1,92 +1,177 @@ -简体中文 | [English](README.md) +简体中文 | [English](README_en.md) 文档:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io) -# PaddleDetection +# 简介 -飞桨推出的PaddleDetection是端到端目标检测开发套件,旨在帮助开发者更快更好地完成检测模型的训练、精度速度优化到部署全流程。PaddleDetection以模块化的设计实现了多种主流目标检测算法,并且提供了丰富的数据增强、网络组件、损失函数等模块,集成了模型压缩和跨平台高性能部署能力。目前基于PaddleDetection已经完成落地的项目涉及工业质检、遥感图像检测、无人巡检等多个领域。 +PaddleDetection飞桨目标检测开发套件,旨在帮助开发者更快更好地完成检测模型的组建、训练、优化及部署等全开发流程。 -PaddleDetection新发布精度速度领先的[PP-YOLO](https://arxiv.org/abs/2007.12099)模型,COCO数据集精度达到45.2%,单卡Tesla V100预测速度达到72.9 FPS,详细信息见[PP-YOLO模型](configs/ppyolo/README_cn.md) +PaddleDetection模块化地实现了多种主流目标检测算法,提供了丰富的数据增强策略、网络模块组件(如骨干网络)、损失函数等,并集成了模型压缩和跨平台高性能部署能力。 -**目前检测库下模型均要求使用PaddlePaddle 1.8及以上版本或适当的develop版本。** +经过长时间产业实践打磨,PaddleDetection已拥有顺畅、卓越的使用体验,被工业质检、遥感图像检测、无人巡检、新零售、互联网、科研等十多个行业的开发者广泛应用。
- +
- -## 简介 - -特性: - -- 模型丰富: - - PaddleDetection提供了丰富的模型,包含目标检测、实例分割、人脸检测等100+个预训练模型,涵盖多种数据集竞赛冠军方案、适合云端/边缘端设备部署的检测方案。 - -- 易部署: - - PaddleDetection的模型中使用的核心算子均通过C++或CUDA实现,同时基于PaddlePaddle的高性能推理引擎可以方便地部署在多种硬件平台上。 - -- 高灵活度: - - PaddleDetection通过模块化设计来解耦各个组件,基于配置文件可以轻松地搭建各种检测模型。 - -- 高性能: - - 基于PaddlePaddle框架的高性能内核,在模型训练速度、显存占用上有一定的优势。例如,YOLOv3的训练速度快于其他框架,在Tesla V100 16GB环境下,Mask-RCNN(ResNet50)可以单卡Batch Size可以达到4 (甚至到5)。 - - -支持的模型结构: - -| | ResNet | ResNet-vd [1](#vd) | ResNeXt-vd | SENet | MobileNet | HRNet | Res2Net | -|--------------------|:------:|------------------------------:|:----------:|:-----:|:---------:|:------:| :--: | -| Faster R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ | -| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | -| Mask R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ | -| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | -| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | -| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | -| Libra R-CNN | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| RetinaNet | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | -| YOLOv3 | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | -| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | -| BlazeFace | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| Faceboxes | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | - -[1] [ResNet-vd](https://arxiv.org/pdf/1812.01187) 模型预测速度基本不变的情况下提高了精度。 - -**说明:** ✓ 为[模型库](docs/MODEL_ZOO_cn.md)中提供了对应配置文件和预训练模型,✗ 为未提供参考配置,但一般都支持。 - -更多的模型: - -- EfficientDet -- FCOS -- CornerNet-Squeeze -- YOLOv4 -- PP-YOLO - -更多的Backone: - -- DarkNet -- VGG -- GCNet -- CBNet -- Hourglass - -扩展特性: - -- [x] **Synchronized Batch Norm** -- [x] **Group Norm** -- [x] **Modulated Deformable Convolution** -- [x] **Deformable PSRoI Pooling** -- [x] **Non-local和GCNet** - -**注意:** Synchronized batch normalization 只能在多GPU环境下使用,不能在CPU环境或者单GPU环境下使用。 - -以下为选取各模型结构和骨干网络的代表模型COCO数据集精度mAP和单卡Tesla V100上预测速度(FPS)关系图。 +### 产品动态 + +- 2020.09.21-27: 【目标检测7日打卡课】手把手教你从入门到进阶,深入了解目标检测算法的前世今生。立即加入课程QQ交流群(1136406895)一起学习吧 :) +- 2020.07.24: 发布**产业最实用**目标检测模型 [PP-YOLO](https://arxiv.org/abs/2007.12099) ,深入考虑产业应用对精度速度的双重面诉求,COCO数据集精度45.2%,Tesla V100预测速度72.9 FPS,详细信息见[文档](configs/ppyolo/README_cn.md)。 +- 2020.06.11: 发布676类大规模服务器端实用目标检测模型,适用于绝大部分使用场景,可以直接用来预测,也可以用于微调其他任务。 + +### 特性 + +- **模型丰富**: 包含**目标检测**、**实例分割**、**人脸检测**等**100+个预训练模型**,涵盖多种**全球竞赛冠军**方案 +- **使用简洁**:模块化设计,解耦各个网络组件,开发者轻松搭建、试用各种检测模型及优化策略,快速得到高性能、定制化的算法。 +- **端到端打通**: 从数据增强、组网、训练、压缩、部署端到端打通,并完备支持**云端**/**边缘端**多架构、多设备部署。 +- **高性能**: 基于飞桨的高性能内核,模型训练速度及显存占用优势明显。支持FP16训练, 支持多机训练。 + +#### 套件结构概览 + + + + + + + + + + + + + + + + + + + + +
+- **Architectures**
+  - Two-Stage Detection:Faster RCNN、FPN、Cascade-RCNN、Libra RCNN、Hybrid Task RCNN、PSS-Det RCNN
+  - One-Stage Detection:RetinaNet、YOLOv3、YOLOv4、PP-YOLO、SSD
+  - Anchor Free:CornerNet-Squeeze、FCOS、TTFNet
+  - Instance Segmentation:Mask RCNN、SOLOv2 is coming soon
+  - Face-Detection:FaceBoxes、BlazeFace、BlazeFace-NAS
+- **Backbones**:ResNet(&vd)、ResNeXt(&vd)、SENet、Res2Net、HRNet、Hourglass、CBNet、GCNet、DarkNet、CSPDarkNet、VGG、MobileNetv1/v3、GhostNet、EfficientNet
+- **Components**
+  - Common:Sync-BN、Group Norm、DCNv2、Non-local
+  - FPN:BiFPN、BFP、HRFPN、ACFPN
+  - Loss:Smooth-L1、GIoU/DIoU/CIoU、IoUAware
+  - Post-processing:SoftNMS、MatrixNMS
+  - Speed:FP16 training、Multi-machine training
+- **Data Augmentation**:Resize、Flipping、Expand、Crop、Color Distort、Random Erasing、Mixup、Cutmix、Grid Mask、Auto Augment
+
+#### 模型性能概览
+
+各模型结构和骨干网络的代表模型在COCO数据集上精度mAP和单卡Tesla V100上预测速度(FPS)对比图。
- +
**说明:** @@ -95,11 +180,6 @@ PaddleDetection新发布精度速度领先的[PP-YOLO](https://arxiv.org/abs/200 - PaddleDetection增强版`YOLOv3-ResNet50vd-DCN`在COCO数据集mAP高于原作10.6个绝对百分点,推理速度为61.3FPS,快于原作约70% - 图中模型均可在[模型库](#模型库)中获取 -以下为PaddleDetection发布的精度和预测速度优于YOLOv4模型的PP-YOLO与前沿目标检测算法的COCO数据集精度与单卡Tesla V100预测速度(FPS)关系图, PP-YOLO模型在[COCO](http://cocodataset.org) test2019数据集上精度达到45.2%,在单卡V100上FP32推理速度为72.9 FPS,详细信息见[PP-YOLO模型](configs/ppyolo/README_cn.md) - -
- -
## 文档教程 @@ -108,51 +188,56 @@ PaddleDetection新发布精度速度领先的[PP-YOLO](https://arxiv.org/abs/200 - [安装说明](docs/tutorials/INSTALL_cn.md) - [快速开始](docs/tutorials/QUICK_STARTED_cn.md) - [训练/评估/预测流程](docs/tutorials/GETTING_STARTED_cn.md) -- [如何训练自定义数据集](docs/tutorials/Custom_DataSet.md) +- [如何自定义数据集](docs/tutorials/Custom_DataSet.md) - [常见问题汇总](docs/FAQ.md) ### 进阶教程 -- [数据预处理及数据集定义](docs/advanced_tutorials/READER.md) -- [搭建模型步骤](docs/advanced_tutorials/MODEL_TECHNICAL.md) -- [模型参数配置](docs/advanced_tutorials/config_doc): +- 参数配置 - [配置模块设计和介绍](docs/advanced_tutorials/config_doc/CONFIG_cn.md) - - [RCNN模型参数说明](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md) -- [迁移学习教程](docs/advanced_tutorials/TRANSFER_LEARNING_cn.md) -- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb) -- [模型压缩](slim) + - [RCNN参数说明](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md) + - [YOLOv3参数说明]() +- 迁移学习 + - [如何加载预训练](docs/advanced_tutorials/TRANSFER_LEARNING_cn.md) +- 模型压缩(基于[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)) - [压缩benchmark](slim) - - [量化](slim/quantization) - - [剪枝](slim/prune) - - [蒸馏](slim/distillation) - - [神经网络搜索](slim/nas) -- [推理部署](deploy) + - [量化](slim/quantization), [剪枝](slim/prune), [蒸馏](slim/distillation), [搜索](slim/nas) +- 推理部署 - [模型导出教程](docs/advanced_tutorials/deploy/EXPORT_MODEL.md) - - [Python端推理部署](deploy/python) - - [C++端推理部署](deploy/cpp) + - [服务器端Python部署](deploy/python) + - [服务器端C++部署](deploy/cpp) + - [移动端部署](https://github.com/PaddlePaddle/Paddle-Lite-Demo) + - [在线Serving部署](https://github.com/PaddlePaddle/Serving) - [推理Benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md) +- 进阶开发 + - [新增数据预处理](docs/advanced_tutorials/READER.md) + - [新增检测算法](docs/advanced_tutorials/MODEL_TECHNICAL.md) + ## 模型库 -- [模型库](docs/MODEL_ZOO_cn.md) -- [移动端模型](configs/mobile/README.md) -- [Anchor free模型](configs/anchor_free/README.md) -- [人脸检测模型](docs/featured_model/FACE_DETECTION.md) -- [YOLOv3增强模型](docs/featured_model/YOLOv3_ENHANCEMENT.md): COCO mAP高达43.6%,原论文精度为33.0% -- [PP-YOLO模型](configs/ppyolo/README_cn.md): COCO mAP高达45.3%,单卡Tesla V100预测速度高达72.9 FPS -- [行人检测预训练模型](docs/featured_model/CONTRIB_cn.md) -- [车辆检测预训练模型](docs/featured_model/CONTRIB_cn.md) -- [Objects365 2019 Challenge夺冠模型](docs/featured_model/champion_model/CACascadeRCNN.md) -- [Open Images 2019-Object Detction比赛最佳单模型](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md) -- [服务器端实用目标检测模型](configs/rcnn_enhance/README.md): V100上速度20FPS时,COCO mAP高达47.8%。 -- [大规模实用目标检测模型](docs/featured_model/LARGE_SCALE_DET_MODEL.md): 提供了包含676个类别的大规模服务器端实用目标检测模型,适用于绝大部分使用场景,可以直接用来预测,也可以用于微调其他任务。 +- 通用目标检测: + - [模型库和基线](docs/MODEL_ZOO_cn.md) + - [移动端模型](configs/mobile/README.md) + - [Anchor Free](configs/anchor_free/README.md) + - [PP-YOLO模型](configs/ppyolo/README_cn.md) + - [676类目标检测](docs/featured_model/LARGE_SCALE_DET_MODEL.md) + - [两阶段实用模型PSS-Det](configs/rcnn_enhance/README.md) +- 垂类领域 + - [人脸检测](docs/featured_model/FACE_DETECTION.md) + - [行人检测](docs/featured_model/CONTRIB_cn.md) + - [车辆检测](docs/featured_model/CONTRIB_cn.md) +- 比赛方案 + - [Objects365 2019 Challenge夺冠模型](docs/featured_model/champion_model/CACascadeRCNN.md) + - [Open Images 2019-Object Detction比赛最佳单模型](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md) + +## 版本更新 +v0.4.0版本已经在`07/2020`发布,增加PP-YOLO, TTFNet, HTC, ACFPN等多个模型,新增BlazeFace人脸关键点检测模型,新增移动端SSDLite系列优化模型,新增GridMask,RandomErasing数据增强方法,新增Matrix NMS和EMA训练,提升易用性,修复已知诸多bug等,详细内容请参考[版本更新文档](docs/CHANGELOG.md)。 ## 许可证书 本项目的发布受[Apache 2.0 license](LICENSE)许可认证。 -## 版本更新 -v0.4.0版本已经在`07/2020`发布,增加PP-YOLO, TTFNet, 
HTC, ACFPN等多个模型,新增BlazeFace人脸关键点检测模型,新增移动端SSDLite系列优化模型,新增GridMask、RandomErasing数据增强方法,新增Matrix NMS和EMA训练,提升易用性,修复诸多已知bug,详细内容请参考[版本更新文档](docs/CHANGELOG.md)。

## 许可证书
本项目的发布受[Apache 2.0 license](LICENSE)许可认证。

-## 版本更新
-v0.4.0版本已经在`07/2020`发布,增加PP-YOLO, TTFNet, HTC, ACFPN等多个模型,新增BlazeFace人脸关键点检测模型,新增移动端SSDLite系列优化模型,新增GridMask,RandomErasing数据增强方法,新增Matrix NMS和EMA训练,提升易用性,修复已知诸多bug等,详细内容请参考[版本更新文档](docs/CHANGELOG.md)。
-## 如何贡献代码
+## 贡献代码

我们非常欢迎你为PaddleDetection贡献代码,也十分感谢你的反馈。

diff --git a/README_en.md b/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..755016871e8328a8164ba628d8c2bb63eaa140f8
--- /dev/null
+++ b/README_en.md
@@ -0,0 +1,176 @@
+English | [简体中文](README_cn.md)
+
+Documentation: [https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
+
+# PaddleDetection
+
+PaddleDetection is an end-to-end object detection development kit based on PaddlePaddle that aims to help developers through the whole workflow of training models, optimizing accuracy and inference speed, and deploying models. PaddleDetection provides a variety of object detection architectures in a modular design, along with a wealth of data augmentation methods, network components, and loss functions. With practical features such as model compression and multi-platform deployment, it supports real-world projects such as industrial quality inspection, remote sensing image object detection, and automated inspection.
+
+[PP-YOLO](https://arxiv.org/abs/2007.12099), which is faster and has higher performance than YOLOv4, has been released; it reaches 45.2% mAP(0.5:0.95) on the COCO test2019 dataset and 72.9 FPS on a single Tesla V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details.
+
+**All models in PaddleDetection now require PaddlePaddle version 1.8 or higher, or a suitable develop version.**
+
+ +
+
+
+## Introduction
+
+Features:
+
+- Rich models:
+
+  PaddleDetection provides a rich set of models, including 100+ pre-trained models for object detection, instance segmentation, face detection, etc. It covers champion models as well as practical detection models for cloud and edge devices.
+
+- Production Ready:
+
+  Key operations are implemented in C++ and CUDA, which, together with PaddlePaddle's highly efficient inference engine, enables easy deployment in server environments.
+
+- Highly Flexible:
+
+  Components are designed to be modular. Model architectures, as well as data preprocessing pipelines, can be easily customized with simple configuration changes.
+
+- Performance Optimized:
+
+  With the help of the underlying PaddlePaddle framework, faster training and a reduced GPU memory footprint are achieved. Notably, YOLOv3 training is much faster compared to other frameworks. Another example is Mask-RCNN (ResNet50), where we managed to fit up to 4 images per GPU (Tesla V100 16GB) during multi-GPU training.
+
+Supported Architectures:
+
+|                     | ResNet | ResNet-vd [1](#vd) | ResNeXt-vd | SENet | MobileNet | HRNet | Res2Net |
+| ------------------- | :----: | -----------------: | :--------: | :---: | :-------: | :---: | :-----: |
+| Faster R-CNN        |   ✓    |         ✓          |     ✗      |   ✓   |     ✗     |   ✗   |    ✗    |
+| Faster R-CNN + FPN  |   ✓    |         ✓          |     ✓      |   ✓   |     ✗     |   ✓   |    ✓    |
+| Mask R-CNN          |   ✓    |         ✓          |     ✗      |   ✓   |     ✗     |   ✗   |    ✗    |
+| Mask R-CNN + FPN    |   ✓    |         ✓          |     ✓      |   ✓   |     ✗     |   ✗   |    ✓    |
+| Cascade Faster-RCNN |   ✓    |         ✓          |     ✓      |   ✗   |     ✗     |   ✗   |    ✗    |
+| Cascade Mask-RCNN   |   ✓    |         ✗          |     ✗      |   ✓   |     ✗     |   ✗   |    ✗    |
+| Libra R-CNN         |   ✗    |         ✓          |     ✗      |   ✗   |     ✗     |   ✗   |    ✗    |
+| RetinaNet           |   ✓    |         ✗          |     ✗      |   ✗   |     ✗     |   ✗   |    ✗    |
+| YOLOv3              |   ✓    |         ✓          |     ✗      |   ✗   |     ✓     |   ✗   |    ✗    |
+| SSD                 |   ✗    |         ✗          |     ✗      |   ✗   |     ✓     |   ✗   |    ✗    |
+| BlazeFace           |   ✗    |         ✗          |     ✗      |   ✗   |     ✗     |   ✗   |    ✗    |
+| Faceboxes           |   ✗    |         ✗          |     ✗      |   ✗   |     ✗     |   ✗   |    ✗    |
+
+[1] [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer much improved accuracy with negligible performance cost.
+
+**NOTE:** ✓ means a config file and pretrained model are provided in the [Model Zoo](docs/MODEL_ZOO.md); ✗ means not provided yet, but generally supported.
+
+More models:
+
+- EfficientDet
+- FCOS
+- CornerNet-Squeeze
+- YOLOv4
+- PP-YOLO
+
+More Backbones:
+
+- DarkNet
+- VGG
+- GCNet
+- CBNet
+
+Advanced Features:
+
+- [x] **Synchronized Batch Norm**
+- [x] **Group Norm**
+- [x] **Modulated Deformable Convolution**
+- [x] **Deformable PSRoI Pooling**
+- [x] **Non-local and GCNet**
+
+**NOTE:** Synchronized batch normalization can only be used on multiple GPU devices; it cannot be used on CPU devices or a single GPU device.
+
+The following figure shows the relationship between COCO mAP and FPS on Tesla V100 for representative models of each architecture and backbone.
+
+ +
+
+**NOTE:**
+- `CBResNet` stands for `Cascade-Faster-RCNN-CBResNet200vd-FPN`, which has the highest COCO mAP (53.3%) among PaddleDetection models
+- `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which has been optimized to 20 FPS inference speed at 47.8% COCO mAP
+- The enhanced `YOLOv3-ResNet50vd-DCN` is 10.6 absolute percentage points higher in COCO mAP than the original paper, and its inference speed is nearly 70% faster than the darknet framework
+- All these models can be found in the [Model Zoo](#Model-Zoo)
+
+The following figure shows the relationship between COCO mAP and FPS on Tesla V100 for state-of-the-art object detectors and PP-YOLO, which is faster and has better performance than YOLOv4, reaching 45.2% mAP(0.5:0.95) on the COCO test2019 dataset and 72.9 FPS on a single Tesla V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details.
+
+ +
+
+## Tutorials
+
+### Get Started
+
+- [Installation guide](docs/tutorials/INSTALL.md)
+- [Quick start on small dataset](docs/tutorials/QUICK_STARTED.md)
+- [Train/Evaluation/Inference](docs/tutorials/GETTING_STARTED.md)
+- [How to train a custom dataset](docs/tutorials/Custom_DataSet.md)
+- [FAQ](docs/FAQ.md)
+
+### Advanced Tutorial
+
+- [Guide to preprocess pipeline and dataset definition](docs/advanced_tutorials/READER.md)
+- [Models technical](docs/advanced_tutorials/MODEL_TECHNICAL.md)
+- [Transfer learning document](docs/advanced_tutorials/TRANSFER_LEARNING.md)
+- [Parameter configuration](docs/advanced_tutorials/config_doc):
+  - [Introduction to the configuration workflow](docs/advanced_tutorials/config_doc/CONFIG.md)
+  - [Parameter configuration for RCNN model](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md)
+- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
+- [Model compression](slim)
+  - [Model compression benchmark](slim)
+  - [Quantization](slim/quantization)
+  - [Model pruning](slim/prune)
+  - [Model distillation](slim/distillation)
+  - [Neural Architecture Search](slim/nas)
+- [Deployment](deploy)
+  - [Export model for inference](docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
+  - [Python inference](deploy/python)
+  - [C++ inference](deploy/cpp)
+  - [Inference benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md)
+
+## Model Zoo
+
+- Pretrained models are available in the [PaddleDetection model zoo](docs/MODEL_ZOO.md).
+- [Mobile models](configs/mobile/README.md)
+- [Anchor free models](configs/anchor_free/README.md)
+- [Face detection models](docs/featured_model/FACE_DETECTION_en.md)
+- [Pretrained models for pedestrian detection](docs/featured_model/CONTRIB.md)
+- [Pretrained models for vehicle detection](docs/featured_model/CONTRIB.md)
+- [YOLOv3 enhanced model](docs/featured_model/YOLOv3_ENHANCEMENT.md): Compared with the 33.0% mAP in the paper, the enhanced YOLOv3 reaches 43.6% mAP, and inference speed is improved as well
+- [PP-YOLO](configs/ppyolo/README.md): PP-YOLO reaches 45.3% mAP on the COCO dataset and 72.9 FPS on a single Tesla V100
+- [Objects365 2019 Challenge champion model](docs/featured_model/champion_model/CACascadeRCNN.md)
+- [Best single model of Open Images 2019-Object Detection](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
+- [Practical Server-side detection method](configs/rcnn_enhance/README_en.md): Inference speed on a single V100 GPU can reach 20 FPS at 47.8% COCO mAP.
+- [Large-scale practical object detection models](docs/featured_model/LARGE_SCALE_DET_MODEL_en.md): Large-scale practical server-side detection pretrained models with 676 categories are provided for most application scenarios, and can be used not only for direct inference but also for fine-tuning on other datasets.
+
+## License
+PaddleDetection is released under the [Apache 2.0 license](LICENSE).
+
+## Updates
+v0.4.0 was released in `07/2020`. It adds PP-YOLO, TTFNet, HTC, ACFPN and other models, the BlazeFace face landmark detection model, a series of optimized SSDLite models for the mobile side, the GridMask and RandomErasing data augmentations, and Matrix NMS and EMA training; it also improves ease of use and fixes many known bugs.
+Please refer to the [change log](docs/CHANGELOG.md) for details.
+
+## Contributing
+
+Contributions are highly welcomed and we would really appreciate your feedback!!
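The config diffs below drop the duplicated `YOLOv3Loss.batch_size` field. As the removed comments note, it was only consumed by the fine-grained loss and had to be kept identical to `TrainReader.batch_size` by hand, so the value can instead be derived from the reader section. A minimal sketch of that derivation, assuming a flat dict config and a hypothetical helper name (this is not PaddleDetection's actual config loader):

```python
# Hypothetical helper: copy the train reader's per-GPU batch size into the
# YOLOv3 loss section so the fine-grained loss never falls out of sync
# with the reader setting.
def sync_loss_batch_size(cfg: dict) -> dict:
    if cfg.get("use_fine_grained_loss", False):
        cfg.setdefault("YOLOv3Loss", {})["batch_size"] = \
            cfg["TrainReader"]["batch_size"]
    return cfg

# Values mirror a ppyolo.yml-style setup (TrainReader.batch_size: 24).
cfg = {
    "use_fine_grained_loss": True,
    "TrainReader": {"batch_size": 24},
    "YOLOv3Loss": {"ignore_thresh": 0.7, "label_smooth": False},
}
assert sync_loss_batch_size(cfg)["YOLOv3Loss"]["batch_size"] == 24
```

With this kind of wiring, the per-model YAML files only state the batch size once, which is exactly what the removed comments asked users to keep consistent manually.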
diff --git a/configs/dcn/yolov3_r50vd_dcn.yml b/configs/dcn/yolov3_r50vd_dcn.yml index 0493597b1c804f4b842242dd419278cab51548e2..99815fc324fc2e7b2a23dcef045901469b7341b6 100755 --- a/configs/dcn/yolov3_r50vd_dcn.yml +++ b/configs/dcn/yolov3_r50vd_dcn.yml @@ -40,11 +40,6 @@ YOLOv3Head: score_threshold: 0.01 YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: false diff --git a/configs/dcn/yolov3_r50vd_dcn_db_iouaware_obj365_pretrained_coco.yml b/configs/dcn/yolov3_r50vd_dcn_db_iouaware_obj365_pretrained_coco.yml index 6177aaac7f26dd0c6395a040cddfa11ffc705d39..63c4cad4eeeb32cddc09f5ada7732551b0ca0301 100755 --- a/configs/dcn/yolov3_r50vd_dcn_db_iouaware_obj365_pretrained_coco.yml +++ b/configs/dcn/yolov3_r50vd_dcn_db_iouaware_obj365_pretrained_coco.yml @@ -44,7 +44,6 @@ YOLOv3Head: drop_block: true YOLOv3Loss: - batch_size: 8 ignore_thresh: 0.7 label_smooth: false use_fine_grained_loss: true diff --git a/configs/dcn/yolov3_r50vd_dcn_db_iouloss_obj365_pretrained_coco.yml b/configs/dcn/yolov3_r50vd_dcn_db_iouloss_obj365_pretrained_coco.yml index 5e943145330c49b30c1742169a101018187f654c..037c52714b91723e39bc0a3aba6a1daeb745c7f5 100755 --- a/configs/dcn/yolov3_r50vd_dcn_db_iouloss_obj365_pretrained_coco.yml +++ b/configs/dcn/yolov3_r50vd_dcn_db_iouloss_obj365_pretrained_coco.yml @@ -42,11 +42,6 @@ YOLOv3Head: drop_block: true YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: false use_fine_grained_loss: true diff --git a/configs/dcn/yolov3_r50vd_dcn_db_obj365_pretrained_coco.yml b/configs/dcn/yolov3_r50vd_dcn_db_obj365_pretrained_coco.yml index 3c69e410a303212df3539c261465a8cd02e7911c..084930b96d78abb7cc8b7e2b503008ea52b41cf6 100755 --- a/configs/dcn/yolov3_r50vd_dcn_db_obj365_pretrained_coco.yml +++ b/configs/dcn/yolov3_r50vd_dcn_db_obj365_pretrained_coco.yml @@ -43,11 +43,6 @@ YOLOv3Head: keep_prob: 0.94 YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: false use_fine_grained_loss: true diff --git a/configs/dcn/yolov3_r50vd_dcn_obj365_pretrained_coco.yml b/configs/dcn/yolov3_r50vd_dcn_obj365_pretrained_coco.yml index 014a7947e2d711af8d9be1d04139b5058b3b52bf..31e781980d9e63203c1f8e14ef4dddff0f59f7d9 100755 --- a/configs/dcn/yolov3_r50vd_dcn_obj365_pretrained_coco.yml +++ b/configs/dcn/yolov3_r50vd_dcn_obj365_pretrained_coco.yml @@ -41,11 +41,6 @@ YOLOv3Head: score_threshold: 0.01 YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: false use_fine_grained_loss: true diff --git a/configs/ppyolo/README.md 
b/configs/ppyolo/README.md index 11837a1b60ec4173549a4d5aed758dde6a6b006f..92790e947960189707258f454277a5f085c9e8e3 100644 --- a/configs/ppyolo/README.md +++ b/configs/ppyolo/README.md @@ -82,6 +82,12 @@ Training PP-YOLO on 8 GPUs with following command(all commands should be run und CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c configs/ppyolo/ppyolo.yml --eval ``` +optional: Run `tools/anchor_cluster.py` to get anchors suitable for your dataset, and modify the anchor setting in `configs/ppyolo/ppyolo.yml`. + +``` bash +python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000 +``` + ### 2. Evaluation Evaluating PP-YOLO on COCO val2017 dataset in single GPU with following commands: diff --git a/configs/ppyolo/README_cn.md b/configs/ppyolo/README_cn.md index 2c81cd9d6379dcdba11d33095c450c6211f275d1..7f3fb5104f4e9d3f0529ad58a4f266b4d463c982 100644 --- a/configs/ppyolo/README_cn.md +++ b/configs/ppyolo/README_cn.md @@ -82,6 +82,10 @@ PP-YOLO从如下方面优化和提升YOLOv3模型的精度和速度: ```bash CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c configs/ppyolo/ppyolo.yml --eval ``` +可选:在训练之前使用`tools/anchor_cluster.py`得到适用于你的数据集的anchor,并修改`configs/ppyolo/ppyolo.yml`中的anchor设置 +```bash +python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000 +``` ### 2. 评估 diff --git a/configs/ppyolo/ppyolo.yml b/configs/ppyolo/ppyolo.yml index 59d5faa3a45a02cca84d15b48d25cac3d426a425..a1a9e99598512dd1683893d3c6b71a8a5ec7e581 100644 --- a/configs/ppyolo/ppyolo.yml +++ b/configs/ppyolo/ppyolo.yml @@ -44,7 +44,6 @@ YOLOv3Head: drop_block: true YOLOv3Loss: - batch_size: 24 ignore_thresh: 0.7 scale_x_y: 1.05 label_smooth: false diff --git a/configs/ppyolo/ppyolo_2x.yml b/configs/ppyolo/ppyolo_2x.yml index 8c2493372e7f5814dbaabd29d3ea78bc37647211..a781588679c1d45e4119a82df2518aa9e1f25896 100644 --- a/configs/ppyolo/ppyolo_2x.yml +++ b/configs/ppyolo/ppyolo_2x.yml @@ -44,7 +44,6 @@ YOLOv3Head: drop_block: true YOLOv3Loss: - batch_size: 24 ignore_thresh: 0.7 scale_x_y: 1.05 label_smooth: false diff --git a/configs/ppyolo/ppyolo_r18vd.yml b/configs/ppyolo/ppyolo_r18vd.yml index c054d5f5d574dc7419b07cbc8e4a22113f0778ae..a686a209960b87168c000bc141d6bdc545af6266 100755 --- a/configs/ppyolo/ppyolo_r18vd.yml +++ b/configs/ppyolo/ppyolo_r18vd.yml @@ -39,7 +39,6 @@ YOLOv3Head: drop_block: true YOLOv3Loss: - batch_size: 32 ignore_thresh: 0.7 scale_x_y: 1.05 label_smooth: false diff --git a/configs/ppyolo/ppyolo_test.yml b/configs/ppyolo/ppyolo_test.yml index 840865a0b8aac1a84724b3b03b24e32435992b02..a9b16dd44fce8f3269c088a995cd263bd6370480 100644 --- a/configs/ppyolo/ppyolo_test.yml +++ b/configs/ppyolo/ppyolo_test.yml @@ -47,7 +47,6 @@ YOLOv3Head: drop_block: true YOLOv3Loss: - batch_size: 24 ignore_thresh: 0.7 scale_x_y: 1.05 label_smooth: false diff --git a/configs/yolov3_darknet.yml b/configs/yolov3_darknet.yml index b84d811036da52d833f7c3cd1702d0b9148ac1a8..c3b4477f9c88d3a1fca5382f9fffa56826480845 100644 --- a/configs/yolov3_darknet.yml +++ b/configs/yolov3_darknet.yml @@ -35,11 +35,6 @@ YOLOv3Head: score_threshold: 0.01 YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: true diff --git a/configs/yolov3_darknet_voc.yml b/configs/yolov3_darknet_voc.yml index 
b1c48f5f6dfc64ce5c3d63ea4543b652905782cf..362989c2866facc54cec2987cb2a2bdd64a94c3e 100644 --- a/configs/yolov3_darknet_voc.yml +++ b/configs/yolov3_darknet_voc.yml @@ -36,11 +36,6 @@ YOLOv3Head: score_threshold: 0.01 YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: false diff --git a/configs/yolov3_darknet_voc_diouloss.yml b/configs/yolov3_darknet_voc_diouloss.yml index 62c912dc1dcdf3731271c1cfec8c6d90520bc86b..8a006fe8bd3206ea29eabe19a5654991e31d4a15 100644 --- a/configs/yolov3_darknet_voc_diouloss.yml +++ b/configs/yolov3_darknet_voc_diouloss.yml @@ -36,7 +36,6 @@ YOLOv3Head: score_threshold: 0.01 YOLOv3Loss: - batch_size: 8 ignore_thresh: 0.7 label_smooth: false iou_loss: DiouLossYolo diff --git a/configs/yolov3_mobilenet_v1.yml b/configs/yolov3_mobilenet_v1.yml index 040f0f2c935ebccfddad7513f2959b5ec31f5cea..3325bd4fbd1cbf0a938d6e28800e029308be3a79 100644 --- a/configs/yolov3_mobilenet_v1.yml +++ b/configs/yolov3_mobilenet_v1.yml @@ -36,11 +36,6 @@ YOLOv3Head: score_threshold: 0.01 YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: true diff --git a/configs/yolov3_mobilenet_v1_fruit.yml b/configs/yolov3_mobilenet_v1_fruit.yml index 78f50206e367862a6d9d8ccc65211bd328996ca4..b9e576c29b11a6ddb235ff3332d01c8df8148ba8 100644 --- a/configs/yolov3_mobilenet_v1_fruit.yml +++ b/configs/yolov3_mobilenet_v1_fruit.yml @@ -38,11 +38,6 @@ YOLOv3Head: score_threshold: 0.01 YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: true diff --git a/configs/yolov3_mobilenet_v1_roadsign.yml b/configs/yolov3_mobilenet_v1_roadsign.yml new file mode 100644 index 0000000000000000000000000000000000000000..53194ce059f15270f90e25e02793abb87146a54c --- /dev/null +++ b/configs/yolov3_mobilenet_v1_roadsign.yml @@ -0,0 +1,302 @@ +#####################################基础配置##################################### +# 检测算法使用YOLOv3,backbone使用MobileNet_v1 +# 检测模型的名称 +architecture: YOLOv3 +# 根据硬件选择是否使用GPU +use_gpu: true + # ### max_iters为最大迭代次数,而一个iter会运行batch_size * device_num张图片。batch_size在下面 TrainReader.batch_size设置。 +max_iters: 1200 +# log平滑参数,平滑窗口大小,会从取历史窗口中取log_smooth_window大小的loss求平均值 +log_smooth_window: 20 +# 模型保存文件夹 +save_dir: output +# 每隔多少迭代保存模型 +snapshot_iter: 200 +# ### mAP 评估方式,mAP评估方式可以选择COCO和VOC或WIDERFACE,其中VOC有11point和integral两种评估方法 +# VOC数据格式只能使用VOC mAP评估方法 +metric: VOC +map_type: integral +# ### pretrain_weights 可以是imagenet的预训练好的分类模型权重,也可以是在VOC或COCO数据集上的预训练的检测模型权重 +# 模型配置文件和权重文件可参考[模型库](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/MODEL_ZOO.md) +pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar +# 模型保存文件夹,如果开启了--eval,会在这个文件夹下保存best_model +weights: output/yolov3_mobilenet_v1_roadsign_coco_template/ +# ### 根据用户数据设置类别数,注意这里不含背景类 
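+# roadsign_voc数据集共4个类别(见dataset/roadsign_voc/label_list.txt):
+# speedlimit、crosswalk、trafficlight、stop,故num_classes设为4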
+num_classes: 4 +# finetune时忽略的参数,按照正则化匹配,匹配上的参数会被忽略掉 +finetune_exclude_pretrained_params: ['yolo_output'] +# use_fine_grained_loss +use_fine_grained_loss: false + +# 检测模型的结构 +YOLOv3: + # 默认是 MobileNetv1 + backbone: MobileNet + yolo_head: YOLOv3Head + +# 检测模型的backbone +MobileNet: + norm_decay: 0. + conv_group_scale: 1 + with_extra_blocks: false + +# 检测模型的Head +YOLOv3Head: + # anchor_masks + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + # 3x3 anchors + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + # yolo_loss + yolo_loss: YOLOv3Loss + # nms 类型参数,可以设置为[MultiClassNMS, MultiClassSoftNMS, MatrixNMS], 默认使用 MultiClassNMS + nms: + # background_label,背景标签(类别)的索引,如果设置为 0 ,则忽略背景标签(类别)。如果设置为 -1 ,则考虑所有类别。默认值:0 + background_label: -1 + # NMS步骤后每个图像要保留的总bbox数。 -1表示在NMS步骤之后保留所有bbox。 + keep_top_k: 100 + # 在NMS中用于剔除检测框IOU的阈值,默认值:0.3 。 + nms_threshold: 0.45 + # 基于 score_threshold 的过滤检测后,根据置信度保留的最大检测次数。 + nms_top_k: 1000 + # 是否归一化,默认值:True 。 + normalized: false + # 过滤掉低置信度分数的边界框的阈值。 + score_threshold: 0.01 + +YOLOv3Loss: + # 这里的batch_size与训练中的batch_size(即TrainReader.batch_size)不同. + # 仅且当use_fine_grained_loss=true时,计算Loss时使用,且必须要与TrainReader.batch_size设置成一样 + batch_size: 8 + # 忽略样本的阈值 ignore_thresh + ignore_thresh: 0.7 + # 是否使用label_smooth + label_smooth: true + +LearningRate: + # ### 学习率设置 参考 https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/FAQ.md#faq%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98 + # base_lr + base_lr: 0.0001 + # 学习率调整策略 + # 具体实现参考[API](fluid.layers.piecewise_decay) + schedulers: + # 学习率调整策略 + - !PiecewiseDecay + gamma: 0.1 + milestones: + # ### 参考 https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/FAQ.md#faq%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98 + # ### 8/12 11/12 + - 800 + - 1100 + # 在训练开始时,调低学习率为base_lr * start_factor,然后逐步增长到base_lr,这个过程叫学习率热身,按照以下公式更新学习率 + # linear_step = end_lr - start_lr + # lr = start_lr + linear_step * (global_step / warmup_steps) + # 具体实现参考[API](fluid.layers.linear_lr_warmup) + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 100 + +OptimizerBuilder: + # 默认使用SGD+Momentum进行训练 + # 具体实现参考[API](fluid.optimizer) + optimizer: + momentum: 0.9 + type: Momentum + # 默认使用SGD+Momentum进行训练 + # 具体实现参考[API](fluid.optimizer) + regularizer: + factor: 0.0005 + type: L2 + +#####################################数据配置##################################### + +# 模型训练集设置参考 +# 训练、验证、测试使用的数据配置主要区别在数据路径、模型输入、数据增强参数设置 +# 如果使用 yolov3_reader.yml,下面的参数设置优先级高,会覆盖yolov3_reader.yml中的参数设置。 +# _READER_: 'yolov3_reader.yml' + +TrainReader: + # 训练过程中模型的输入设置 + # 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息 + inputs_def: + fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] + # num_max_boxes,每个样本的groud truth的最多保留个数,若不够用0填充。 + num_max_boxes: 50 + # 训练数据集路径 + dataset: + # 指定数据集格式 + !VOCDataSet + #dataset/xxx/ + #├── annotations + #│ ├── xxx1.xml + #│ ├── xxx2.xml + #│ ├── xxx3.xml + #│ | ... + #├── images + #│ ├── xxx1.png + #│ ├── xxx2.png + #│ ├── xxx3.png + #│ | ... 
+ #├── label_list.txt (用户自定义必须提供,且文件名称必须是label_list.txt。当使用VOC数据且use_default_label=true时,可不提供 ) + #├── train.txt (训练数据集文件列表, ./images/xxx1.png ./Annotations/xxx1.xml) + #└── valid.txt (测试数据集文件列表) + # 图片文件夹相对路径,路径是相对于dataset_dir,图像路径= dataset_dir + image_dir + image_name + dataset_dir: dataset/roadsign_voc + # 标记文件名 + anno_path: train.txt + # 是否包含背景类,若with_background=true,num_classes需要+1 + # YOLO 系列with_background必须是false,FasterRCNN系列是true ### + with_background: false + sample_transforms: + # 读取Image图像为numpy数组 + # 可以选择将图片从BGR转到RGB,可以选择对一个batch中的图片做mixup增强 + - !DecodeImage + to_rgb: True + with_mixup: True + # MixupImage + - !MixupImage + alpha: 1.5 + beta: 1.5 + # ColorDistort + - !ColorDistort {} + # RandomExpand + - !RandomExpand + fill_value: [123.675, 116.28, 103.53] + # 随机扩充比例,默认值是4.0 + ratio: 1.5 + - !RandomCrop {} + - !RandomFlipImage + is_normalized: false + # 归一化坐标 + - !NormalizeBox {} + # 如果 bboxes 数量小于 num_max_boxes,填充值为0的 box + - !PadBox + num_max_boxes: 50 + # 坐标格式转化,从XYXY转成XYWH格式 + - !BboxXYXY2XYWH {} + # 以下是对一个batch中的所有图片同时做的数据处理 + batch_transforms: + # 多尺度训练时,从list中随机选择一个尺寸,对一个batch数据同时同时resize + - !RandomShape + sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608] + random_inter: True + # NormalizeImage + - !NormalizeImage + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + is_scale: True + is_channel_first: false + - !Permute + to_bgr: false + channel_first: True + # Gt2YoloTarget is only used when use_fine_grained_loss set as true, + # this operator will be deleted automatically if use_fine_grained_loss + # is set as false + - !Gt2YoloTarget + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + downsample_ratios: [32, 16, 8] + # 1个GPU的batch size,默认为1。需要注意:每个iter迭代会运行batch_size * device_num张图片 + batch_size: 8 + # 是否shuffle + shuffle: true + # mixup,-1表示不做Mixup数据增强。注意,这里是epoch为单位 + mixup_epoch: 250 + # 注意,在某些情况下,drop_last=false时训练过程中可能会出错,建议训练时都设置为true + drop_last: true + # 若选用多进程,设置使用多进程/线程的数目 + # 开启多进程后,占用内存会成倍增加,根据内存设置### + worker_num: 4 + # 共享内存bufsize。注意,缓存是以batch为单位,缓存的样本数据总量为batch_size * bufsize,所以请注意不要设置太大,请根据您的硬件设置。 + bufsize: 2 + # 是否使用多进程 + use_process: true + + +EvalReader: + # 评估过程中模型的输入设置 + # 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息 + inputs_def: + fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] + # num_max_boxes,每个样本的groud truth的最多保留个数,若不够用0填充。 + num_max_boxes: 50 + # 数据集路径 + dataset: + !VOCDataSet + # 图片文件夹相对路径,路径是相对于dataset_dir,图像路径= dataset_dir + image_dir + image_name + dataset_dir: dataset/roadsign_voc + # 评估文件列表 + anno_path: valid.txt + # 是否包含背景类,若with_background=true,num_classes需要+1 + # YOLO 系列with_background必须是false,FasterRCNN系列是true ### + with_background: false + sample_transforms: + # 读取Image图像为numpy数组 + # 可以选择将图片从BGR转到RGB,可以选择对一个batch中的图片做mixup增强 + - !DecodeImage + to_rgb: True + # ResizeImage + - !ResizeImage + target_size: 608 + interp: 2 + # NormalizeImage + - !NormalizeImage + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + is_scale: True + is_channel_first: false + # 如果 bboxes 数量小于 num_max_boxes,填充值为0的 box + - !PadBox + num_max_boxes: 50 + - !Permute + to_bgr: false + channel_first: True + # 1个GPU的batch size,默认为1。需要注意:每个iter迭代会运行batch_size * device_num张图片 + batch_size: 8 + # drop_empty + drop_empty: false + # 若选用多进程,设置使用多进程/线程的数目 + # 开启多进程后,占用内存会成倍增加,根据内存设置### + worker_num: 4 + # 共享内存bufsize。注意,缓存是以batch为单位,缓存的样本数据总量为batch_size * bufsize,所以请注意不要设置太大,请根据您的硬件设置。 + bufsize: 2 + +TestReader: + 
# 预测过程中模型的输入设置 + # 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息 + inputs_def: + # 预测图像输入尺寸 + image_shape: [3, 608, 608] + fields: ['image', 'im_size', 'im_id'] + # 数据集路径 + dataset: + # 预测数据 + !ImageFolder + # anno_path + anno_path: dataset/roadsign_voc/label_list.txt + # 是否包含背景类,若with_background=true,num_classes需要+1 + # YOLO 系列with_background必须是false,FasterRCNN系列是true ### + with_background: false + sample_transforms: + - !DecodeImage + to_rgb: True + # ResizeImage + - !ResizeImage + # 注意与上面图像尺寸保持一致 + target_size: 608 + interp: 2 + # NormalizeImage + - !NormalizeImage + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + is_scale: True + is_channel_first: false + # Permute + - !Permute + to_bgr: false + channel_first: True + # 1个GPU的batch size,默认为1 + batch_size: 1 diff --git a/configs/yolov3_mobilenet_v1_voc.yml b/configs/yolov3_mobilenet_v1_voc.yml index 1b7097ad301ea17080e9fda5b166c9e7da0722e4..3df184b25c82f5743da27e2dd7f4fb846aca18b4 100644 --- a/configs/yolov3_mobilenet_v1_voc.yml +++ b/configs/yolov3_mobilenet_v1_voc.yml @@ -37,11 +37,6 @@ YOLOv3Head: score_threshold: 0.01 YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: false diff --git a/configs/yolov3_mobilenet_v3.yml b/configs/yolov3_mobilenet_v3.yml index 223d14c4909998b6d76f54ff3b45d6d1badd9d1e..d8526f6a82b07fcfbc916b1f0c753da96647aeec 100644 --- a/configs/yolov3_mobilenet_v3.yml +++ b/configs/yolov3_mobilenet_v3.yml @@ -38,11 +38,6 @@ YOLOv3Head: score_threshold: 0.01 YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: false diff --git a/configs/yolov3_r34.yml b/configs/yolov3_r34.yml index da887cf3d4d9921eb7102dec879c05571111d3e1..ca4d50b4ccfd48b302dd5c9dcb9badde10d1ed46 100644 --- a/configs/yolov3_r34.yml +++ b/configs/yolov3_r34.yml @@ -38,11 +38,6 @@ YOLOv3Head: score_threshold: 0.01 YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: true diff --git a/configs/yolov3_r34_voc.yml b/configs/yolov3_r34_voc.yml index 2d980dd0c2129b16bc5d50b45a103f68203d0a7c..6aa4aa74c88150b9fe6773eb95db28181a6fb6fa 100644 --- a/configs/yolov3_r34_voc.yml +++ b/configs/yolov3_r34_voc.yml @@ -39,11 +39,6 @@ YOLOv3Head: score_threshold: 0.01 YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: false diff --git a/configs/yolov4/README.md b/configs/yolov4/README.md index 5127d88c8a8f1514e127d327806d752cd9ac8c12..9394975af1bd19aca032c9349b327f9b5f8998b0 100644 --- a/configs/yolov4/README.md +++ b/configs/yolov4/README.md @@ -21,6 +21,21 @@ - label_smooth - grid_sensitive 
+目前支持YOLO系列的Anchor聚类算法 +``` bash +python tools/anchor_cluster.py -c ${config} -m ${method} -s ${size} +``` +主要参数配置参考下表 +| 参数 | 用途 | 默认值 | 备注 | +|:------:|:------:|:------:|:------:| +| -c/--config | 模型的配置文件 | 无默认值 | 必须指定 | +| -n/--n | 聚类的簇数 | 9 | Anchor的数目 | +| -s/--size | 图片的输入尺寸 | None | 若指定,则使用指定的尺寸,如果不指定, 则尝试从配置文件中读取图片尺寸 | +| -m/--method | 使用的Anchor聚类方法 | v2 | 目前只支持yolov2/v5的聚类算法 | +| -i/--iters | kmeans聚类算法的迭代次数 | 1000 | kmeans算法收敛或者达到迭代次数后终止 | +| -gi/--gen_iters | 遗传算法的迭代次数 | 1000 | 该参数只用于yolov5的Anchor聚类算法 | +| -t/--thresh| Anchor尺度的阈值 | 0.25 | 该参数只用于yolov5的Anchor聚类算法 | + ## 模型库 下表中展示了当前支持的网络结构。 diff --git a/configs/yolov4/yolov4_cspdarknet.yml b/configs/yolov4/yolov4_cspdarknet.yml index 4411b054fff7e420fd6eed6bec0d4673f3140c1b..cbc69d12252f0e540f49aa16efb4fed2051da694 100644 --- a/configs/yolov4/yolov4_cspdarknet.yml +++ b/configs/yolov4/yolov4_cspdarknet.yml @@ -35,11 +35,6 @@ YOLOv4Head: scale_x_y: [1.2, 1.1, 1.05] YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 4 ignore_thresh: 0.7 label_smooth: true downsample: [8,16,32] diff --git a/configs/yolov4/yolov4_cspdarknet_coco.yml b/configs/yolov4/yolov4_cspdarknet_coco.yml index 8b4a15dc530d491c2bf985e27948d5571a5d6489..a711a177aa02908d1a77b229937c0913a9341cc5 100644 --- a/configs/yolov4/yolov4_cspdarknet_coco.yml +++ b/configs/yolov4/yolov4_cspdarknet_coco.yml @@ -34,11 +34,6 @@ YOLOv4Head: scale_x_y: [1.2, 1.1, 1.05] YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 8 ignore_thresh: 0.7 label_smooth: true downsample: [8,16,32] diff --git a/configs/yolov4/yolov4_cspdarknet_voc.yml b/configs/yolov4/yolov4_cspdarknet_voc.yml index beefaa0f153ca8f9fd0541907628a2c02854024c..3f2af08a6868629ab3f513bcc087e264abb14d1d 100644 --- a/configs/yolov4/yolov4_cspdarknet_voc.yml +++ b/configs/yolov4/yolov4_cspdarknet_voc.yml @@ -34,11 +34,6 @@ YOLOv4Head: scale_x_y: [1.2, 1.1, 1.05] YOLOv3Loss: - # batch_size here is only used for fine grained loss, not used - # for training batch_size setting, training batch_size setting - # is in configs/yolov3_reader.yml TrainReader.batch_size, batch - # size here should be set as same value as TrainReader.batch_size - batch_size: 4 ignore_thresh: 0.7 label_smooth: true downsample: [8,16,32] diff --git a/dataset/roadsign_voc/download_roadsign_voc.py b/dataset/roadsign_voc/download_roadsign_voc.py new file mode 100644 index 0000000000000000000000000000000000000000..3cb517d3cf362e3ad2ec7b4ebf3bff54acb244d4 --- /dev/null +++ b/dataset/roadsign_voc/download_roadsign_voc.py @@ -0,0 +1,28 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +import sys +import os.path as osp +import logging +# add python path of PadleDetection to sys.path +parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) +if parent_path not in sys.path: + sys.path.append(parent_path) + +from ppdet.utils.download import download_dataset + +logging.basicConfig(level=logging.INFO) + +download_path = osp.split(osp.realpath(sys.argv[0]))[0] +download_dataset(download_path, 'roadsign_voc') diff --git a/dataset/roadsign_voc/label_list.txt b/dataset/roadsign_voc/label_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..1be460f457a2fdbec91d3a69377c232ae4a6beb0 --- /dev/null +++ b/dataset/roadsign_voc/label_list.txt @@ -0,0 +1,4 @@ +speedlimit +crosswalk +trafficlight +stop \ No newline at end of file diff --git a/demo/road554.png b/demo/road554.png new file mode 100644 index 0000000000000000000000000000000000000000..7733e57f922b0fee893775da4f698c202804966f Binary files /dev/null and b/demo/road554.png differ diff --git a/deploy/README.md b/deploy/README.md index 09654788f6b1037d708e7843ec14e426573ec4a6..137afa02af8b853528f8e62c646add348c766f41 100644 --- a/deploy/README.md +++ b/deploy/README.md @@ -1,6 +1,9 @@ # PaddleDetection 预测部署 -`PaddleDetection`目前支持使用`Python`和`C++`部署在`Windows` 和`Linux` 上运行。 +`PaddleDetection`目前支持: +- 使用`Python`和`C++`部署在`Windows` 和`Linux` 上运行 +- [在线服务化部署](./serving/README.md) +- [移动端部署](https://github.com/PaddlePaddle/Paddle-Lite-Demo) ## 模型导出 训练得到一个满足要求的模型后,如果想要将该模型接入到C++服务器端预测库或移动端预测库,需要通过`tools/export_model.py`导出该模型。 @@ -20,4 +23,5 @@ yolov3_darknet # 模型目录 ## 预测部署 - [1. Python预测(支持 Linux 和 Windows)](https://github.com/PaddlePaddle/PaddleDetection/blob/master/deploy/python) - [2. C++预测(支持 Linux 和 Windows)](https://github.com/PaddlePaddle/PaddleDetection/blob/master/deploy/cpp) -- [3. 移动端部署参考Paddle-Lite文档](https://paddle-lite.readthedocs.io/zh/latest/) +- [3. 在线服务化部署](./serving/README.md) +- [4. 移动端部署](https://github.com/PaddlePaddle/Paddle-Lite-Demo) diff --git a/deploy/serving/README.md b/deploy/serving/README.md new file mode 100644 index 0000000000000000000000000000000000000000..48fc2f03d687328aa1f8d26116ca8165ab202e98 --- /dev/null +++ b/deploy/serving/README.md @@ -0,0 +1,106 @@ +# 服务端预测部署 + +`PaddleDetection`训练出来的模型可以使用[Serving](https://github.com/PaddlePaddle/Serving) 部署在服务端。 +本教程以在路标数据集[roadsign_voc](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar) 使用`configs/yolov3_mobilenet_v1_roadsign.yml`算法训练的模型进行部署。 +预训练模型权重文件为[yolov3_mobilenet_v1_roadsign.pdparams](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_roadsign.pdparams) 。 + +## 1. 首先验证模型 +``` +python tools/infer.py -c configs/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_roadsign.pdparams --infer_img=demo/road554.png +``` + +## 2. 安装 paddle serving +``` +# 安装 paddle-serving-client +pip install paddle-serving-client -i https://mirror.baidu.com/pypi/simple + +# 安装 paddle-serving-server +pip install paddle-serving-server -i https://mirror.baidu.com/pypi/simple + +# 安装 paddle-serving-server-gpu +pip install paddle-serving-server-gpu -i https://mirror.baidu.com/pypi/simple +``` + +## 3. 
导出模型
+PaddleDetection在训练过程中保存的模型包含网络前向计算和优化器相关的参数,而部署时只需要网络前向计算的参数,具体参考:[导出模型](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
+
+```
+python tools/export_serving_model.py -c configs/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_roadsign.pdparams --output_dir=./inference_model
+```
+
+以上命令会在./inference_model文件夹下生成一个`yolov3_mobilenet_v1_roadsign`文件夹:
+```
+inference_model
+│   ├── yolov3_mobilenet_v1_roadsign
+│   │   ├── infer_cfg.yml
+│   │   ├── serving_client
+│   │   │   ├── serving_client_conf.prototxt
+│   │   │   ├── serving_client_conf.stream.prototxt
+│   │   ├── serving_server
+│   │   │   ├── conv1_bn_mean
+│   │   │   ├── conv1_bn_offset
+│   │   │   ├── conv1_bn_scale
+│   │   │   ├── ...
+```
+
+`serving_client`文件夹下的`serving_client_conf.prototxt`详细说明了模型的输入输出信息,其内容为:
+```
+feed_var {
+  name: "image"
+  alias_name: "image"
+  is_lod_tensor: false
+  feed_type: 1
+  shape: 3
+  shape: 608
+  shape: 608
+}
+feed_var {
+  name: "im_size"
+  alias_name: "im_size"
+  is_lod_tensor: false
+  feed_type: 2
+  shape: 2
+}
+fetch_var {
+  name: "multiclass_nms_0.tmp_0"
+  alias_name: "multiclass_nms_0.tmp_0"
+  is_lod_tensor: true
+  fetch_type: 1
+  shape: -1
+}
+```
+
+## 4. 启动PaddleServing服务
+
+```
+cd inference_model/yolov3_mobilenet_v1_roadsign/
+
+# GPU
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9393 --gpu_ids 0
+
+# CPU
+python -m paddle_serving_server.serve --model serving_server --port 9393
+```
+
+## 5. 测试部署的服务
+准备`label_list.txt`文件:
+```
+# 进入到导出模型文件夹
+cd inference_model/yolov3_mobilenet_v1_roadsign/
+
+# 将数据集对应的label_list.txt文件拷贝到当前文件夹下
+cp ../../dataset/roadsign_voc/label_list.txt .
+```
+
+设置`prototxt`文件路径为`serving_client/serving_client_conf.prototxt`,
+设置`fetch`为`fetch=["multiclass_nms_0.tmp_0"]`。
+
+测试:
+```
+# 进入目录
+cd inference_model/yolov3_mobilenet_v1_roadsign/
+
+# 测试代码 test_client.py 会自动创建output文件夹,并在output下生成`bbox.json`和`road554.png`两个文件
+python ../../deploy/serving/test_client.py ../../demo/road554.png
+```

diff --git a/deploy/serving/test_client.py b/deploy/serving/test_client.py
new file mode 100644
index 0000000000000000000000000000000000000000..7c2a6395f1ef34c9ca5a7bea4d23c47e9b7a63b9
--- /dev/null
+++ b/deploy/serving/test_client.py
@@ -0,0 +1,40 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
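+
+# 脚本整体流程:先对输入图片做与部署配置一致的预处理,然后请求
+# 第4步启动的Serving服务得到预测结果,最后结合label_list.txt做后处理,
+# 在output文件夹下保存可视化结果和bbox.json。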
+ +import sys +import numpy as np +from paddle_serving_client import Client +from paddle_serving_app.reader import * +import cv2 +preprocess = Sequential([ + File2Image(), BGR2RGB(), Resize( + (608, 608), interpolation=cv2.INTER_LINEAR), Div(255.0), Transpose( + (2, 0, 1)) +]) + +postprocess = RCNNPostprocess("label_list.txt", "output", [608, 608]) +client = Client() + +client.load_client_config("serving_client/serving_client_conf.prototxt") +client.connect(['127.0.0.1:9393']) + +im = preprocess(sys.argv[1]) +fetch_map = client.predict( + feed={ + "image": im, + "im_size": np.array(list(im.shape[1:])), + }, + fetch=["multiclass_nms_0.tmp_0"]) +fetch_map["image"] = sys.argv[1] +postprocess(fetch_map) diff --git a/docs/FAQ.md b/docs/FAQ.md index 63811d846d8adf761258705f9de5c3e9a82b7e78..d3c035764a723ef99047fd6b053d216c5f9e61dd 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -18,6 +18,10 @@ batch size可以达到每GPU 4 (Tesla V100 16GB)。 +**Q:** 哪些参数会影响内存使用量?
+**A:** 会影响内存使用量的参数有:是否使用多进程`use_process`、`batch_size`、reader中的`bufsize`、reader中的`memsize`、数据预处理中`RandomExpand`的`ratio`参数,以及图像本身大小等。
+
+
+**Q:** 如何修改数据预处理?
**A:** 可在配置文件中设置 `sample_transform`。注意需要在配置文件中加入**完整预处理** 例如RCNN模型中`DecodeImage`, `NormalizeImage` and `Permute`。 diff --git a/docs/advanced_tutorials/config_doc/yolov3_mobilenet_v1.md b/docs/advanced_tutorials/config_doc/yolov3_mobilenet_v1.md new file mode 100644 index 0000000000000000000000000000000000000000..cf55c84db61a2189727cfb97ec07a73d83a9c95d --- /dev/null +++ b/docs/advanced_tutorials/config_doc/yolov3_mobilenet_v1.md @@ -0,0 +1,291 @@ + +```yml +#####################################基础配置##################################### +# 检测算法使用YOLOv3,backbone使用MobileNet_v1 +# 检测模型的名称 +architecture: YOLOv3 +# 根据硬件选择是否使用GPU +use_gpu: true + # ### max_iters为最大迭代次数,而一个iter会运行batch_size * device_num张图片。batch_size在下面 TrainReader.batch_size设置。 +max_iters: 1200 +# log平滑参数,平滑窗口大小,会从取历史窗口中取log_smooth_window大小的loss求平均值 +log_smooth_window: 20 +# 模型保存文件夹 +save_dir: output +# 每隔多少迭代保存模型 +snapshot_iter: 200 +# ### mAP 评估方式,mAP评估方式可以选择COCO和VOC或WIDERFACE,其中VOC有11point和integral两种评估方法 +metric: COCO +# ### pretrain_weights 可以是imagenet的预训练好的分类模型权重,也可以是在VOC或COCO数据集上的预训练的检测模型权重 +# 模型配置文件和权重文件可参考[模型库](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/MODEL_ZOO.md) +pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar +# 模型保存文件夹,如果开启了--eval,会在这个文件夹下保存best_model +weights: output/yolov3_mobilenet_v1_roadsign_coco_template/ +# ### 根据用户数据设置类别数,注意这里不含背景类 +num_classes: 4 +# finetune时忽略的参数,按照正则化匹配,匹配上的参数会被忽略掉 +finetune_exclude_pretrained_params: ['yolo_output'] +# use_fine_grained_loss +use_fine_grained_loss: false + +# 检测模型的结构 +YOLOv3: + # 默认是 MobileNetv1 + backbone: MobileNet + yolo_head: YOLOv3Head + +# 检测模型的backbone +MobileNet: + norm_decay: 0. + conv_group_scale: 1 + with_extra_blocks: false + +# 检测模型的Head +YOLOv3Head: + # anchor_masks + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + # 3x3 anchors + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + # yolo_loss + yolo_loss: YOLOv3Loss + # nms 类型参数,可以设置为[MultiClassNMS, MultiClassSoftNMS, MatrixNMS], 默认使用 MultiClassNMS + nms: + # background_label,背景标签(类别)的索引,如果设置为 0 ,则忽略背景标签(类别)。如果设置为 -1 ,则考虑所有类别。默认值:0 + background_label: -1 + # NMS步骤后每个图像要保留的总bbox数。 -1表示在NMS步骤之后保留所有bbox。 + keep_top_k: 100 + # 在NMS中用于剔除检测框IOU的阈值,默认值:0.3 。 + nms_threshold: 0.45 + # 基于 score_threshold 的过滤检测后,根据置信度保留的最大检测次数。 + nms_top_k: 1000 + # 是否归一化,默认值:True 。 + normalized: false + # 过滤掉低置信度分数的边界框的阈值。 + score_threshold: 0.01 + +YOLOv3Loss: + # 这里的batch_size与训练中的batch_size(即TrainReader.batch_size)不同. 
+ # 仅且当use_fine_grained_loss=true时,计算Loss时使用,且必须要与TrainReader.batch_size设置成一样 + batch_size: 8 + # 忽略样本的阈值 ignore_thresh + ignore_thresh: 0.7 + # 是否使用label_smooth + label_smooth: true + +LearningRate: + # ### 学习率设置 参考 https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/FAQ.md#faq%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98 + # base_lr + base_lr: 0.0001 + # 学习率调整策略 + # 具体实现参考[API](fluid.layers.piecewise_decay) + schedulers: + # 学习率调整策略 + - !PiecewiseDecay + gamma: 0.1 + milestones: + # ### 参考 https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/docs/FAQ.md#faq%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98 + # ### 8/12 11/12 + - 800 + - 1100 + # 在训练开始时,调低学习率为base_lr * start_factor,然后逐步增长到base_lr,这个过程叫学习率热身,按照以下公式更新学习率 + # linear_step = end_lr - start_lr + # lr = start_lr + linear_step * (global_step / warmup_steps) + # 具体实现参考[API](fluid.layers.linear_lr_warmup) + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 100 + +OptimizerBuilder: + # 默认使用SGD+Momentum进行训练 + # 具体实现参考[API](fluid.optimizer) + optimizer: + momentum: 0.9 + type: Momentum + # 默认使用SGD+Momentum进行训练 + # 具体实现参考[API](fluid.optimizer) + regularizer: + factor: 0.0005 + type: L2 + +#####################################数据配置##################################### + +# 模型训练集设置参考 +# 训练、验证、测试使用的数据配置主要区别在数据路径、模型输入、数据增强参数设置 +# 如果使用 yolov3_reader.yml,下面的参数设置优先级高,会覆盖yolov3_reader.yml中的参数设置。 +# _READER_: 'yolov3_reader.yml' + +TrainReader: + # 训练过程中模型的输入设置 + # 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息 + inputs_def: + fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] + # num_max_boxes,每个样本的groud truth的最多保留个数,若不够用0填充。 + num_max_boxes: 50 + # 训练数据集路径 + dataset: + # 指定数据集格式 + !COCODataSet + # 图片文件夹相对路径,路径是相对于dataset_dir,图像路径= dataset_dir + image_dir + image_name + image_dir: train2017 + # anno_path,路径是相对于dataset_dir + anno_path: annotations/instances_train2017.json + # 数据集相对路径,路径是相对于PaddleDetection + dataset_dir: dataset/coco + # 是否包含背景类,若with_background=true,num_classes需要+1 + # YOLO 系列with_background必须是false,FasterRCNN系列是true ### + with_background: false + sample_transforms: + # 读取Image图像为numpy数组 + # 可以选择将图片从BGR转到RGB,可以选择对一个batch中的图片做mixup增强 + - !DecodeImage + to_rgb: True + with_mixup: True + # MixupImage + - !MixupImage + alpha: 1.5 + beta: 1.5 + # ColorDistort + - !ColorDistort {} + # RandomExpand + - !RandomExpand + fill_value: [123.675, 116.28, 103.53] + # 随机扩充比例,默认值是4.0 + ratio: 1.5 + - !RandomCrop {} + - !RandomFlipImage + is_normalized: false + # 归一化坐标 + - !NormalizeBox {} + # 如果 bboxes 数量小于 num_max_boxes,填充值为0的 box + - !PadBox + num_max_boxes: 50 + # 坐标格式转化,从XYXY转成XYWH格式 + - !BboxXYXY2XYWH {} + # 以下是对一个batch中的所有图片同时做的数据处理 + batch_transforms: + # 多尺度训练时,从list中随机选择一个尺寸,对一个batch数据同时同时resize + - !RandomShape + sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608] + random_inter: True + # NormalizeImage + - !NormalizeImage + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + is_scale: True + is_channel_first: false + - !Permute + to_bgr: false + channel_first: True + # Gt2YoloTarget is only used when use_fine_grained_loss set as true, + # this operator will be deleted automatically if use_fine_grained_loss + # is set as false + - !Gt2YoloTarget + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + downsample_ratios: [32, 16, 8] + # 1个GPU的batch size,默认为1。需要注意:每个iter迭代会运行batch_size * device_num张图片 + batch_size: 8 + # 是否shuffle + shuffle: true + # mixup,-1表示不做Mixup数据增强。注意,这里是epoch为单位 + mixup_epoch: 250 + # 
注意,在某些情况下,drop_last=false时训练过程中可能会出错,建议训练时都设置为true + drop_last: true + # 若选用多进程,设置使用多进程/线程的数目 + # 开启多进程后,占用内存会成倍增加,根据内存设置### + worker_num: 8 + # 共享内存bufsize。注意,缓存是以batch为单位,缓存的样本数据总量为batch_size * bufsize,所以请注意不要设置太大,请根据您的硬件设置。 + bufsize: 16 + # 是否使用多进程 + use_process: true + + +EvalReader: + # 评估过程中模型的输入设置 + # 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息 + inputs_def: + fields: ['image', 'im_size', 'im_id'] + # num_max_boxes,每个样本的groud truth的最多保留个数,若不够用0填充。 + num_max_boxes: 50 + # 数据集路径 + dataset: + !COCODataSet + # 图片文件夹相对路径,路径是相对于dataset_dir,图像路径= dataset_dir + image_dir + image_name + image_dir: val2017 + # anno_path,路径是相对于dataset_dir + anno_path: annotations/instances_val2017.json + # 数据集相对路径,路径是相对于PaddleDetection + dataset_dir: dataset/coco + # 是否包含背景类,若with_background=true,num_classes需要+1 + # YOLO 系列with_background必须是false,FasterRCNN系列是true ### + with_background: false + sample_transforms: + # 读取Image图像为numpy数组 + # 可以选择将图片从BGR转到RGB,可以选择对一个batch中的图片做mixup增强 + - !DecodeImage + to_rgb: True + # ResizeImage + - !ResizeImage + target_size: 608 + interp: 2 + # NormalizeImage + - !NormalizeImage + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + is_scale: True + is_channel_first: false + # 如果 bboxes 数量小于 num_max_boxes,填充值为0的 box + - !PadBox + num_max_boxes: 50 + - !Permute + to_bgr: false + channel_first: True + # 1个GPU的batch size,默认为1。需要注意:每个iter迭代会运行batch_size * device_num张图片 + batch_size: 8 + # drop_empty + drop_empty: false + # 若选用多进程,设置使用多进程/线程的数目 + # 开启多进程后,占用内存会成倍增加,根据内存设置### + worker_num: 8 + # 共享内存bufsize。注意,缓存是以batch为单位,缓存的样本数据总量为batch_size * bufsize,所以请注意不要设置太大,请根据您的硬件设置。 + bufsize: 16 + +TestReader: + # 预测过程中模型的输入设置 + # 包括图片,图片长宽高等基本信息,图片id,标记的目标框,类别等信息 + inputs_def: + # 预测图像输入尺寸 + image_shape: [3, 608, 608] + fields: ['image', 'im_size', 'im_id'] + # 数据集路径 + dataset: + !ImageFolder + # anno_path,路径是相对于dataset_dir + anno_path: annotations/instances_val2017.json + # 是否包含背景类,若with_background=true,num_classes需要+1 + # YOLO 系列with_background必须是false,FasterRCNN系列是true ### + with_background: false + sample_transforms: + - !DecodeImage + to_rgb: True + # ResizeImage + - !ResizeImage + target_size: 608 + interp: 2 + # NormalizeImage + - !NormalizeImage + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + is_scale: True + is_channel_first: false + # Permute + - !Permute + to_bgr: false + channel_first: True + # 1个GPU的batch size,默认为1 + batch_size: 1 +``` diff --git a/docs/images/000000014439.jpg b/docs/images/000000014439.jpg new file mode 100644 index 0000000000000000000000000000000000000000..56a4f66768c439adf0fadbde7b150b520c6d09e3 Binary files /dev/null and b/docs/images/000000014439.jpg differ diff --git a/docs/images/000000014439_640x640.jpg b/docs/images/000000014439_640x640.jpg deleted file mode 100644 index 3096009bef48a844cbf3b6a5cf60fa06d2cc71ed..0000000000000000000000000000000000000000 Binary files a/docs/images/000000014439_640x640.jpg and /dev/null differ diff --git a/docs/images/football.gif b/docs/images/football.gif new file mode 100644 index 0000000000000000000000000000000000000000..e9e3fbd6b046f7ed60808ae6396f58a4006c4414 Binary files /dev/null and b/docs/images/football.gif differ diff --git a/docs/images/road554.png b/docs/images/road554.png new file mode 100644 index 0000000000000000000000000000000000000000..1ecd45d9403897aa048417a9b69ad06e7ce41016 Binary files /dev/null and b/docs/images/road554.png differ diff --git a/docs/tutorials/Custom_DataSet.md b/docs/tutorials/Custom_DataSet.md index 
aab35436a8ea6a44e84f57152514fd201a1a0932..95416e9d1608aa63e52e502b1b7b2349cada8819 100644 --- a/docs/tutorials/Custom_DataSet.md +++ b/docs/tutorials/Custom_DataSet.md @@ -6,8 +6,9 @@ - [将数据集转换为VOC格式](#方式二将数据集转换为VOC格式) - [添加新数据源](#方式三添加新数据源) - [2.选择模型](#2选择模型) -- [3.修改参数配置](#3修改参数配置) -- [4.开始训练与部署](#4开始训练与部署) +- [3.生成Anchor](#3生成Anchor) +- [4.修改参数配置](#4修改参数配置) +- [5.开始训练与部署](#5开始训练与部署) - [附:一个自定义数据集demo](#附一个自定义数据集demo) ## 1.准备数据 @@ -97,8 +98,23 @@ PaddleDetection中提供了丰富的模型库,具体可在[模型库](../MODEL 同时也可以尝试PaddleDetection中开发的[YOLOv3增强模型](../featured_model/YOLOv3_ENHANCEMENT.md)、[YOLOv4模型](../featured_model/YOLO_V4.md)与[Anchor Free模型](../featured_model/ANCHOR_FREE_DETECTION.md)等。 - -## 3.修改参数配置 +## 3.生成Anchor +在yolo系列模型中,可以运行`tools/anchor_cluster.py`来得到适用于你的数据集Anchor,使用方法如下: +``` bash +python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000 +``` +目前`tools/anchor_cluster.py`支持的主要参数配置如下表所示: +| 参数 | 用途 | 默认值 | 备注 | +|:------:|:------:|:------:|:------:| +| -c/--config | 模型的配置文件 | 无默认值 | 必须指定 | +| -n/--n | 聚类的簇数 | 9 | Anchor的数目 | +| -s/--size | 图片的输入尺寸 | None | 若指定,则使用指定的尺寸,如果不指定, 则尝试从配置文件中读取图片尺寸 | +| -m/--method | 使用的Anchor聚类方法 | v2 | 目前只支持yolov2/v5的聚类算法 | +| -i/--iters | kmeans聚类算法的迭代次数 | 1000 | kmeans算法收敛或者达到迭代次数后终止 | +| -gi/--gen_iters | 遗传算法的迭代次数 | 1000 | 该参数只用于yolov5的Anchor聚类算法 | +| -t/--thresh| Anchor尺度的阈值 | 0.25 | 该参数只用于yolov5的Anchor聚类算法 | + +## 4.修改参数配置 选择好模型后,需要在`configs`目录中找到对应的配置文件,为了适配在自定义数据集上训练,需要对参数配置做一些修改: @@ -133,7 +149,7 @@ PaddleDetection中提供了丰富的模型库,具体可在[模型库](../MODEL - 预训练模型配置:通过在yaml配置文件中的`pretrain_weights: path/to/weights`参数可以配置路径,可以是链接或权重文件路径。可直接沿用配置文件中给出的在ImageNet数据集上的预训练模型。同时我们支持训练在COCO或Obj365数据集上的模型权重作为预训练模型,做迁移学习,详情可参考[迁移学习文档](../advanced_tutorials/TRANSFER_LEARNING_cn.md)。 -## 4.开始训练与部署 +## 5.开始训练与部署 - 参数配置完成后,就可以开始训练模型了,具体可参考[训练/评估/预测](GETTING_STARTED_cn.md)入门文档。 - 训练测试完成后,根据需要可以进行模型部署:首先需要导出可预测的模型,可参考[导出模型教程](../advanced_tutorials/deploy/EXPORT_MODEL.md);导出模型后就可以进行[C++预测部署](../advanced_tutorials/deploy/DEPLOY_CPP.md)或者[python端预测部署](../advanced_tutorials/deploy/DEPLOY_PY.md)。 diff --git a/docs/tutorials/PrepareDataSet.md b/docs/tutorials/PrepareDataSet.md new file mode 100644 index 0000000000000000000000000000000000000000..c27359dd97395a48a6ba503b002c76c37519d569 --- /dev/null +++ b/docs/tutorials/PrepareDataSet.md @@ -0,0 +1,417 @@ +# 如何准备训练数据 +## 目录 +- [目标检测数据说明](#目标检测数据说明) +- [准备训练数据](#准备训练数据) + - [VOC数据数据](#VOC数据数据) + - [VOC数据集下载](#VOC数据集下载) + - [VOC数据标注文件介绍](#VOC数据标注文件介绍) + - [COCO数据数据](#COCO数据数据) + - [COCO数据集下载](#COCO数据下载) + - [COCO数据标注文件介绍](#COCO数据标注文件介绍) + - [用户数据](#用户数据) + - [用户数据转成VOC数据](#用户数据转成VOC数据) + - [用户数据转成COCO数据](#用户数据转成COCO数据) + - [用户数据自定义reader](#用户数据自定义reader) + - [用户数据数据转换示例](#用户数据数据转换示例) + +### 目标检测数据说明 +目标检测的数据比分类复杂,一张图像中,需要标记出各个目标区域的位置和类别。 + +一般的目标区域位置用一个矩形框来表示,一般用以下3种方式表达: + +| 表达方式 | 说明 | +| :----------------: | :--------------------------------: | +| x1,y1,x2,y2 | (x1,y1)为左上角坐标,(x2,y2)为右下角坐标 | +| x,y,w,h | (x,y)为左上角坐标,w为目标区域宽度,h为目标区域高度 | +| xc,yc,w,h | (xc,yc)为目标区域中心坐标,w为目标区域宽度,h为目标区域高度 | + +常见的目标检测数据集如Pascal VOC和COCO,采用的是第一种 `x1,y1,x2,y2` 表示物体的bounding box. 
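+
+这几种表示方式可以互相转换,下面是一段示意代码(非PaddleDetection的接口,仅用于说明各坐标格式之间的关系):
+
+```python
+def xyxy_to_xywh(box):
+    # (x1, y1, x2, y2) -> (x, y, w, h),其中(x, y)为左上角坐标
+    x1, y1, x2, y2 = box
+    return [x1, y1, x2 - x1, y2 - y1]
+
+
+def xywh_to_xcycwh(box):
+    # (x, y, w, h) -> (xc, yc, w, h),其中(xc, yc)为目标区域中心坐标
+    x, y, w, h = box
+    return [x + w / 2., y + h / 2., w, h]
+```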
+ +### 准备训练数据 +PaddleDetection默认支持[COCO](http://cocodataset.org)和[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) 和[WIDER-FACE](http://shuoyang1213.me/WIDERFACE/) 数据源。 +同时还支持自定义数据源,包括: + +(1) 自定义数据数据转换成VOC数据; +(2) 自定义数据数据转换成COCO数据; +(3) 自定义新的数据源,增加自定义的reader。 + + +首先进入到`PaddleDetection`根目录下 +``` +cd PaddleDetection/ +ppdet_root=$(pwd) +``` + +#### VOC数据数据 +VOC数据是[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) 比赛使用的数据。Pascal VOC比赛不仅包含图像分类分类任务,还包含图像目标检测、图像分割等任务,其标注文件中包含多个任务的标注内容。 +VOC数据集指的是Pascal VOC比赛使用的数据。用户自定义的VOC数据,xml文件中的非必须字段,请根据实际情况选择是否标注或是否使用默认值。 +##### VOC数据集下载 + +- 通过代码自动化下载VOC数据集 + + ``` + # 执行代码自动化下载VOC数据集 + python dataset/voc/download_voc.py + ``` + + 代码执行完成后VOC数据集文件组织结构为: + ``` + >>cd dataset/voc/ + >>tree + ├── create_list.py + ├── download_voc.py + ├── generic_det_label_list.txt + ├── generic_det_label_list_zh.txt + ├── label_list.txt + ├── VOCdevkit/VOC2007 + │ ├── annotations + │ ├── 001789.xml + │ | ... + │ ├── JPEGImages + │ ├── 001789.jpg + │ | ... + │ ├── ImageSets + │ | ... + ├── VOCdevkit/VOC2012 + │ ├── Annotations + │ ├── 2011_003876.xml + │ | ... + │ ├── JPEGImages + │ ├── 2011_003876.jpg + │ | ... + │ ├── ImageSets + │ | ... + | ... + ``` + + 各个文件说明 + ``` + # label_list.txt 是类别名称列表,文件名必须是 label_list.txt。若使用VOC数据集,config文件中use_default_label为true时不需要这个文件 + >>cat label_list.txt + aeroplane + bicycle + ... + + # trainval.txt 是训练数据集文件列表 + >>cat trainval.txt + VOCdevkit/VOC2007/JPEGImages/007276.jpg VOCdevkit/VOC2007/Annotations/007276.xml + VOCdevkit/VOC2012/JPEGImages/2011_002612.jpg VOCdevkit/VOC2012/Annotations/2011_002612.xml + ... + + # test.txt 是测试数据集文件列表 + >>cat test.txt + VOCdevkit/VOC2007/JPEGImages/000001.jpg VOCdevkit/VOC2007/Annotations/000001.xml + ... + + # label_list.txt voc 类别名称列表 + >>cat label_list.txt + + aeroplane + bicycle + ... + ``` +- 已下载VOC数据集 + 按照如上数据文件组织结构组织文件即可。 + +##### VOC数据标注文件介绍 +VOC数据是每个图像文件对应一个同名的xml文件,xml文件中标记物体框的坐标和类别等信息。例如图像`2007_002055.jpg`: +![](../images/2007_002055.jpg) + +图片对应的xml文件内包含对应图片的基本信息,比如文件名、来源、图像尺寸以及图像中包含的物体区域信息和类别信息等。 + +xml文件中包含以下字段: +- filename,表示图像名称。 +- size,表示图像尺寸。包括:图像宽度、图像高度、图像深度。 + ``` + + 500 + 375 + 3 + + ``` +- object字段,表示每个物体。包括: + + | 标签 | 说明 | + | :--------: | :-----------: | + | name | 物体类别名称 | + | pose | 关于目标物体姿态描述(非必须字段) | + | truncated | 如果物体的遮挡超过15-20%并且位于边界框之外,请标记为`truncated`(非必须字段) | + | difficult | 难以识别的物体标记为`difficult`(非必须字段) | + | bndbox子标签 | (xmin,ymin) 左上角坐标,(xmax,ymax) 右下角坐标, | + + +#### COCO数据 +COCO数据是[COCO](http://cocodataset.org) 比赛使用的数据。同样的,COCO比赛数也包含多个比赛任务,其标注文件中包含多个任务的标注内容。 +COCO数据集指的是COCO比赛使用的数据。用户自定义的COCO数据,json文件中的一些字段,请根据实际情况选择是否标注或是否使用默认值。 + + +##### COCO数据下载 +- 通过代码自动化下载COCO数据集 + + ``` + # 执行代码自动化下载COCO数据集 + python dataset/voc/download_coco.py + ``` + + 代码执行完成后COCO数据集文件组织结构为: + ``` + >>cd dataset/coco/ + >>tree + ├── annotations + │ ├── instances_train2017.json + │ ├── instances_val2017.json + │ | ... + ├── train2017 + │ ├── 000000000009.jpg + │ ├── 000000580008.jpg + │ | ... + ├── val2017 + │ ├── 000000000139.jpg + │ ├── 000000000285.jpg + │ | ... + | ... 
+ ``` +- 已下载COCO数据集 + 按照如上数据文件组织结构组织文件即可。 + +##### COCO数据标注介绍 +COCO数据标注是将所有训练图像的标注都存放到一个json文件中。数据以字典嵌套的形式存放。 + +json文件中包含以下key: +- info,表示标注文件info。 +- licenses,表示标注文件licenses。 +- images,表示标注文件中图像信息列表,每个元素是一张图像的信息。如下为其中一张图像的信息: + ``` + { + 'license': 3, # license + 'file_name': '000000391895.jpg', # file_name + # coco_url + 'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg', + 'height': 360, # image height + 'width': 640, # image width + 'date_captured': '2013-11-14 11:18:45', # date_captured + # flickr_url + 'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg', + 'id': 391895 # image id + } + ``` +- annotations,表示标注文件中目标物体的标注信息列表,每个元素是一个目标物体的标注信息。如下为其中一个目标物体的标注信息: + ``` + { + + 'segmentation': # 物体的分割标注 + 'area': 2765.1486500000005, # 物体的区域面积 + 'iscrowd': 0, # iscrowd + 'image_id': 558840, # image id + 'bbox': [199.84, 200.46, 77.71, 70.88], # bbox + 'category_id': 58, # category_id + 'id': 156 # image id + } + ``` + + ``` + # 查看COCO标注文件 + import json + coco_anno = json.load(open('./annotations/instances_train2017.json')) + + # coco_anno.keys + print('\nkeys:', coco_anno.keys()) + + # 查看类别信息 + print('\n物体类别:', coco_anno['categories']) + + # 查看一共多少张图 + print('\n图像数量:', len(coco_anno['images'])) + + # 查看一共多少个目标物体 + print('\n标注物体数量:', len(coco_anno['annotations'])) + + # 查看一条目标物体标注信息 + print('\n查看一条目标物体标注信息:', coco_anno['annotations'][0]) + ``` + + COCO数据准备如下。 + `dataset/coco/`最初文件组织结构 + ``` + >>cd dataset/coco/ + >>tree + ├── download_coco.py + ``` + +#### 用户数据 +对于用户数据有3种处理方法: +(1) 将用户数据转成VOC数据(根据需要仅包含物体检测所必须的标签即可) +(2) 将用户数据转成COCO数据(根据需要仅包含物体检测所必须的标签即可) +(3) 自定义一个用户数据的reader(较复杂数据,需要自定义reader) + +##### 用户数据转成VOC数据 +用户数据集转成VOC数据后目录结构如下(注意数据集中路径名、文件名尽量不要使用中文,避免中文编码问题导致出错): + +``` +dataset/xxx/ +├── annotations +│ ├── xxx1.xml +│ ├── xxx2.xml +│ ├── xxx3.xml +│ | ... +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +├── label_list.txt (必须提供,且文件名称必须是label_list.txt ) +├── train.txt (训练数据集文件列表, ./images/xxx1.jpg ./annotations/xxx1.xml) +└── valid.txt (测试数据集文件列表) +``` + +各个文件说明 +``` +# label_list.txt 是类别名称列表,改文件名必须是这个 +>>cat label_list.txt +classname1 +classname2 +... + +# train.txt 是训练数据文件列表 +>>cat train.txt +./images/xxx1.jpg ./annotations/xxx1.xml +./images/xxx2.jpg ./annotations/xxx2.xml +... + +# valid.txt 是验证数据文件列表 +>>cat valid.txt +./images/xxx3.jpg ./annotations/xxx3.xml +... +``` + +##### 用户数据转成COCO +在`./tools/`中提供了`x2coco.py`用于将VOC数据集、labelme标注的数据集或cityscape数据集转换为COCO数据,例如: + +(1)labelmes数据转换为COCO数据: +```bash +python tools/x2coco.py \ + --dataset_type labelme \ + --json_input_dir ./labelme_annos/ \ + --image_input_dir ./labelme_imgs/ \ + --output_dir ./cocome/ \ + --train_proportion 0.8 \ + --val_proportion 0.2 \ + --test_proportion 0.0 +``` +(2)voc数据转换为COCO数据: +```bash +python tools/x2coco.py \ + --dataset_type voc \ + --voc_anno_dir path/to/VOCdevkit/VOC2007/Annotations/ \ + --voc_anno_list path/to/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt \ + --voc_label_list dataset/voc/label_list.txt \ + --voc_out_name voc_train.json +``` + +用户数据集转成COCO数据后目录结构如下(注意数据集中路径名、文件名尽量不要使用中文,避免中文编码问题导致出错): +``` +dataset/xxx/ +├── annotations +│ ├── train.json # coco数据的标注文件 +│ ├── valid.json # coco数据的标注文件 +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +... 
+``` + +##### 用户数据自定义reader +如果数据集有新的数据需要添加进PaddleDetection中,您可参考数据处理文档中的[添加新数据源](../advanced_tutorials/READER.md#添加新数据源)文档部分,开发相应代码完成新的数据源支持,同时数据处理具体代码解析等可阅读[数据处理文档](../advanced_tutorials/READER.md) + + +#### 用户数据数据转换示例 + +以[Kaggle数据集](https://www.kaggle.com/andrewmvd/road-sign-detection) 比赛数据为例,说明如何准备自定义数据。 +Kaggle上的 [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) 比赛数据包含877张图像,数据类别4类:crosswalk,speedlimit,stop,trafficlight。 +可从Kaggle上下载,也可以从[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/roadsign.zip) 下载。 +路标数据集示例图: +![](../images/road554.png) + +``` +# 下载解压数据 +>>cd $(ppdet_root)/dataset +# 下载kaggle数据集并解压,当前文件组织结构如下 + +├── annotations +│ ├── road0.xml +│ ├── road1.xml +│ ├── road10.xml +│ | ... +├── images +│ ├── road0.jpg +│ ├── road1.jpg +│ ├── road2.jpg +│ | ... +``` + +将数据划分为训练集和测试集 +``` +# 生成 label_list.txt 文件 +>>echo "speedlimit\ncrosswalk\ntrafficlight\nstop" > label_list.txt + +# 生成 train.txt、valid.txt和test.txt列表文件 +>>ls images/*.png | shuf > all_image_list.txt +>>awk -F"/" '{print $2}' all_image_list.txt | awk -F".png" '{print $1}' | awk -F"\t" '{print "images/"$1".png annotations/"$1".xml"}' > all_list.txt + +# 训练集、验证集、测试集比例分别约80%、10%、10%。 +>>head -n 88 all_list.txt > test.txt +>>head -n 176 all_list.txt | tail -n 88 > valid.txt +>>tail -n 701 all_list.txt > train.txt + +# 删除不用文件 +>>rm -rf all_image_list.txt all_list.txt + +最终数据集文件组织结构为: + +├── annotations +│ ├── road0.xml +│ ├── road1.xml +│ ├── road10.xml +│ | ... +├── images +│ ├── road0.jpg +│ ├── road1.jpg +│ ├── road2.jpg +│ | ... +├── label_list.txt +├── test.txt +├── train.txt +└── valid.txt + +# label_list.txt 是类别名称列表,文件名必须是 label_list.txt +>>cat label_list.txt +crosswalk +speedlimit +stop +trafficlight + +# train.txt 是训练数据集文件列表,每一行是一张图像路径和对应标注文件路径,以空格分开。注意这里的路径是数据集文件夹内的相对路径。 +>>cat train.txt +./images/road839.png ./annotations/road839.xml +./images/road363.png ./annotations/road363.xml +... 
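+
+# 可以用 wc -l 粗略检查各列表文件的行数是否与上面的划分一致(约 701/176/88)
+>>wc -l train.txt valid.txt test.txt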
+ +# valid.txt 是验证数据集文件列表,每一行是一张图像路径和对应标注文件路径,以空格分开。注意这里的路径是数据集文件夹内的相对路径。 +>>cat valid.txt +./images/road218.png ./annotations/road218.xml +./images/road681.png ./annotations/road681.xml +``` + +也可以下载准备好的数据[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.zip) ,解压到`dataset/roadsign_voc/`文件夹下即可。 +准备好数据后,一般的我们要对数据有所了解,比如图像量,图像尺寸,每一类目标区域个数,目标区域大小等。如有必要,还要对数据进行清洗。 +roadsign数据集统计: + +| 数据 | 图片数量 | +| :--------: | :-----------: | +| train | 701 | +| valid | 176 | + +**说明:**(1)用户数据,建议在训练前仔细检查数据,避免因数据标注格式错误或图像数据不完整造成训练过程中的crash +(2)如果图像尺寸太大的话,在不限制读入数据尺寸情况下,占用内存较多,会造成内存/显存溢出,请合理设置batch_size,可从小到大尝试 diff --git a/docs/tutorials/QUICK_STARTED_cn.md b/docs/tutorials/QUICK_STARTED_cn.md index 34b6106e9081dc15a89834560a7b9be76ddbf106..1e0fe3b679b07718e3a3a095758a799641072772 100644 --- a/docs/tutorials/QUICK_STARTED_cn.md +++ b/docs/tutorials/QUICK_STARTED_cn.md @@ -1,77 +1,79 @@ [English](QUICK_STARTED.md) | 简体中文 - # 快速开始 +为了使得用户能够在很短时间内快速产出模型,掌握PaddleDetection的使用方式,这篇教程通过一个预训练检测模型对小数据集进行finetune。在较短时间内即可产出一个效果不错的模型。实际业务中,建议用户根据需要选择合适模型配置文件进行适配。 -为了使得用户能够在很短的时间内快速产出模型,掌握PaddleDetection的使用方式,这篇教程通过一个预训练检测模型对小数据集进行finetune。在P40上单卡大约20min即可产出一个效果不错的模型。 - -- **注:在开始前,如果有GPU设备,指定GPU设备号。** -```bash -export CUDA_VISIBLE_DEVICES=0 +## 一、快速体验 +``` +# 用PP-YOLO算法在COCO数据集上预训练模型预测一张图片 +python tools/infer.py -c configs/ppyolo/ppyolo.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams --infer_img=demo/000000014439.jpg ``` +结果如下图: -## 数据准备 +![](../images/000000014439.jpg) -数据集参考[Kaggle数据集](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection),其中训练数据集240张图片,测试数据集60张图片,数据类别为3类:苹果,橘子,香蕉。[下载链接](https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar)。数据下载后分别解压即可, 数据准备脚本位于[download_fruit.py](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dataset/fruit/download_fruit.py)。下载数据方式如下: -```bash -python dataset/fruit/download_fruit.py +## 二、准备数据 +数据集参考[Kaggle数据集](https://www.kaggle.com/andrewmvd/road-sign-detection) ,包含877张图像,数据类别4类:crosswalk,speedlimit,stop,trafficlight。 +将数据划分为训练集701张图和测试集176张图,[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar). 
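+
+除了运行下面的下载脚本,也可以直接调用`ppdet.utils.download`中的下载接口,等价的Python用法如下(示意):
+
+```python
+from ppdet.utils.download import download_dataset
+
+# 下载并解压 roadsign_voc 数据集到指定目录
+download_dataset('dataset/roadsign_voc', 'roadsign_voc')
+```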
+
+```
+# 注意:可跳过这步下载,后面训练会自动下载
+python dataset/roadsign_voc/download_roadsign_voc.py
 ```
 
-## 开始训练
+## 三、训练、评估、预测
+### 1、训练
+```
+# 边训练边测试 CPU需要约1小时(use_gpu=false),1080Ti GPU需要约5分钟。
+# -c 参数表示指定使用哪个配置文件
+# -o 参数表示指定配置文件中的全局变量(覆盖配置文件中的设置),这里设置使用GPU
+# --eval 参数表示边训练边评估,会自动保存一个评估结果最好的名为best_model.pdmodel的模型
 
-训练命令如下:
-```bash
-python -u tools/train.py -c configs/yolov3_mobilenet_v1_fruit.yml --eval
+python tools/train.py -c configs/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true
 ```
 
-训练使用`yolov3_mobilenet_v1`基于COCO数据集训练好的模型进行finetune。
+如果想通过VisualDL实时观察loss变化曲线,可在训练命令中添加`--use_vdl=true`,并通过`--vdl_log_dir`设置日志保存路径。
+**但注意VisualDL需Python>=3.5**
 
-如果想通过VisualDL实时观察loss和精度值,启动命令添加`--use_vdl=True`,以及通过`--vdl_log_dir`设置日志保存路径,但注意**VisualDL需Python>=3.5**:
-
+首先安装[VisualDL](https://github.com/PaddlePaddle/VisualDL)
+```
+python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple
+```
 
-```bash
-python -u tools/train.py -c configs/yolov3_mobilenet_v1_fruit.yml \
-    --use_vdl=True \
-    --vdl_log_dir=vdl_fruit_dir/scalar \
+```
+python -u tools/train.py -c configs/yolov3_mobilenet_v1_roadsign.yml \
+                        --use_vdl=true \
+                        --vdl_log_dir=vdl_dir/scalar \
     --eval
 ```
-
-通过`visualdl`命令实时查看变化曲线:
-
-```bash
-visualdl --logdir vdl_fruit_dir/scalar/ --host <host_IP> --port <port_num>
+通过`visualdl`命令实时查看变化曲线:
+```
+visualdl --logdir vdl_dir/scalar/ --host <host_IP> --port <port_num>
 ```
 
-VisualDL结果显示如下:
-
-![](../images/visualdl_fruit.jpg)
-
-训练模型[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_fruit.tar)
-
-## 评估预测
-
-评估命令如下:
-
-```bash
-python -u tools/eval.py -c configs/yolov3_mobilenet_v1_fruit.yml
+### 2、评估
+```
+# 评估 默认使用训练过程中保存的best_model
+# -c 参数表示指定使用哪个配置文件
+# -o 参数表示指定配置文件中的全局变量(覆盖配置文件中的设置)
+python tools/eval.py -c configs/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true
 ```
 
-预测命令如下
-```bash
-python -u tools/infer.py -c configs/yolov3_mobilenet_v1_fruit.yml \
-                         -o weights=https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_fruit.tar \
-                         --infer_img=demo/orange_71.jpg
+### 3、预测
 ```
+# -c 参数表示指定使用哪个配置文件
+# -o 参数表示指定配置文件中的全局变量(覆盖配置文件中的设置)
+# --infer_img 参数指定预测图像路径
+# 预测结束后会在output文件夹中生成一张画有预测结果的同名图像
 
+python tools/infer.py -c configs/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true --infer_img=demo/road554.png
+```
+结果如下图:
 
-预测图片如下:
-![](../../demo/orange_71.jpg)
-
-![](../images/orange_71_detection.jpg)
+![](../images/road554.png)
 
-更多训练及评估流程,请参考[入门使用文档](GETTING_STARTED_cn.md)。
diff --git a/ppdet/core/workspace.py b/ppdet/core/workspace.py
index b7f7370b4bbbe023b2ffe7a18d77255dc275d44b..a5124b5cd9526d9e77391a5af29a6da0cc1feb28 100644
--- a/ppdet/core/workspace.py
+++ b/ppdet/core/workspace.py
@@ -97,6 +97,15 @@ def load_config(file_path):
         del cfg[READER_KEY]
 
     merge_config(cfg)
+
+    # NOTE: the training batch size is defined only in TrainReader, synchronize
+    # it to the global config so that models can read the batch size from the
+    # global config when building the network.
+ # batch size in evaluation or inference can also be added here + if 'TrainReader' in global_config: + global_config['train_batch_size'] = global_config['TrainReader'][ + 'batch_size'] + return global_config diff --git a/ppdet/data/source/dataset.py b/ppdet/data/source/dataset.py index fda0c7d22f616cdd54b9ac95d080f97bbdcff01e..7cddaa93a8cf8fdae0ae5008ea970aa4fdb91166 100644 --- a/ppdet/data/source/dataset.py +++ b/ppdet/data/source/dataset.py @@ -41,7 +41,7 @@ class DataSet(object): anno_path=None, sample_num=-1, with_background=True, - use_default_label=None, + use_default_label=False, **kwargs): super(DataSet, self).__init__() self.anno_path = anno_path @@ -117,7 +117,7 @@ class ImageFolder(DataSet): anno_path=None, sample_num=-1, with_background=True, - use_default_label=None, + use_default_label=False, **kwargs): super(ImageFolder, self).__init__(dataset_dir, image_dir, anno_path, sample_num, with_background, diff --git a/ppdet/data/source/voc.py b/ppdet/data/source/voc.py index 560ed17ea24028963c51ecddedcf4ac095a2e9f8..84c5990c3745574e0878bc087876cd8dd28312af 100644 --- a/ppdet/data/source/voc.py +++ b/ppdet/data/source/voc.py @@ -51,7 +51,7 @@ class VOCDataSet(DataSet): image_dir=None, anno_path=None, sample_num=-1, - use_default_label=True, + use_default_label=False, with_background=True, label_list='label_list.txt'): super(VOCDataSet, self).__init__( diff --git a/ppdet/modeling/losses/yolo_loss.py b/ppdet/modeling/losses/yolo_loss.py index 6823c024bc5aea54c4f9b5195da03cb16dfee8a3..e978eb992ec0aca9038e61ebb367cd47e6885a7b 100644 --- a/ppdet/modeling/losses/yolo_loss.py +++ b/ppdet/modeling/losses/yolo_loss.py @@ -32,17 +32,17 @@ class YOLOv3Loss(object): Combined loss for YOLOv3 network Args: - batch_size (int): training batch size + train_batch_size (int): training batch size ignore_thresh (float): threshold to ignore confidence loss label_smooth (bool): whether to use label smoothing use_fine_grained_loss (bool): whether use fine grained YOLOv3 loss instead of fluid.layers.yolov3_loss """ __inject__ = ['iou_loss', 'iou_aware_loss'] - __shared__ = ['use_fine_grained_loss'] + __shared__ = ['use_fine_grained_loss', 'train_batch_size'] def __init__(self, - batch_size=8, + train_batch_size=8, ignore_thresh=0.7, label_smooth=True, use_fine_grained_loss=False, @@ -51,7 +51,7 @@ class YOLOv3Loss(object): downsample=[32, 16, 8], scale_x_y=1., match_score=False): - self._batch_size = batch_size + self._train_batch_size = train_batch_size self._ignore_thresh = ignore_thresh self._label_smooth = label_smooth self._use_fine_grained_loss = use_fine_grained_loss @@ -65,7 +65,7 @@ class YOLOv3Loss(object): anchor_masks, mask_anchors, num_classes, prefix_name): if self._use_fine_grained_loss: return self._get_fine_grained_loss( - outputs, targets, gt_box, self._batch_size, num_classes, + outputs, targets, gt_box, self._train_batch_size, num_classes, mask_anchors, self._ignore_thresh) else: losses = [] @@ -95,7 +95,7 @@ class YOLOv3Loss(object): outputs, targets, gt_box, - batch_size, + train_batch_size, num_classes, mask_anchors, ignore_thresh, @@ -108,7 +108,7 @@ class YOLOv3Loss(object): targets ([Variables]): List of Variables, The targets for yolo loss calculatation. gt_box (Variable): The ground-truth boudding boxes. 
- batch_size (int): The training batch size + train_batch_size (int): The training batch size num_classes (int): class num of dataset mask_anchors ([[float]]): list of anchors in each output layer ignore_thresh (float): prediction bbox overlap any gt_box greater @@ -171,7 +171,7 @@ class YOLOv3Loss(object): loss_h = fluid.layers.reduce_sum(loss_h, dim=[1, 2, 3]) if self._iou_loss is not None: loss_iou = self._iou_loss(x, y, w, h, tx, ty, tw, th, anchors, - downsample, self._batch_size, + downsample, self._train_batch_size, scale_x_y) loss_iou = loss_iou * tscale_tobj loss_iou = fluid.layers.reduce_sum(loss_iou, dim=[1, 2, 3]) @@ -180,14 +180,14 @@ class YOLOv3Loss(object): if self._iou_aware_loss is not None: loss_iou_aware = self._iou_aware_loss( ioup, x, y, w, h, tx, ty, tw, th, anchors, downsample, - self._batch_size, scale_x_y) + self._train_batch_size, scale_x_y) loss_iou_aware = loss_iou_aware * tobj loss_iou_aware = fluid.layers.reduce_sum( loss_iou_aware, dim=[1, 2, 3]) loss_iou_awares.append(fluid.layers.reduce_mean(loss_iou_aware)) loss_obj_pos, loss_obj_neg = self._calc_obj_loss( - output, obj, tobj, gt_box, self._batch_size, anchors, + output, obj, tobj, gt_box, self._train_batch_size, anchors, num_classes, downsample, self._ignore_thresh, scale_x_y) loss_cls = fluid.layers.sigmoid_cross_entropy_with_logits(cls, tcls) diff --git a/ppdet/utils/download.py b/ppdet/utils/download.py index a7a999ebc71416f2a9357418ea6a8bb2ec58a96a..cc78e47539cbd5ddddf7f070677686a4024d5623 100644 --- a/ppdet/utils/download.py +++ b/ppdet/utils/download.py @@ -78,6 +78,12 @@ DATASETS = { 'https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit.tar', 'baa8806617a54ccf3685fa7153388ae6', ), ], ['Annotations', 'JPEGImages']), + 'roadsign_voc': ([( + 'https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar', + '8d629c0f880dd8b48de9aeff44bf1f3e', ), ], ['annotations', 'images']), + 'roadsign_coco': ([( + 'https://paddlemodels.bj.bcebos.com/object_detection/roadsign_coco.tar', + '49ce5a9b5ad0d6266163cd01de4b018e', ), ], ['annotations', 'images']), 'objects365': (), } @@ -117,7 +123,7 @@ def get_dataset_path(path, annotation, image_dir): "https://www.objects365.org/download.html".format(name)) data_dir = osp.join(DATASET_HOME, name) # For voc, only check dir VOCdevkit/VOC2012, VOCdevkit/VOC2007 - if name == 'voc' or name == 'fruit': + if name == 'voc' or name == 'fruit' or name == 'roadsign_voc': exists = True for sub_dir in dataset[1]: check_dir = osp.join(data_dir, sub_dir) @@ -129,7 +135,7 @@ def get_dataset_path(path, annotation, image_dir): return data_dir # voc exist is checked above, voc is not exist here - check_exist = name != 'voc' and name != 'fruit' + check_exist = name != 'voc' and name != 'fruit' and name != 'roadsign_voc' for url, md5sum in dataset[0]: get_path(url, data_dir, md5sum, check_exist) @@ -139,10 +145,11 @@ def get_dataset_path(path, annotation, image_dir): return data_dir # not match any dataset in DATASETS - raise ValueError("Dataset {} is not valid and cannot parse dataset type " - "'{}' for automaticly downloading, which only supports " - "'voc' , 'coco', 'wider_face' and 'fruit' currently". - format(path, osp.split(path)[-1])) + raise ValueError( + "Dataset {} is not valid and cannot parse dataset type " + "'{}' for automaticly downloading, which only supports " + "'voc' , 'coco', 'wider_face', 'fruit' and 'roadsign_voc' currently". 
+        format(path, osp.split(path)[-1]))
 
 
 def create_voc_list(data_dir, devkit_subdir='VOCdevkit'):
@@ -232,13 +239,19 @@ def _dataset_exists(path, annotation, image_dir):
 
     if annotation:
         annotation_path = osp.join(path, annotation)
+        if not osp.exists(annotation_path):
+            logger.error("Config annotation {} does not exist!".format(
+                annotation_path))
+
         if not osp.isfile(annotation_path):
-            logger.debug("Config annotation {} is not a "
-                         "file, dataset config is not "
-                         "valid".format(annotation_path))
+            logger.warning("Config annotation {} is not a "
+                           "file, dataset config is not "
+                           "valid".format(annotation_path))
             return False
     if image_dir:
         image_path = osp.join(path, image_dir)
+        if not osp.exists(image_path):
+            logger.warning("Config image_dir {} does not exist!".format(
+                image_path))
+
         if not osp.isdir(image_path):
             logger.warning("Config image_dir {} is not a "
                            "directory, dataset config is not "
diff --git a/ppdet/utils/voc_eval.py b/ppdet/utils/voc_eval.py
index 1b82928ff4ec7379fcd3b59d9735c0e9d71b11a8..4ffd91260c25b295d435caa3b07e75bc60597a8b 100644
--- a/ppdet/utils/voc_eval.py
+++ b/ppdet/utils/voc_eval.py
@@ -107,8 +107,8 @@ def bbox_eval(results,
         logger.info("Accumulating evaluatation results...")
         detection_map.accumulate()
         map_stat = 100. * detection_map.get_map()
-        logger.info("mAP({:.2f}, {}) = {:.2f}".format(overlap_thresh, map_type,
-                                                      map_stat))
+        logger.info("mAP({:.2f}, {}) = {:.2f}%".format(overlap_thresh, map_type,
+                                                       map_stat))
         return map_stat
diff --git a/tools/anchor_cluster.py b/tools/anchor_cluster.py
new file mode 100644
index 0000000000000000000000000000000000000000..5ec26355c00ec283c230deea7cbeedf2b521c87f
--- /dev/null
+++ b/tools/anchor_cluster.py
@@ -0,0 +1,362 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
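+
+# This tool clusters the ground truth boxes of the configured dataset to
+# generate anchors adapted to that dataset. Example usage (the parameters
+# are documented in docs/tutorials/Custom_DataSet.md):
+#   python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000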
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +# add python path of PadleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +if parent_path not in sys.path: + sys.path.append(parent_path) + +from scipy.cluster.vq import kmeans +import random +import numpy as np +from tqdm import tqdm +from ppdet.utils.cli import ArgsParser +from ppdet.utils.check import check_gpu, check_version, check_config +from ppdet.core.workspace import load_config, merge_config, create + +import logging +FORMAT = '%(asctime)s-%(levelname)s: %(message)s' +logging.basicConfig(level=logging.INFO, format=FORMAT) +logger = logging.getLogger(__name__) + + +class BaseAnchorCluster(object): + def __init__(self, n, cache_path, cache, verbose=True): + """ + Base Anchor Cluster + + Args: + n (int): number of clusters + cache_path (str): cache directory path + cache (bool): whether using cache + verbose (bool): whether print results + """ + super(BaseAnchorCluster, self).__init__() + self.n = n + self.cache_path = cache_path + self.cache = cache + self.verbose = verbose + + def print_result(self, centers): + raise NotImplementedError('%s.print_result is not available' % + self.__class__.__name__) + + def get_whs(self): + whs_cache_path = os.path.join(self.cache_path, 'whs.npy') + shapes_cache_path = os.path.join(self.cache_path, 'shapes.npy') + if self.cache and os.path.exists(whs_cache_path) and os.path.exists( + shapes_cache_path): + self.whs = np.load(whs_cache_path) + self.shapes = np.load(shapes_cache_path) + return self.whs, self.shapes + whs = np.zeros((0, 2)) + shapes = np.zeros((0, 2)) + roidbs = self.dataset.get_roidb() + for rec in tqdm(roidbs): + h, w = rec['h'], rec['w'] + bbox = rec['gt_bbox'] + wh = bbox[:, 2:4] - bbox[:, 0:2] + 1 + wh = wh / np.array([[w, h]]) + shape = np.ones_like(wh) * np.array([[w, h]]) + whs = np.vstack((whs, wh)) + shapes = np.vstack((shapes, shape)) + + if self.cache: + os.makedirs(self.cache_path, exist_ok=True) + np.save(whs_cache_path, whs) + np.save(shapes_cache_path, shapes) + + self.whs = whs + self.shapes = shapes + return self.whs, self.shapes + + def calc_anchors(self): + raise NotImplementedError('%s.calc_anchors is not available' % + self.__class__.__name__) + + def __call__(self): + self.get_whs() + centers = self.calc_anchors() + if self.verbose: + self.print_result(centers) + return centers + + +class YOLOv2AnchorCluster(BaseAnchorCluster): + def __init__(self, + n, + dataset, + size, + cache_path, + cache, + iters=1000, + verbose=True): + super(YOLOv2AnchorCluster, self).__init__( + n, cache_path, cache, verbose=verbose) + """ + YOLOv2 Anchor Cluster + + Reference: + https://github.com/AlexeyAB/darknet/blob/master/scripts/gen_anchors.py + + Args: + n (int): number of clusters + dataset (DataSet): DataSet instance, VOC or COCO + size (list): [w, h] + cache_path (str): cache directory path + cache (bool): whether using cache + iters (int): kmeans algorithm iters + verbose (bool): whether print results + """ + self.dataset = dataset + self.size = size + self.iters = iters + + def print_result(self, centers): + logger.info('%d anchor cluster result: [w, h]' % self.n) + for w, h in centers: + logger.info('[%d, %d]' % (round(w), round(h))) + + def metric(self, whs, centers): + wh1 = whs[:, None] + wh2 = centers[None] + inter = np.minimum(wh1, wh2).prod(2) + return inter / (wh1.prod(2) + wh2.prod(2) - inter) + + def kmeans_expectation(self, whs, 
centers, assignments): + dist = self.metric(whs, centers) + new_assignments = dist.argmax(1) + converged = (new_assignments == assignments).all() + return converged, new_assignments + + def kmeans_maximizations(self, whs, centers, assignments): + new_centers = np.zeros_like(centers) + for i in range(centers.shape[0]): + mask = (assignments == i) + if mask.sum(): + new_centers[i, :] = whs[mask].mean(0) + return new_centers + + def calc_anchors(self): + self.whs = self.whs * np.array([self.size]) + # random select k centers + whs, n, iters = self.whs, self.n, self.iters + logger.info('Running kmeans for %d anchors on %d points...' % + (n, len(whs))) + idx = np.random.choice(whs.shape[0], size=n, replace=False) + centers = whs[idx] + assignments = np.zeros(whs.shape[0:1]) * -1 + # kmeans + if n == 1: + return self.kmeans_maximizations(whs, centers, assignments) + + pbar = tqdm(range(iters), desc='Cluster anchors with k-means algorithm') + for _ in pbar: + # E step + converged, assignments = self.kmeans_expectation(whs, centers, + assignments) + if converged: + break + # M step + centers = self.kmeans_maximizations(whs, centers, assignments) + ious = self.metric(whs, centers) + pbar.desc = 'avg_iou: %.4f' % (ious.max(1).mean()) + + centers = sorted(centers, key=lambda x: x[0] * x[1]) + return centers + + +class YOLOv5AnchorCluster(BaseAnchorCluster): + def __init__(self, + n, + dataset, + size, + cache_path, + cache, + iters=300, + gen_iters=1000, + thresh=0.25, + verbose=True): + super(YOLOv5AnchorCluster, self).__init__( + n, cache_path, cache, verbose=verbose) + """ + YOLOv5 Anchor Cluster + + Reference: + https://github.com/ultralytics/yolov5/blob/master/utils/general.py + + Args: + n (int): number of clusters + dataset (DataSet): DataSet instance, VOC or COCO + size (list): [w, h] + cache_path (str): cache directory path + cache (bool): whether using cache + iters (int): iters of kmeans algorithm + gen_iters (int): iters of genetic algorithm + threshold (float): anchor scale threshold + verbose (bool): whether print results + """ + self.dataset = dataset + self.size = size + self.iters = iters + self.gen_iters = gen_iters + self.thresh = thresh + + def print_result(self, centers): + whs = self.whs + centers = centers[np.argsort(centers.prod(1))] + x, best = self.metric(whs, centers) + bpr, aat = ( + best > self.thresh).mean(), (x > self.thresh).mean() * self.n + logger.info( + 'thresh=%.2f: %.4f best possible recall, %.2f anchors past thr' % + (self.thresh, bpr, aat)) + logger.info( + 'n=%g, img_size=%s, metric_all=%.3f/%.3f-mean/best, past_thresh=%.3f-mean: ' + % (self.n, self.size, x.mean(), best.mean(), + x[x > self.thresh].mean())) + logger.info('%d anchor cluster result: [w, h]' % self.n) + for w, h in centers: + logger.info('[%d, %d]' % (round(w), round(h))) + + def metric(self, whs, centers): + r = whs[:, None] / centers[None] + x = np.minimum(r, 1. / r).min(2) + return x, x.max(1) + + def fitness(self, whs, centers): + _, best = self.metric(whs, centers) + return (best * (best > self.thresh)).mean() + + def calc_anchors(self): + self.whs = self.whs * self.shapes / self.shapes.max( + 1, keepdims=True) * np.array([self.size]) + wh0 = self.whs + i = (wh0 < 3.0).any(1).sum() + if i: + logger.warn('Extremely small objects found. %d of %d' + 'labels are < 3 pixels in width or height' % + (i, len(wh0))) + + wh = wh0[(wh0 >= 2.0).any(1)] + logger.info('Running kmeans for %g anchors on %g points...' 
% + (self.n, len(wh))) + s = wh.std(0) + centers, dist = kmeans(wh / s, self.n, iter=self.iters) + centers *= s + + f, sh, mp, s = self.fitness(wh, centers), centers.shape, 0.9, 0.1 + pbar = tqdm( + range(self.gen_iters), + desc='Evolving anchors with Genetic Algorithm') + for _ in pbar: + v = np.ones(sh) + while (v == 1).all(): + v = ((np.random.random(sh) < mp) * np.random.random() * + np.random.randn(*sh) * s + 1).clip(0.3, 3.0) + new_centers = (centers.copy() * v).clip(min=2.0) + new_f = self.fitness(wh, new_centers) + if new_f > f: + f, centers = new_f, new_centers.copy() + pbar.desc = 'Evolving anchors with Genetic Algorithm: fitness = %.4f' % f + + return centers + + +def main(): + parser = ArgsParser() + parser.add_argument( + '--n', '-n', default=9, type=int, help='num of clusters') + parser.add_argument( + '--iters', + '-i', + default=1000, + type=int, + help='num of iterations for kmeans') + parser.add_argument( + '--gen_iters', + '-gi', + default=1000, + type=int, + help='num of iterations for genetic algorithm') + parser.add_argument( + '--thresh', + '-t', + default=0.25, + type=float, + help='anchor scale threshold') + parser.add_argument( + '--verbose', '-v', default=True, type=bool, help='whether print result') + parser.add_argument( + '--size', + '-s', + default=None, + type=str, + help='image size: w,h, using comma as delimiter') + parser.add_argument( + '--method', + '-m', + default='v2', + type=str, + help='cluster method, [v2, v5] are supported now') + parser.add_argument( + '--cache_path', default='cache', type=str, help='cache path') + parser.add_argument( + '--cache', action='store_true', help='whether use cache') + FLAGS = parser.parse_args() + + cfg = load_config(FLAGS.config) + merge_config(FLAGS.opt) + check_config(cfg) + # check if set use_gpu=True in paddlepaddle cpu version + check_gpu(cfg.use_gpu) + # check if paddlepaddle version is satisfied + check_version() + + # get dataset + dataset = cfg['TrainReader']['dataset'] + if FLAGS.size: + if ',' in FLAGS.size: + size = list(map(int, FLAGS.size.split(','))) + assert len(size) == 2, "the format of size is incorrect" + else: + size = int(FLAGS.size) + size = [size, size] + + elif 'image_shape' in cfg['TrainReader']['inputs_def']: + size = cfg['TrainReader']['inputs_def']['image_shape'][1:] + else: + raise ValueError('size is not specified') + + if FLAGS.method == 'v2': + cluster = YOLOv2AnchorCluster(FLAGS.n, dataset, size, FLAGS.cache_path, + FLAGS.cache, FLAGS.iters, FLAGS.verbose) + elif FLAGS.method == 'v5': + cluster = YOLOv5AnchorCluster(FLAGS.n, dataset, size, FLAGS.cache_path, + FLAGS.cache, FLAGS.iters, FLAGS.gen_iters, + FLAGS.thresh, FLAGS.verbose) + else: + raise ValueError('cluster method: %s is not supported' % FLAGS.method) + + anchors = cluster() + + +if __name__ == "__main__": + main()