提交 06f22754 作者:wangguanzhong 提交者:GitHub

update ppdet docs & remove static code, test=document_fix (#6418)

* update ppdet docs & remove static code, test=document_fix

* add pphuman doc, test=document_fix

* update action doc, test=document_fix

* update activity, test=document_fix
上级 99f16815
......@@ -23,8 +23,17 @@
## <img src="https://user-images.githubusercontent.com/48054808/157793354-6e7f381a-0aa6-4bb7-845c-9acf2ecc05c3.png" width="20"/> 产品动态
- 🔥 **2022.3.24:PaddleDetection发布[release/2.4版本](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4)**
- 🔥 **2022.7.14:[行人分析工具PP-Human v2](./deploy/pipeline)发布**
- 四大产业特色功能:高性能易扩展的五大复杂行为识别、闪电级人体属性识别、一行代码即可实现的人流检测与轨迹留存以及高精度跨镜跟踪
- 底层核心算法性能强劲:覆盖行人检测、跟踪、属性三类核心算法能力,对目标人数、光线、背景均无限制
- 极低使用门槛:提供保姆级全流程开发及模型优化策略、一行命令完成推理、兼容各类数据输入格式
**活动预告** 7月19日晚20点,PaddleDetection举办PP-Human v2线上私享交流会,欢迎大家扫码进群,获取线上会议链接!名额有限,抓紧报名!
<div align="center">
<img src="https://user-images.githubusercontent.com/22989727/178771163-66639dc0-cb65-4413-8de4-6ac5c5eed9f5.jpg" width="200"/>
</div>
- 2022.3.24:PaddleDetection发布[release/2.4版本](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4)
- 发布高精度云边一体SOTA目标检测模型[PP-YOLOE](configs/ppyoloe),提供s/m/l/x版本,l版本COCO test2017数据集精度51.6%,V100预测速度78.1 FPS,支持混合精度训练,训练较PP-YOLOv2加速33%,全系列多尺度模型,满足不同硬件算力需求,可适配服务器、边缘端GPU及其他服务器端AI加速卡。
- 发布边缘端和CPU端超轻量SOTA目标检测模型[PP-PicoDet增强版](configs/picodet),精度提升2%左右,CPU预测速度提升63%,新增参数量0.7M的PicoDet-XS模型,提供模型稀疏化和量化功能,便于模型加速,各类硬件无需单独开发后处理模块,降低部署门槛。
- 发布实时行人分析工具[PP-Human](deploy/pipeline),支持行人跟踪、人流量统计、人体属性识别与摔倒检测四大能力,基于真实场景数据特殊优化,精准识别各类摔倒姿势,适应不同环境背景、光线及摄像角度。
......@@ -353,19 +362,22 @@
<summary><b> 5. 产业级实时行人分析工具</b></summary>
| 功能\模型 | 目标检测 | 多目标跟踪 | 属性识别 | 关键点检测 | 行为识别 | ReID |
|:------------------------- |:-------------------------------------------------------------------------------------- |:-------------------------------------------------------------------------------------- |:-----------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------:|:-----------------------------------------------------------------:|:----------------------------------------------------------------------:|
| 行人检测 | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | | | | |
| 行人跟踪 | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | | | |
| 属性识别(图片) | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | | | |
| 属性识别(视频) | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | | | |
| 摔倒检测 | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | |
| 跨镜跟踪 | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) |
| **模型精度** | **mAP 56.3** | **MOTA 72.0** | **mA 94.86** | **AP 87.1** | **AP 96.43** | **mAP 98.8** |
| **T4 TensorRT FP16 预测速度** | **28.0ms** | **33.1ms** | **单人2ms** | **单人2.9ms** | **单人2.7ms** | **单人1.5ms** |
| 任务 | 端到端速度(ms)| 模型方案 | 模型体积 |
| :---------: | :-------: | :------: |:------: |
| 行人检测(高精度) | 25.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| 行人检测(轻量级) | 16.2ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M |
| 行人跟踪(高精度) | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| 行人跟踪(轻量级) | 21.0ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M |
| 属性识别(高精度) | 单人8.5ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | 目标检测:182M<br>属性识别:86M |
| 属性识别(轻量级) | 单人7.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | 目标检测:182M<br>属性识别:7.2M |
| 摔倒识别 | 单人10ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) <br> [关键点检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) <br> [基于关键点行为识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | 多目标跟踪:182M<br>关键点检测:101M<br>基于关键点行为识别:21.8M |
| 闯入识别 | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| 打架识别 | 19.7ms | [视频分类](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | 90M |
| 抽烟识别 | 单人15.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[基于人体id的目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | 目标检测:182M<br>基于人体id的目标检测:27M |
| 打电话识别 | 单人ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[基于人体id的图像分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | 目标检测:182M<br>基于人体id的图像分类:45M |
**点击“ ✅ ”即可下载对应模型**
点击模型方案中的模型即可下载指定模型
详细信息参考[文档](deploy/pipeline)
......@@ -471,7 +483,7 @@
- 感谢[FL77N](https://github.com/FL77N/)贡献`Sparse-RCNN`模型。
- 感谢[Chen-Song](https://github.com/Chen-Song)贡献`Swin Faster-RCNN`模型。
- 感谢[yangyudong](https://github.com/yangyudong2020), [hchhtc123](https://github.com/hchhtc123) 开发PP-Tracking GUI界面
- 感谢[Shigure19](https://github.com/Shigure19) 开发PP-TinyPose健身APP
- 感谢Shigure19 开发PP-TinyPose健身APP
- 感谢[manangoel99](https://github.com/manangoel99)贡献Wandb可视化方式
## <img src="https://user-images.githubusercontent.com/48054808/157835276-9aab9d1c-1c46-446b-bdd4-5ab75c5cfa48.png" width="20"/> 引用
......
......@@ -120,11 +120,11 @@ cd PaddleDetection
pip install -r requirements.txt
```
详细安装文档请参考[文档](../docs/tutorials/INSTALL_cn.md)
详细安装文档请参考[文档](../../docs/tutorials/INSTALL_cn.md)
### 2. 数据准备
用户需要准备训练数据集,建议标注文件使用COCO数据格式。如果使用labelme或者VOC数据格式,先使用[格式转换脚本](../tools/x2coco.py)将标注格式转化为COCO,详细数据准备文档请参考[文档](../docs/tutorials/PrepareDataSet.md)
用户需要准备训练数据集,建议标注文件使用COCO数据格式。如果使用labelme或者VOC数据格式,先使用[格式转换脚本](../../tools/x2coco.py)将标注格式转化为COCO,详细数据准备文档请参考[文档](../../docs/tutorials/PrepareDataSet.md)
本文档以新能源电池工业质检子数据集为例展开,数据下载[链接](https://bj.bcebos.com/v1/paddle-smrt/data/battery_mini.zip)
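准备好COCO格式标注后,可以用下面的简化脚本快速检查标注文件的基本信息(仅为示意,标注文件路径 `annotations/instances_train.json` 为假设,请按实际路径修改):

```python
import json
from collections import Counter


def summarize_coco(ann_path):
    """读取COCO格式标注文件,打印图片数、标注框数以及各类别的标注框数量。"""
    with open(ann_path, "r", encoding="utf-8") as f:
        coco = json.load(f)

    cat_names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    box_counter = Counter(ann["category_id"] for ann in coco.get("annotations", []))

    print("图片数量:", len(coco.get("images", [])))
    print("标注框数量:", len(coco.get("annotations", [])))
    for cat_id, num in box_counter.items():
        print(f"类别 {cat_names.get(cat_id, cat_id)}: {num} 个标注框")


if __name__ == "__main__":
    # 路径仅为示例,请替换为实际的标注文件
    summarize_coco("annotations/instances_train.json")
```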
......@@ -170,7 +170,7 @@ python tools/eval.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml
python tools/infer.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml -o weights=output/ppyoloe_crn_m_300e_battery_1024/model_final.pdparams --infer_img=images/Board_diaojiao_1591.png
```
更多模型训练参数请参考[文档](../docs/tutorials/GETTING_STARTED_cn.md)
更多模型训练参数请参考[文档](../../docs/tutorials/GETTING_STARTED_cn.md)
### 4. 模型导出部署
......@@ -182,7 +182,7 @@ python tools/infer.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.ym
python tools/export_model.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml -o weights=output/ppyoloe_crn_m_300e_battery_1024/model_final.pdparams
```
接下来可以使用PaddleDetection中的部署代码实现C++部署,详细步骤请参考[文档](../deploy/cpp/README.md)
接下来可以使用PaddleDetection中的部署代码实现C++部署,详细步骤请参考[文档](../../deploy/cpp/README.md)
如果期望使用可视化界面的方式进行部署,可以参考下面部分的内容。
......
......@@ -4,12 +4,13 @@
**PP-Human是基于飞桨深度学习框架的业界首个开源产业级实时行人分析工具,具有功能丰富,应用广泛和部署高效三大优势。**
![](https://user-images.githubusercontent.com/48054808/173030254-ecf282bd-2cfe-43d5-b598-8fed29e22020.gif)
![](https://user-images.githubusercontent.com/22989727/178892756-e2717a2c-beb0-4d88-ad32-ca37e24b47f8.gif)
PP-Human支持图片/单镜头视频/多镜头视频多种输入方式,功能覆盖多目标跟踪、属性识别、行为分析及人流量计数与轨迹记录。能够广泛应用于智慧交通、智慧社区、工业巡检等领域。支持服务器端部署及TensorRT加速,T4服务器上可达到实时。
## 📣 近期更新
- 🔥 **2022.7.13:PP-Human v2发布,行为识别、人体属性识别、流量计数、跨镜跟踪四大产业特色功能全面升级,覆盖行人检测、跟踪、属性三类核心算法能力,提供保姆级全流程开发及模型优化策略。**
- 2022.4.18:新增PP-Human全流程实战教程, 覆盖训练、部署、动作类型扩展等内容,AIStudio项目请见[链接](https://aistudio.baidu.com/aistudio/projectdetail/3842982)
- 2022.4.10:新增PP-Human范例,赋能社区智能精细化管理, AIStudio快速上手教程[链接](https://aistudio.baidu.com/aistudio/projectdetail/3679564)
- 2022.4.5:全新发布实时行人分析工具PP-Human,支持行人跟踪、人流量统计、人体属性识别与摔倒检测四大能力,基于真实场景数据特殊优化,精准识别各类摔倒姿势,适应不同环境背景、光线及摄像角度
......@@ -21,24 +22,51 @@ PP-Human支持图片/单镜头视频/多镜头视频多种输入方式,功能
| **跨镜跟踪(ReID)** | 超强性能:针对目标遮挡、完整度、模糊度等难点特殊优化,实现mAP 98.8、1.5ms/人 | <img src="https://user-images.githubusercontent.com/48054808/173037607-0a5deadc-076e-4dcc-bd96-d54eea205f1f.png" title="" alt="" width="191"> |
| **属性分析** | 兼容多种数据格式:支持图片、视频输入<br/><br/>高性能:融合开源数据集与企业真实数据进行训练,实现mAP 94.86、2ms/人<br/><br/>支持26种属性:性别、年龄、眼镜、上衣、鞋子、帽子、背包等26种高频属性 | <img src="https://user-images.githubusercontent.com/48054808/173036043-68b90df7-e95e-4ada-96ae-20f52bc98d7c.png" title="" alt="" width="207"> |
| **行为识别** | 功能丰富:支持摔倒、打架、抽烟、打电话、人员闯入五种高频异常行为识别<br/><br/>鲁棒性强:对光照、视角、背景环境无限制<br/><br/>性能高:与视频识别技术相比,模型计算量大幅降低,支持本地化与服务化快速部署<br/><br/>训练速度快:仅需15分钟即可产出高精度行为识别模型 | <img src="https://user-images.githubusercontent.com/48054808/173034825-623e4f78-22a5-4f14-9b83-dc47aa868478.gif" title="" alt="" width="209"> |
| **人流量计数轨迹记录** | 简洁易用:单个参数即可开启人流量计数与轨迹记录功能 | <img src="https://user-images.githubusercontent.com/22989727/174736440-87cd5169-c939-48f8-90a1-0495a1fcb2b1.gif" title="" alt="" width="200"> |
| **人流量计数**<br>**轨迹记录** | 简洁易用:单个参数即可开启人流量计数与轨迹记录功能 | <img src="https://user-images.githubusercontent.com/22989727/174736440-87cd5169-c939-48f8-90a1-0495a1fcb2b1.gif" title="" alt="" width="200"> |
## 🗳 模型库
| 任务 | 适用场景 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 |
<details>
<summary><b>单模型效果(点击展开)</b></summary>
| 任务 | 适用场景 | 精度 | 预测速度(ms)| 模型体积 | 预测部署模型 |
| :---------: |:---------: |:--------------- | :-------: | :------: | :------: |
| 目标检测(高精度) | 图片输入 | mAP: 56.6 | 28.0ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 目标检测(轻量级) | 图片输入 | mAP: 53.2 | 22.1ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) |
| 目标跟踪(高精度) | 视频输入 | MOTA: 79.5 | 33.1ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 目标跟踪(轻量级) | 视频输入 | MOTA: 69.1 | 27.2ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) |
| 属性识别 | 图片/视频输入 属性识别 | mA: 94.86 | 单人2ms | - |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) |
| 关键点检测 | 视频输入 行为识别 | AP: 87.1 | 单人2.9ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) |
| 摔倒行为识别 | 视频输入 行为识别 | 准确率: 96.43 | 单人2.7ms | - |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) |
| 打电话行为识别 | 视频输入 行为识别 | 准确率: 86.85 | 单人2.94ms | - |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) |
| 抽烟行为识别 | 视频输入 行为识别 | mAP: 39.7 | 单人2.0ms | - |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) |
| ReID | 视频输入 跨镜跟踪 | mAP: 98.8 | 单人1.5ms | - |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) |
下载预测部署模型并解压,存放至新建的`./output_inference`目录中
| 目标检测(高精度) | 图片输入 | mAP: 57.8 | 25.1ms | 182M |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 目标检测(轻量级) | 图片输入 | mAP: 53.2 | 16.2ms | 27M |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) |
| 目标跟踪(高精度) | 视频输入 | MOTA: 82.2 | 31.8ms | 182M |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 目标跟踪(轻量级) | 视频输入 | MOTA: 73.9 | 21.0ms |27M |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) |
| 属性识别(高精度) | 图片/视频输入 属性识别 | mA: 95.4 | 单人4.2ms | 86M |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) |
| 属性识别(轻量级) | 图片/视频输入 属性识别 | mA: 94.5 | 单人2.9ms | 7.2M |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) |
| 关键点检测 | 视频输入 行为识别 | AP: 87.1 | 单人5.7ms | 101M |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) |
| 基于关键点序列分类 | 视频输入 行为识别 | 准确率: 96.43 | 单人0.07ms | 21.8M |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) |
| 基于人体id图像分类 | 视频输入 行为识别 | 准确率: 86.85 | 单人1.8ms | 45M |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) |
| 基于人体id检测 | 视频输入 行为识别 | AP50: 79.5 | 单人10.9ms | 27M |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) |
| 视频分类 | 视频输入 行为识别 | Accuracy: 89.0 | 19.7ms/1s视频 | 90M | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) |
| ReID | 视频输入 跨镜跟踪 | mAP: 98.8 | 单人0.23ms | 85M |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) |
</details>
<details>
<summary><b>端到端模型效果(点击展开)</b></summary>
| 任务 | 端到端速度(ms)| 模型方案 | 模型体积 |
| :---------: | :-------: | :------: |:------: |
| 行人检测(高精度) | 25.1ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| 行人检测(轻量级) | 16.2ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M |
| 行人跟踪(高精度) | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| 行人跟踪(轻量级) | 21.0ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M |
| 属性识别(高精度) | 单人8.5ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | 目标检测:182M<br>属性识别:86M |
| 属性识别(轻量级) | 单人7.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | 目标检测:182M<br>属性识别:7.2M |
| 摔倒识别 | 单人10ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) <br> [关键点检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) <br> [基于关键点行为识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | 多目标跟踪:182M<br>关键点检测:101M<br>基于关键点行为识别:21.8M |
| 闯入识别 | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| 打架识别 | 19.7ms | [视频分类](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | 90M |
| 抽烟识别 | 单人15.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[基于人体id的目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | 目标检测:182M<br>基于人体id的目标检测:27M |
| 打电话识别 | 单人ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[基于人体id的图像分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | 目标检测:182M<br>基于人体id的图像分类:45M |
</details>
点击模型方案中的模型即可下载指定模型,下载后解压存放至`./output_inference`目录中
## 📚 文档教程
......@@ -66,7 +94,7 @@ PP-Human支持图片/单镜头视频/多镜头视频多种输入方式,功能
### 跨镜跟踪ReID
* [快速开始](docs/tutorials/mtmct.md)
* [二次开发教程]()
* [二次开发教程](../../docs/advanced_tutorials/customization/mtmct.md)
* 数据准备
* 模型优化
......
# 快速开始
## 一、环境准备
## 目录
- [环境准备](#环境准备)
- [模型下载](#模型下载)
- [配置文件说明](#配置文件说明)
- [预测部署](#预测部署)
- [参数说明](#参数说明)
- [方案介绍](#方案介绍)
- [行人检测](#行人检测)
- [行人跟踪](#行人跟踪)
- [跨镜行人跟踪](#跨镜行人跟踪)
- [属性识别](#属性识别)
- [行为识别](#行为识别)
## 环境准备
环境要求: PaddleDetection版本 >= release/2.4 或 develop版本
......@@ -25,20 +39,23 @@ pip install -r requirements.txt
1. 详细安装文档参考[文档](../../../../docs/tutorials/INSTALL_cn.md)
2. 如果需要TensorRT推理加速(测速方式),请安装带`TensorRT版本Paddle`。您可以从[Paddle安装包](https://paddleinference.paddlepaddle.org.cn/v2.2/user_guides/download_lib.html#python)下载安装,或者按照[指导文档](https://www.paddlepaddle.org.cn/inference/master/optimize/paddle_trt.html)使用docker或自编译方式准备Paddle环境。
## 二、模型下载
## 模型下载
PP-Human提供了目标检测、属性识别、行为识别、ReID预训练模型,以实现不同使用场景,用户可以直接下载使用
| 任务 | 适用场景 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 |
| :---------: |:---------: |:--------------- | :-------: | :------: | :------: |
| 目标检测(高精度) | 图片输入 | mAP: 56.6 | 28.0ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 目标检测(轻量级) | 图片输入 | mAP: 53.2 | 22.1ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) |
| 目标跟踪(高精度) | 视频输入 | MOTA: 79.5 | 33.1ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 目标跟踪(轻量级) | 视频输入 | MOTA: 69.1 | 27.2ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) |
| 属性识别 | 图片/视频输入 属性识别 | mA: 94.86 | 单人2ms | - |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) |
| 关键点检测 | 视频输入 行为识别 | AP: 87.1 | 单人2.9ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) |
| 行为识别 | 视频输入 行为识别 | 准确率: 96.43 | 单人2.7ms | - |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) |
| ReID | 视频输入 跨镜跟踪 | mAP: 98.8 | 单人1.5ms | - |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) |
| 任务 | 端到端速度(ms)| 模型方案 | 模型体积 |
| :---------: | :-------: | :------: |:------: |
| 行人检测(高精度) | 25.1ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| 行人检测(轻量级) | 16.2ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M |
| 行人跟踪(高精度) | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| 行人跟踪(轻量级) | 21.0ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M |
| 属性识别(高精度) | 单人8.5ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | 目标检测:182M<br>属性识别:86M |
| 属性识别(轻量级) | 单人7.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | 目标检测:182M<br>属性识别:7.2M |
| 摔倒识别 | 单人10ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) <br> [关键点检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) <br> [基于关键点行为识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | 多目标跟踪:182M<br>关键点检测:101M<br>基于关键点行为识别:21.8M |
| 闯入识别 | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 多目标跟踪:182M |
| 打架识别 | 19.7ms | [视频分类](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | 90M |
| 抽烟识别 | 单人15.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[基于人体id的目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | 目标检测:182M<br>基于人体id的目标检测:27M |
| 打电话识别 | 单人ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[基于人体id的图像分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | 目标检测:182M<br>基于人体id的图像分类:45M |
下载模型后,解压至`./output_inference`文件夹。
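以下是下载并解压预测部署模型到`./output_inference`目录的一个简化示意(以上表中的高精度检测/跟踪模型为例,URL来自上方模型列表,其余模型同理):

```python
import os
import urllib.request
import zipfile


def download_and_extract(url, save_dir="./output_inference"):
    """下载zip格式的预测部署模型并解压到指定目录。"""
    os.makedirs(save_dir, exist_ok=True)
    zip_path = os.path.join(save_dir, os.path.basename(url))
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(save_dir)


if __name__ == "__main__":
    # 高精度行人检测/跟踪模型,URL取自上方模型列表
    download_and_extract(
        "https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip")
```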
......@@ -50,7 +67,7 @@ PP-Human提供了目标检测、属性识别、行为识别、ReID预训练模
- ReID模型精度为Market1501数据集测试结果
- 预测速度为T4下,开启TensorRT FP16的效果, 模型预测速度包含数据预处理、模型预测、后处理全流程
## 三、配置文件说明
## 配置文件说明
PP-Human相关配置位于```deploy/pipeline/config/infer_cfg_pphuman.yml```中,存放模型路径,完成不同功能需要设置不同的任务类型
......@@ -74,13 +91,13 @@ MOT:
tracker_config: deploy/pipeline/config/tracker_config.yml
batch_size: 1
basemode: "idbased"
enable: False
enable: True
ATTR:
model_dir: output_inference/strongbaseline_r50_30e_pa100k/
batch_size: 8
basemode: "idbased"
enable: False
enable: True
```
**注意:**
......@@ -89,30 +106,30 @@ ATTR:
- 如果用户仅需要修改模型文件路径,可以在命令行中加入 `--model_dir det=ppyoloe/` 即可,也可以手动修改配置文件中的相应模型路径,详细说明参考下方参数说明文档。
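如果希望以脚本方式修改配置文件中的模型路径和功能开关,下面是一个用PyYAML读写配置的简化示意(仅用于说明配置结构,键名与上文配置片段一致;实际使用时也可以直接编辑配置文件或用命令行参数覆盖):

```python
import yaml  # 需要安装 pyyaml

CFG_PATH = "deploy/pipeline/config/infer_cfg_pphuman.yml"

with open(CFG_PATH, "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# 修改属性识别模型路径并开启该功能(键名与上文配置片段一致)
cfg["ATTR"]["model_dir"] = "output_inference/strongbaseline_r50_30e_pa100k/"
cfg["ATTR"]["enable"] = True

with open(CFG_PATH, "w", encoding="utf-8") as f:
    yaml.safe_dump(cfg, f, allow_unicode=True)
```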
### 四、预测部署
## 预测部署
```
# 行人检测,指定配置文件路径和测试图片
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --image_file=test_image.jpg --device=gpu [--run_mode trt_fp16]
# 行人跟踪,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的MOT部分enable设置为```True```
# 行人跟踪,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的MOT部分enable设置为```True```
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_file=test_video.mp4 --device=gpu [--run_mode trt_fp16]
# 行人跟踪,指定配置文件路径,模型路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的MOT部分enable设置为```True```
# 行人跟踪,指定配置文件路径,模型路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的MOT部分enable设置为```True```
# 命令行中指定的模型路径优先级高于配置文件
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_file=test_video.mp4 --device=gpu --model_dir det=ppyoloe/ [--run_mode trt_fp16]
# 行人属性识别,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的ATTR部分enable设置为```True```
# 行人属性识别,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的ATTR部分enable设置为```True```
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_file=test_video.mp4 --device=gpu [--run_mode trt_fp16]
# 行为识别,指定配置文件路径和测试视频,在配置文件中```deploy/pipeline/config/infer_cfg_pphuman.yml```中的SKELETON_ACTION部分enable设置为```True```
# 行为识别,以摔倒识别为例,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的SKELETON_ACTION部分enable设置为```True```
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_file=test_video.mp4 --device=gpu [--run_mode trt_fp16]
# 行人跨境跟踪,指定配置文件路径和测试视频列表文件夹,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的REID部分enable设置为```True```
# 行人跨境跟踪,指定配置文件路径和测试视频列表文件夹,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的REID部分enable设置为```True```
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_dir=mtmct_dir/ --device=gpu [--run_mode trt_fp16]
```
### 4.1 参数说明
### 参数说明
| 参数 | 是否必须|含义 |
|-------|-------|----------|
......@@ -131,7 +148,7 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
| --do_entrance_counting | Option | 是否统计出入口流量,默认为False |
| --draw_center_traj | Option | 是否绘制跟踪轨迹,默认为False |
## 五、方案介绍
## 方案介绍
PP-Human整体方案如下图所示
......@@ -140,29 +157,31 @@ PP-Human整体方案如下图所示
</div>
### 1. 行人检测
### 行人检测
- 采用PP-YOLOE L 作为目标检测模型
- 详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/)[检测跟踪文档](mot.md)
### 2. 行人跟踪
### 行人跟踪
- 采用SDE方案完成行人跟踪
- 检测模型使用PP-YOLOE L(高精度)和S(轻量级)
- 跟踪模块采用Bytetrack方案
- 详细文档参考[Bytetrack](../../../../configs/mot/bytetrack)[检测跟踪文档](mot.md)
- 跟踪模块采用OC-SORT方案
- 详细文档参考[OC-SORT](../../../../configs/mot/ocsort)[检测跟踪文档](mot.md)
### 3. 跨镜行人跟踪
- 使用PP-YOLOE + Bytetrack得到单镜头多目标跟踪轨迹
- 使用ReID(centroid网络)对每一帧的检测结果提取特征
### 跨镜行人跟踪
- 使用PP-YOLOE + OC-SORT得到单镜头多目标跟踪轨迹
- 使用ReID(StrongBaseline网络)对每一帧的检测结果提取特征
- 多镜头轨迹特征进行匹配,得到跨镜头跟踪结果
- 详细文档参考[跨镜跟踪](mtmct.md)
### 4. 属性识别
- 使用PP-YOLOE + Bytetrack跟踪人体
### 属性识别
- 使用PP-YOLOE + OC-SORT跟踪人体
- 使用StrongBaseline(多分类模型)完成属性识别,主要属性包括年龄、性别、帽子、眼镜、上衣下衣款式、背包等
- 详细文档参考[属性识别](attribute.md)
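属性识别本质上是对行人截图做多标签分类:模型对26种属性分别输出一个概率,按阈值判定该属性是否存在。下面用NumPy给出后处理的一个简化示意(属性名称与阈值仅为假设,实际以模型配置为准):

```python
import numpy as np

# 假设的部分属性名称,实际共26种高频属性,以模型配置为准
ATTR_NAMES = ["男性", "戴帽子", "戴眼镜", "背包", "长袖上衣"]


def decode_attributes(probs, threshold=0.5):
    """probs为模型对单个行人输出的各属性概率,返回判定为"存在"的属性列表。"""
    probs = np.asarray(probs)
    return [name for name, p in zip(ATTR_NAMES, probs) if p >= threshold]


if __name__ == "__main__":
    # 模拟一组属性概率输出
    print(decode_attributes([0.92, 0.10, 0.71, 0.35, 0.88]))
```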
### 5. 行为识别:
- 使用PP-YOLOE + Bytetrack跟踪人体
- 使用HRNet进行关键点检测得到人体17个骨骼点
- 结合50帧内同一个人骨骼点的变化,通过ST-GCN判断50帧内发生的动作是否为摔倒
### 行为识别:
- 提供四种行为识别方案
- 1. 基于骨骼点的行为识别,例如摔倒识别
- 2. 基于图像分类的行为识别,例如打电话识别
- 3. 基于检测的行为识别,例如吸烟识别
- 4. 基于视频分类的行为识别,例如打架识别
- 详细文档参考[行为识别](action.md)
......@@ -2,37 +2,40 @@
# PP-Human行为识别模块
行为识别在智慧社区,安防监控等方向具有广泛应用,根据行为的不同,PP-Human中集成了基于视频分类、基于检测、基于图像分类以及基于骨骼点的行为识别模块,方便用户根据需求进行选择。
## 目录
## 模型库
在这里,我们提供了检测/跟踪、关键点识别,以及打架、打电话、抽烟、摔倒等行为识别的预训练模型,用户可以直接下载使用。
- [基于骨骼点的行为识别](#基于骨骼点的行为识别)
- [基于图像分类的行为识别](#基于图像分类的行为识别)
- [基于检测的行为识别](#基于检测的行为识别)
- [基于行人轨迹的行为识别](#基于行人轨迹的行为识别)
- [基于视频分类的行为识别](#基于视频分类的行为识别)
行为识别在智慧社区,安防监控等方向具有广泛应用,根据行为的不同,PP-Human中集成了基于视频分类、基于检测、基于图像分类,基于行人轨迹以及基于骨骼点的行为识别模块,方便用户根据需求进行选择。
## 基于骨骼点的行为识别
应用行为:摔倒识别
<div align="center">
<img src="../images/action.gif" width='1000'/>
<center>数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用</center>
</div>
### 模型库
基于骨骼点的行为识别包含行人检测/跟踪,关键点检测和摔倒行为识别三个模型,首先需要下载以下预训练模型
| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 |
|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: |
| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3 <br> MOTA: 72.0 | 检测: 28ms <br> 跟踪:33.1ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 打电话行为识别 | PP-HGNet | 准确率: 86.85 | 单人 2.94ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) |
| 抽烟行为识别 | PP-YOLOE | mAP: 39.7 | 单人 2.0ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) |
| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3 <br> MOTA: 72.0 | 检测: 16.2ms <br> 跟踪:22.3ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 关键点识别 | HRNet | AP: 87.1 | 单人 2.9ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)|
| 摔倒行为识别 | ST-GCN | 准确率: 96.43 | 单人 2.7ms | - |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) |
| 打架识别 | PP-TSM | 准确率:89.06% | 2s视频 128ms | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) |
注:
1. 检测/跟踪模型精度为[MOT17](https://motchallenge.net/)[CrowdHuman](http://www.crowdhuman.org/)[HIEVE](http://humaninevents.org/)和部分业务数据融合训练测试得到。
2. 关键点模型使用[COCO](https://cocodataset.org/)[UAV-Human](https://github.com/SUTDCV/UAV-Human)和部分业务数据融合训练, 精度在业务数据测试集上得到。
3. 摔倒行为识别模型使用[NTU-RGB+D](https://rose1.ntu.edu.sg/dataset/actionRecognition/)[UR Fall Detection Dataset](http://fenix.univ.rzeszow.pl/~mkepski/ds/uf.html)和部分业务数据融合训练,精度在业务数据测试集上得到。
4. 打电话行为识别模型使用[UAV-Human](https://github.com/SUTDCV/UAV-Human)的打电话行为部分进行训练和测试。
5. 抽烟行为识别模型使用业务数据进行训练和测试。
6. 打架识别模型基于6个公开数据集训练得到:Surveillance Camera Fight Dataset、A Dataset for Automatic Violence Detection in Videos、Hockey Fight Detection Dataset、Video Fight Detection Dataset、Real Life Violence Situations Dataset、UBI Abnormal Event Detection Dataset。
7. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。
## 基于骨骼点的行为识别——摔倒识别
<div align="center">
<img src="../images/action.gif" width='1000'/>
<center>数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用</center>
</div>
4. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。
### 配置说明
[配置文件](../../config/infer_cfg_pphuman.yml)中与行为识别相关的参数如下:
......@@ -69,7 +72,7 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
### 方案说明
1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)
1. 使用多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)
2. 通过行人检测框的坐标在输入视频的对应帧中截取每个行人。
3. 使用[关键点识别模型](../../../../configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml)得到对应的17个骨骼特征点。骨骼特征点的顺序及类型与COCO一致,详见[如何准备关键点数据集](../../../../docs/tutorials/data/PrepareKeypointDataSet.md)中的`COCO数据集`部分。
4. 每个跟踪ID对应的目标行人各自累计骨骼特征点结果,组成该人物的时序关键点序列。当累计到预定帧数或跟踪丢失后,使用行为识别模型判断时序关键点序列的动作类型。当前版本模型支持摔倒行为的识别,预测得到的`class id`对应关系为:
......@@ -79,13 +82,29 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
```
- 摔倒行为识别模型使用了[ST-GCN](https://arxiv.org/abs/1801.07455),并基于[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md)套件完成模型训练。
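下面给出"按跟踪ID累积骨骼点序列、累计到预定帧数后触发行为识别"这一流程的简化示意(非官方实现,仅供理解流程;`classify_action`为假设的ST-GCN推理接口,帧数阈值以50为例):

```python
from collections import defaultdict, deque

SEQ_LEN = 50  # 累计帧数阈值,示例值

# 每个跟踪ID维护一个定长的关键点序列,元素为 (17, 2) 的关键点坐标
keypoint_buffers = defaultdict(lambda: deque(maxlen=SEQ_LEN))


def classify_action(keypoint_seq):
    """假设的行为识别接口:实际应调用ST-GCN模型推理,这里仅返回占位结果。"""
    return -1  # 占位:实际应返回模型预测的class id


def update_and_predict(track_id, keypoints_17x2):
    """每帧调用:累积该跟踪ID的关键点,序列填满后基于最近SEQ_LEN帧做一次识别(简化的滑动窗口方式)。"""
    buf = keypoint_buffers[track_id]
    buf.append(keypoints_17x2)
    if len(buf) == SEQ_LEN:
        return classify_action(list(buf))
    return None  # 尚未累计到足够帧数
```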
## 基于图像分类的行为识别——打电话识别
## 基于图像分类的行为识别
应用行为:打电话识别
<div align="center">
<img src="../images/calling.gif" width='1000'/>
<center>数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用</center>
</div>
### 模型库
基于图像分类的行为识别包含行人检测/跟踪,打电话识别两个模型,首先需要下载以下预训练模型
| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 |
|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: |
| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3 <br> MOTA: 72.0 | 检测: 16.2ms <br> 跟踪:22.3ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 打电话识别 | PP-HGNet | 准确率: 86.85 | 单人 2.94ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) |
注:
1. 检测/跟踪模型精度为[MOT17](https://motchallenge.net/)[CrowdHuman](http://www.crowdhuman.org/)[HIEVE](http://humaninevents.org/)和部分业务数据融合训练测试得到。
2. 打电话行为识别模型使用[UAV-Human](https://github.com/SUTDCV/UAV-Human)的打电话行为部分进行训练和测试。
3. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。
### 配置说明
[配置文件](../../config/infer_cfg_pphuman.yml)中相关的参数如下:
......@@ -111,7 +130,7 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
4. 启动命令中的完整参数说明,请参考[参数说明](./QUICK_STARTED.md)
### 方案说明
1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)
1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)
2. 通过行人检测框的坐标在输入视频的对应帧中截取每个行人。
3. 在帧级别对截取的行人图像通过图像分类的方式实现。当图片所属类别为对应行为时,即认为在一定时间段内该人物处于该行为状态中。该任务使用[PP-HGNet](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models/PP-HGNet.md)实现,当前版本模型支持打电话行为的识别,预测得到的`class id`对应关系为:
```
......@@ -121,12 +140,27 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
- 基于分类的行为识别基于[PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models/PP-HGNet.md#3.3)完成模型训练。
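该方案的核心是"用检测框把行人从原图中裁剪出来,再做帧级图像分类"。下面是这一步骤的简化示意(非官方实现;`classify_half_body`为假设的分类接口,class id含义以上文配置为准):

```python
def crop_person(frame, bbox):
    """按检测框 (x1, y1, x2, y2) 从原图中裁剪行人区域,frame为HxWxC的图像数组。"""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = [int(v) for v in bbox]
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(w, x2), min(h, y2)
    return frame[y1:y2, x1:x2]


def classify_half_body(person_img):
    """假设的PP-HGNet分类接口:实际应调用图像分类模型推理,这里仅返回占位结果。"""
    return -1  # 占位:实际应返回模型预测的class id


def recognize_calling(frame, bboxes):
    """对一帧中的每个行人框做裁剪并分类,返回各行人的行为类别。"""
    return [classify_half_body(crop_person(frame, box)) for box in bboxes]
```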
## 基于检测的行为识别——吸烟识别
## 基于检测的行为识别
应用行为:吸烟识别
<div align="center">
<img src="../images/smoking.gif" width='1000'/>
<center>数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用</center>
</div>
### 模型库
在这里,我们提供了行人检测/跟踪、吸烟行为识别的预训练模型,用户可以直接下载使用。
| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 |
|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: |
| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3 <br> MOTA: 72.0 | 检测: 16.2ms <br> 跟踪:22.3ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 吸烟行为识别 | PP-YOLOE | mAP: 39.7 | 单人 2.0ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) |
注:
1. 检测/跟踪模型精度为[MOT17](https://motchallenge.net/)[CrowdHuman](http://www.crowdhuman.org/)[HIEVE](http://humaninevents.org/)和部分业务数据融合训练测试得到。
2. 抽烟行为识别模型使用业务数据进行训练和测试。
3. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。
### 配置说明
[配置文件](../../config/infer_cfg_pphuman.yml)中相关的参数如下:
......@@ -152,7 +186,7 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
4. 启动命令中的完整参数说明,请参考[参数说明](./QUICK_STARTED.md)
### 方案说明
1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)
1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)
2. 通过行人检测框的坐标在输入视频的对应帧中截取每个行人。
3. 通过在帧级别的行人图像中检测该行为的典型特定目标实现。当检测到特定目标(在这里即烟头)以后,即认为在一定时间段内该人物处于该行为状态中。该任务使用[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)实现,当前版本模型支持吸烟行为的识别,预测得到的`class id`对应关系为:
```
......@@ -160,26 +194,43 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
1: 其他
```
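吸烟识别属于"先检测行人、再在行人子图上检测特定目标(烟头)"的两级检测方案,下面给出判断逻辑的简化示意(非官方实现;`detect_cigarette`为假设的二级检测接口,保持帧数为示例参数):

```python
SMOKING_HOLD_FRAMES = 20  # 假设:检测到烟头后,在随后若干帧内仍认为该行人处于吸烟状态

smoking_countdown = {}  # track_id -> 剩余保持帧数


def detect_cigarette(person_img, score_thresh=0.5):
    """假设的二级检测接口:在行人子图上检测烟头,返回置信度高于阈值的检测框列表,这里仅返回占位结果。"""
    return []  # 占位:实际应调用PP-YOLOE烟头检测模型推理


def update_smoking_state(track_id, person_img):
    """每帧调用:检测到烟头则刷新保持计数,未检测到则递减,返回该行人当前是否判定为吸烟。"""
    if detect_cigarette(person_img):
        smoking_countdown[track_id] = SMOKING_HOLD_FRAMES
    elif smoking_countdown.get(track_id, 0) > 0:
        smoking_countdown[track_id] -= 1
    return smoking_countdown.get(track_id, 0) > 0
```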
## 基于检测的行为识别——闯入识别
## 基于行人轨迹的行为识别
应用行为:闯入识别
<div align="center">
<img src="https://user-images.githubusercontent.com/22989727/178769370-03ab1965-cfd1-401b-9902-82620a06e43c.gif" width='1000'/>
</div>
具体使用请参照[PP-Human检测跟踪模块](mot.md)中的`5. 区域闯入判断和计数`。
### 方案说明
1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)
1. 使用多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)
2. 通过行人检测框的下边界中点在相邻帧位于用户所选区域的内外位置,来识别是否闯入所选区域。
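闯入判断的核心是"行人检测框下边界中点是否位于用户自定义多边形区域内"。下面给出一个用射线法判断点是否在多边形内、并据此判定闯入的简化示意(非官方实现;多边形点按`--region_polygon`的顺时针顺序给出):

```python
def point_in_polygon(point, polygon):
    """射线法判断点是否在多边形内。point为(x, y),polygon为[(x1, y1), (x2, y2), ...]。"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # 判断从该点出发的水平射线是否与这条边相交
        if (y1 > y) != (y2 > y):
            cross_x = (x2 - x1) * (y - y1) / (y2 - y1) + x1
            if x < cross_x:
                inside = not inside
    return inside


def is_break_in(prev_bbox, cur_bbox, polygon):
    """行人框下边界中点从区域外移动到区域内,即判定为闯入。bbox为(x1, y1, x2, y2)。"""
    def bottom_center(box):
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, y2)

    return (not point_in_polygon(bottom_center(prev_bbox), polygon)
            and point_in_polygon(bottom_center(cur_bbox), polygon))
```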
## 基于视频分类的行为识别——打架识别
## 基于视频分类的行为识别
随着监控摄像头部署覆盖范围越来越广,人工查看是否存在打架等异常行为耗时费力、效率低,AI+安防助力智慧安防。PP-Human中集成了打架识别模块,识别视频中是否存在打架行为。我们提供了预训练模型,用户可直接下载使用。
应用行为:打架识别
<div align="center">
<img src="../images/fight_demo.gif" width='1000'/>
<center>数据来源及版权归属:Surveillance Camera Fight Dataset。</center>
</div>
该方案关注的场景为监控摄像头下的打架行为识别。打架行为涉及多人,基于骨骼点技术的方案更适用于单人的行为识别。此外,打架行为对时序信息依赖较强,基于检测和分类的方案也不太适用。由于监控场景背景复杂,人的密集程度、光线、拍摄角度等都会对识别造成影响,本方案采用基于视频分类的方式判断视频中是否存在打架行为。针对摄像头距离人较远的情况,通过增大输入图像分辨率优化。由于训练数据有限,采用数据增强的方式提升模型的泛化性能。
### 模型库
在这里,我们提供了打架识别的预训练模型,用户可以直接下载使用。
| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 |
| ---- | ---- | ---------- | ---- | ---- | ---------- |
| 打架识别 | PP-TSM | 准确率:89.06% | T4, 2s视频128ms | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) |
|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: |
| 打架识别 | PP-TSM | 准确率:89.06% | 2s视频 128ms | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) |
打架识别模型基于6个公开数据集训练得到:Surveillance Camera Fight Dataset、A Dataset for Automatic Violence Detection in Videos、Hockey Fight Detection Dataset、Video Fight Detection Dataset、Real Life Violence Situations Dataset、UBI Abnormal Event Detection Dataset。
注:
1. 打架识别模型基于6个公开数据集训练得到:Surveillance Camera Fight Dataset、A Dataset for Automatic Violence Detection in Videos、Hockey Fight Detection Dataset、Video Fight Detection Dataset、Real Life Violence Situations Dataset、UBI Abnormal Event Detection Dataset。
2. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。
本项目关注的场景为监控摄像头下的打架行为识别。打架行为涉及多人,基于骨骼点技术的方案更适用于单人的行为识别。此外,打架行为对时序信息依赖较强,基于检测和分类的方案也不太适用。由于监控场景背景复杂,人的密集程度、光线、拍摄角度等都会对识别造成影响,本方案采用基于视频分类的方式判断视频中是否存在打架行为。针对摄像头距离人较远的情况,通过增大输入图像分辨率优化。由于训练数据有限,采用数据增强的方式提升模型的泛化性能。
### 配置说明
[配置文件](../../config/infer_cfg_pphuman.yml)中与行为识别相关的参数如下:
......@@ -206,29 +257,11 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
```
4. 启动命令中的完整参数说明,请参考[参数说明](./QUICK_STARTED.md)
测试效果如下:
<div width="1000" align="center">
<img src="../images/fight_demo.gif"/>
</div>
数据来源及版权归属:Surveillance Camera Fight Dataset。
### 方案说明
目前打架识别模型使用的是[PP-TSM](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/pp-tsm.md),并在PP-TSM视频分类模型训练流程的基础上修改适配,完成模型训练。对于输入的视频或者视频流,进行等间隔抽帧,当视频帧累计到指定数目时,输入到视频分类模型中判断是否存在打架行为。
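下面给出"等间隔抽帧、累计到指定帧数后送入视频分类模型"这一流程的简化示意(非官方实现;抽帧间隔与帧数仅为示例参数,`classify_clip`为假设的PP-TSM推理接口):

```python
import cv2

FRAME_INTERVAL = 2   # 抽帧间隔,示例值
CLIP_LEN = 8         # 累计到该帧数后触发一次视频分类,示例值


def classify_clip(frames):
    """假设的视频分类接口:输入帧列表,返回是否存在打架行为,这里仅返回占位结果。"""
    return False  # 占位:实际应调用PP-TSM模型推理


def detect_fight(video_path):
    """对视频做等间隔抽帧,每累计CLIP_LEN帧触发一次打架识别,返回各片段的识别结果列表。"""
    cap = cv2.VideoCapture(video_path)
    buffer, frame_id, results = [], 0, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_id % FRAME_INTERVAL == 0:
            buffer.append(frame)
        if len(buffer) == CLIP_LEN:
            results.append(classify_clip(buffer))
            buffer = []
        frame_id += 1
    cap.release()
    return results
```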
## 自定义模型训练
我们已经提供了检测/跟踪、关键点识别,以及摔倒、吸烟、打电话、打架等行为识别的预训练模型,可直接下载使用。如果希望使用自定义场景数据训练,或是对模型进行优化,根据具体模型,分别参考下面的链接:
| 任务 | 算法 | 模型开发文档 |
| ---- | ---- | -------- |
| 行人检测/跟踪 | PP-YOLOE | [使用教程](../../../../configs/ppyoloe/README_cn.md#使用教程) |
| 关键点识别 | HRNet | [使用教程](../../../../configs/keypoint#3训练与测试) |
| 行为识别(摔倒)| ST-GCN | [使用教程](../../../../docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md) |
| 行为识别(吸烟)| PP-YOLOE | [使用教程](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_det.md) |
| 行为识别(打电话)| PP-HGNet | [使用教程](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md) |
| 行为识别 (打架)| PP-TSM | [使用教程](../../../../docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md) |
## 参考文献
```
......@@ -238,4 +271,4 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
booktitle = {AAAI},
year = {2018},
}
```
```
......@@ -6,14 +6,17 @@
| 任务 | 算法 | 精度 | 预测速度(ms) |下载链接 |
|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: |
| 行人属性高精度模型 | PP-HGNet_small | mA: 95.4 | 单人 1.54ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.tar) |
| 行人属性轻量级模型 | PP-LCNet_x1_0 | mA: 94.5 | 单人 0.54ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.tar) |
| 行人属性精度与速度均衡模型 | PP-HGNet_tiny | mA: 95.2 | 单人 1.14ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_person_attribute_952_infer.tar) |
1. 行人属性分析精度为[PA100k](https://github.com/xh-liu/HydraPlus-Net#pa-100k-dataset)[RAPv2](http://www.rapdataset.com/rapv2.html)[PETA](http://mmlab.ie.cuhk.edu.hk/projects/PETA.html)和部分业务数据融合训练测试得到
2. 预测速度为V100 机器上使用TensorRT FP16时的速度, 该处测速速度为模型预测速度
3. 属性模型应用依赖跟踪模型结果,请在[跟踪模型页面](./mot.md)下载跟踪模型,依自身需求选择高精或轻量级下载。
4. 模型下载后解压放置在PaddleDetection/output_inference/目录下。
| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3 <br> MOTA: 72.0 | 检测: 16.2ms <br> 跟踪:22.3ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 行人属性高精度模型 | PP-HGNet_small | mA: 95.4 | 单人 1.54ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) |
| 行人属性轻量级模型 | PP-LCNet_x1_0 | mA: 94.5 | 单人 0.54ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) |
| 行人属性精度与速度均衡模型 | PP-HGNet_tiny | mA: 95.2 | 单人 1.14ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_person_attribute_952_infer.zip) |
1. 检测/跟踪模型精度为[MOT17](https://motchallenge.net/)[CrowdHuman](http://www.crowdhuman.org/)[HIEVE](http://humaninevents.org/)和部分业务数据融合训练测试得到。
2. 行人属性分析精度为[PA100k](https://github.com/xh-liu/HydraPlus-Net#pa-100k-dataset)[RAPv2](http://www.rapdataset.com/rapv2.html)[PETA](http://mmlab.ie.cuhk.edu.hk/projects/PETA.html)和部分业务数据融合训练测试得到
3. 预测速度为V100 机器上使用TensorRT FP16时的速度, 该处测速速度为模型预测速度
4. 属性模型应用依赖跟踪模型结果,请在[跟踪模型页面](./mot.md)下载跟踪模型,依自身需求选择高精或轻量级下载。
5. 模型下载后解压放置在PaddleDetection/output_inference/目录下。
## 使用方法
......
......@@ -6,8 +6,8 @@
| 任务 | 算法 | 精度 | 预测速度(ms) |下载链接 |
|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: |
| 行人检测/跟踪 | PP-YOLOE-l | mAP: 56.6 <br> MOTA: 79.5 | 检测: 28.0ms <br> 跟踪:33.1ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 行人检测/跟踪 | PP-YOLOE-s | mAP: 53.2 <br> MOTA: 69.1 | 检测: 22.1ms <br> 跟踪:27.2ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) |
| 行人检测/跟踪 | PP-YOLOE-l | mAP: 56.6 <br> MOTA: 79.5 | 检测: 25.1ms <br> 跟踪:31.8ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
| 行人检测/跟踪 | PP-YOLOE-s | mAP: 53.2 <br> MOTA: 69.1 | 检测: 16.2ms <br> 跟踪:21.0ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) |
1. 检测/跟踪模型精度为[COCO-Person](http://cocodataset.org/), [CrowdHuman](http://www.crowdhuman.org/), [HIEVE](http://humaninevents.org/) 和部分业务数据融合训练测试得到,验证集为业务数据
2. 预测速度为T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程
......@@ -71,12 +71,16 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
- `--region_type`表示流量计数的区域,当设置`--do_break_in_counting`时仅可选择`custom`,默认是`custom`,表示以用户自定义区域为出入口,同一物体框的下边界中点坐标在相邻两秒内从区域外到区域内,即完成计数加一。
- `--region_polygon`表示用户自定义区域的多边形的点坐标序列,每两个为一对点坐标(x,y坐标),按顺时针顺序连成一个封闭区域,至少需要3对点也即6个整数,默认值是`[]`,需要用户自行设置点坐标。用户可以运行[此段代码](../../tools/get_video_info.py)获取所测视频的分辨率帧数,以及可以自定义画出自己想要的多边形区域的可视化并自己调整。运行方式如下:
自定义多边形区域的可视化代码运行如下:
<details>
```python
python3.7 get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400
```
</details>
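`--region_polygon`接收的是扁平的整数坐标序列,下面是将其还原为点对并做基本校验的简化示意(与上文"每两个为一对点坐标、至少3对点"的约定一致,仅供参考):

```python
def parse_region_polygon(values):
    """将扁平坐标序列[x1, y1, x2, y2, ...]转换为点对列表,并检查点数是否满足要求。"""
    if len(values) % 2 != 0:
        raise ValueError("坐标个数必须为偶数,每两个数构成一个点")
    points = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    if len(points) < 3:
        raise ValueError("至少需要3对点才能构成封闭区域")
    return points


if __name__ == "__main__":
    # 与上文示例命令中的区域参数一致
    print(parse_region_polygon([200, 200, 400, 200, 300, 400, 100, 400]))
```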
测试效果如下:
<div align="center">
<img src="https://user-images.githubusercontent.com/22989727/178769370-03ab1965-cfd1-401b-9902-82620a06e43c.gif" width='1000'/>
</div>
## 方案说明
......
......@@ -7,7 +7,7 @@ PP-Human跨镜头跟踪模块主要目的在于提供一套简洁、高效的跨
## 使用方法
1. 下载模型 [REID模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) 并解压到```./output_inference```路径下,修改配置文件中模型路径。也可简单起见直接用默认配置,自动下载模型。 MOT模型请参考[mot说明](./mot.md)文件下载。
1. 下载模型 [行人跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)[REID模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) 并解压到```./output_inference```路径下,修改配置文件中模型路径。也可简单起见直接用默认配置,自动下载模型。 MOT模型请参考[mot说明](./mot.md)文件下载。
2. 跨镜头跟踪模式下,要求输入的多个视频放在同一目录下,同时开启infer_cfg_pphuman.yml 中的REID选择中的enable=True, 命令如下:
```python
......
# 行为识别任务二次开发
在产业落地过程中应用行为识别算法,不可避免地会出现希望自定义类型的行为识别的需求,或是对已有行为识别模型的优化,以提升在特定场景下模型的效果。鉴于行为的多样性,PP-Human支持抽烟、打电话、摔倒、打架、人员闯入五种异常行为识别,并根据行为的不同,集成了基于视频分类、基于检测、基于图像分类以及基于骨骼点的四种行为识别技术方案,可覆盖90%+动作类型的识别,满足各类开发需求。我们在本文档通过案例来介绍如何根据期望识别的行为来进行行为识别方案的选择,以及使用PaddleDetection进行行为识别算法二次开发工作,包括:方案选择、数据准备、模型优化思路和新增行为的开发流程。
在产业落地过程中应用行为识别算法,不可避免地会出现希望自定义类型的行为识别的需求,或是对已有行为识别模型的优化,以提升在特定场景下模型的效果。鉴于行为的多样性,PP-Human支持抽烟、打电话、摔倒、打架、人员闯入五种异常行为识别,并根据行为的不同,集成了基于视频分类、基于检测、基于图像分类、基于跟踪以及基于骨骼点的五种行为识别技术方案,可覆盖90%+动作类型的识别,满足各类开发需求。我们在本文档通过案例来介绍如何根据期望识别的行为来进行行为识别方案的选择,以及使用PaddleDetection进行行为识别算法二次开发工作,包括:方案选择、数据准备、模型优化思路和新增行为的开发流程。
## 方案选择
在PaddleDetection的PP-Human中,我们为行为识别提供了多种方案:基于视频分类、基于图像分类、基于检测、以及基于骨骼点的行为识别方案,以期望满足不同场景、不同目标行为的需求。对于二次开发,首先我们需要确定要采用何种方案来实现行为识别的需求,其核心是要通过对场景和具体行为的分析、并考虑数据采集成本等因素,综合选择一个合适的识别方案。我们在这里简要列举了当前PaddleDetection中所支持的方案的优劣势和适用场景,供大家参考。
| 技术方案 | 方案说明 | 方案优势 | 方案劣势 | 适用场景 |
| :--: | :--: | :--: | :--: | :--: |
| 基于人体骨骼点的行为识别 | 1. 通过目标检测技术识别出图像中的人;<br> 2. 针对每个人,基于关键点检测技术识别出关键点;<br>3. 基于关键点序列变化识别出具体行为。 | 1. 可识别出每个人的行为;<br>2. 聚焦动作本身,鲁棒性和泛化性好; | 1. 对关键点检测依赖较强,人员较密集或存在遮挡等情况效果不佳;<br>2. 无法准确识别多人交互动作;<br>3. 难以处理需要外观及场景信息的动作;<br>4. 数据收集和标注困难; | 适用于根据人体结构关键点能够区分的行为,背景简单,人数不多场景,如健身场景。 |
| 基于人体id的分类 | 1. 通过目标检测技术得到图像中的人;<br>2. 针对每个人通过图像分类技术得到具体的行为类别。 | 1.通过检测技术可以为分类剔除无关背景的干扰,提升最终识别精度;<br>2. 方案简单,易于训练;<br>3. 数据采集容易;<br>4. 可结合跳帧及结果复用逻辑,速度快; | 1. 缺少时序信息;<br>2. 精度不高; | 对时序信息要求不强的动作,且动作既可通过人也可通过人+物的方式判断,如打电话。 |
| 基于人体id的检测 | 1. 通过目标检测技术得到画面中的人;<br>2. 根据检测结果将人物从原图中抠出,再在扣得的图像中再次用目标检测技术检测与行为强相关的目标。 | 1. 方案简单,易于训练;<br> 2. 可解释性强;<br> 3. 数据采集容易;<br> 4. 可结合跳帧及结果复用逻辑,速度快; | 1. 缺少时序信息;<br>2. 分辨率较低情况下效果不佳;<br> 3. 密集场景容易发生动作误匹配 | 行为与某特定目标强相关的场景,且目标较小,需要两级检测才能准确定位,如吸烟。 |
| 基于视频分类的行为识别 | 应用视频分类技术对整个视频场景进行分类。 | 1.充分利用背景上下文和时序信息;<br>2. 可利用语音、字幕等多模态信息;<br>3. 不依赖检测及跟踪模型;<br>4. 可处理多人共同组成的动作; | 1. 无法定位到具体某个人的行为;<br>2. 场景泛化能力较弱;<br>3.真实数据采集困难; | 无需具体到人的场景的判定,即判断是否存在某种特定行为,多人或对背景依赖较强的动作,如监控画面中打架识别等场景。 |
<img width="1091" alt="image" src="https://user-images.githubusercontent.com/22989727/178742352-d0c61784-3e93-4406-b2a2-9067f42cb343.png">
下面以PaddleDetection目前已经支持的几个具体动作为例,介绍每个动作方案的选型依据:
### 吸烟
方案选择:基于人体id的检测
方案选择:基于人体id检测的行为识别
原因:吸烟动作中具有香烟这个明显特征目标,因此我们可以认为当在某个人物的对应图像中检测到香烟时,该人物即在吸烟动作中。相比于基于视频或基于骨骼点的识别方案,训练检测模型需要采集的是图片级别而非视频级别的数据,可以明显减轻数据收集与标注的难度。此外,目标检测任务具有丰富的预训练模型资源,整体模型的效果会更有保障。
### 打电话
方案选择:基于人体id的分类
方案选择:基于人体id分类的行为识别
原因:打电话动作中虽然有手机这个特征目标,但为了区分看手机等动作,以及考虑到在安防场景下打电话动作中会出现较多对手机的遮挡(如手对手机的遮挡、人头对手机的遮挡等等),不利于检测模型正确检测到目标。同时打电话通常持续的时间较长,且人物本身的动作不会发生太大变化,因此可以采用帧级别图像分类的策略。
此外,打电话这个动作主要可以通过上半身判别,可以采用半身图片,去除冗余信息以降低模型训练的难度。
......@@ -36,6 +30,12 @@
原因:摔倒是一个明显的时序行为的动作,可由一个人物本身进行区分,具有场景无关的特性。由于PP-Human的场景定位偏向安防监控场景,背景变化较为复杂,且部署上需要考虑到实时性,因此采用了基于骨骼点的行为识别方案,以获得更好的泛化性及运行速度。
### 闯入
方案选择:基于人体id跟踪的行为识别
原因:闯入识别只需判断行人的路径或所在位置是否在某区域内,与人体自身动作无关,因此只需根据人体跟踪结果分析是否存在闯入行为。
### 打架
方案选择:基于视频分类的行为识别
......@@ -45,7 +45,8 @@
下面详细展开五大类方案的数据准备、模型优化和新增行为识别方法
1. [基于人体id的检测](./idbased_det.md)
2. [基于人体id的分类](./idbased_clas.md)
1. [基于人体id检测的行为识别](./idbased_det.md)
2. [基于人体id分类的行为识别](./idbased_clas.md)
3. [基于人体骨骼点的行为识别](./skeletonbased_rec.md)
4. [基于视频分类的行为识别](./videobased_rec.md)
4. [基于人体id跟踪的行为识别](../mot.md)
5. [基于视频分类的行为识别](./videobased_rec.md)
......@@ -156,4 +156,4 @@ So these are the all steps which you need to follow in order to run FairMOT on y
A companion article which explains in details of this procedure will be released soon and a link to that article will be updated here soon.
To see the full code, please take a look at [Paddle OpenVINO Prediction](docs/advanced_tutorials/openvino_inference/fairmot_onnx_openvino.py).
\ No newline at end of file
To see the full code, please take a look at [Paddle OpenVINO Prediction](./fairmot_onnx_openvino.py).
......@@ -93,7 +93,7 @@ def predict(exec_net, input):
return result
```
您可能会惊讶地看到, 最激动人心的步骤居然如此简单。 不过下一个阶段会加复杂。
您可能会惊讶地看到, 最激动人心的步骤居然如此简单。 不过下一个阶段会更加复杂。
4. ### 后处理
......@@ -154,4 +154,4 @@ online_im = plot_tracking_dict(
之后会有一篇详细解释此过程的配套文章将会发布,并且该文章的链接将很快在此处更新。
完整代码请查看 [Paddle OpenVINO 预测](docs/advanced_tutorials/openvino_inference/fairmot_onnx_openvino.py).
\ No newline at end of file
完整代码请查看 [Paddle OpenVINO 预测](./fairmot_onnx_openvino.py).
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
README_cn.md
\ No newline at end of file
简体中文 | [English](README_en.md)
文档:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
# 简介
PaddleDetection飞桨目标检测开发套件,旨在帮助开发者更快更好地完成检测模型的组建、训练、优化及部署等全开发流程。
PaddleDetection模块化地实现了多种主流目标检测算法,提供了丰富的数据增强策略、网络模块组件(如骨干网络)、损失函数等,并集成了模型压缩和跨平台高性能部署能力。
经过长时间产业实践打磨,PaddleDetection已拥有顺畅、卓越的使用体验,被工业质检、遥感图像检测、无人巡检、新零售、互联网、科研等十多个行业的开发者广泛应用。
<div align="center">
<img src="docs/images/football.gif" width='800'/>
</div>
### 产品动态
- 2021.02.07: 发布release/2.0-rc版本,PaddleDetection动态图试用版本,详情参考[PaddleDetection动态图](dygraph)
- 2020.11.20: 发布release/0.5版本,详情请参考[版本更新文档](docs/CHANGELOG.md)
- 2020.11.10: 添加实例分割模型[SOLOv2](configs/solov2),在Tesla V100上达到38.6 FPS, COCO-val数据集上mask ap达到38.8,预测速度提高24%,mAP提高2.4个百分点。
- 2020.10.30: PP-YOLO支持矩形图像输入,并新增PACT模型量化策略。
- 2020.09.30: 发布[移动端检测demo](deploy/android_demo),可直接扫码安装体验。
- 2020.09.21-27: 【目标检测7日打卡课】手把手教你从入门到进阶,深入了解目标检测算法的前世今生。立即加入课程QQ交流群(1136406895)一起学习吧 :)
- 2020.07.24: 发布**产业最实用**目标检测模型 [PP-YOLO](https://arxiv.org/abs/2007.12099) ,深入考虑产业应用对精度速度的双重面诉求,COCO数据集精度45.2%(最新45.9%),Tesla V100预测速度72.9 FPS,详细信息见[文档](configs/ppyolo/README_cn.md)
- 2020.06.11: 发布676类大规模服务器端实用目标检测模型,适用于绝大部分使用场景,可以直接用来预测,也可以用于微调其他任务。
### 特性
- **模型丰富**: 包含**目标检测****实例分割****人脸检测****100+个预训练模型**,涵盖多种**全球竞赛冠军**方案
- **使用简洁**:模块化设计,解耦各个网络组件,开发者轻松搭建、试用各种检测模型及优化策略,快速得到高性能、定制化的算法。
- **端到端打通**: 从数据增强、组网、训练、压缩、部署端到端打通,并完备支持**云端**/**边缘端**多架构、多设备部署。
- **高性能**: 基于飞桨的高性能内核,模型训练速度及显存占用优势明显。支持FP16训练, 支持多机训练。
#### 套件结构概览
<table>
<tbody>
<tr align="center" valign="bottom">
<td>
<b>Architectures</b>
</td>
<td>
<b>Backbones</b>
</td>
<td>
<b>Components</b>
</td>
<td>
<b>Data Augmentation</b>
</td>
</tr>
<tr valign="top">
<td>
<ul><li><b>Two-Stage Detection</b></li>
<ul>
<li>Faster RCNN</li>
<li>FPN</li>
<li>Cascade-RCNN</li>
<li>Libra RCNN</li>
<li>Hybrid Task RCNN</li>
<li>PSS-Det RCNN</li>
</ul>
</ul>
<ul><li><b>One-Stage Detection</b></li>
<ul>
<li>RetinaNet</li>
<li>YOLOv3</li>
<li>YOLOv4</li>
<li>PP-YOLO</li>
<li>SSD</li>
</ul>
</ul>
<ul><li><b>Anchor Free</b></li>
<ul>
<li>CornerNet-Squeeze</li>
<li>FCOS</li>
<li>TTFNet</li>
</ul>
</ul>
<ul>
<li><b>Instance Segmentation</b></li>
<ul>
<li>Mask RCNN</li>
<li>SOLOv2</li>
</ul>
</ul>
<ul>
          <li><b>Face Detection</b></li>
<ul>
<li>FaceBoxes</li>
<li>BlazeFace</li>
<li>BlazeFace-NAS</li>
</ul>
</ul>
</td>
<td>
<ul>
<li>ResNet(&vd)</li>
<li>ResNeXt(&vd)</li>
<li>SENet</li>
<li>Res2Net</li>
<li>HRNet</li>
<li>Hourglass</li>
<li>CBNet</li>
<li>GCNet</li>
<li>DarkNet</li>
<li>CSPDarkNet</li>
<li>VGG</li>
<li>MobileNetv1/v3</li>
<li>GhostNet</li>
<li>Efficientnet</li>
</ul>
</td>
<td>
<ul><li><b>Common</b></li>
<ul>
<li>Sync-BN</li>
<li>Group Norm</li>
<li>DCNv2</li>
<li>Non-local</li>
</ul>
</ul>
<ul><li><b>FPN</b></li>
<ul>
<li>BiFPN</li>
<li>BFP</li>
<li>HRFPN</li>
<li>ACFPN</li>
</ul>
</ul>
<ul><li><b>Loss</b></li>
<ul>
<li>Smooth-L1</li>
<li>GIoU/DIoU/CIoU</li>
<li>IoUAware</li>
</ul>
</ul>
<ul><li><b>Post-processing</b></li>
<ul>
<li>SoftNMS</li>
<li>MatrixNMS</li>
</ul>
</ul>
<ul><li><b>Speed</b></li>
<ul>
<li>FP16 training</li>
<li>Multi-machine training </li>
</ul>
</ul>
</td>
<td>
<ul>
<li>Resize</li>
<li>Flipping</li>
<li>Expand</li>
<li>Crop</li>
<li>Color Distort</li>
<li>Random Erasing</li>
<li>Mixup </li>
<li>Cutmix </li>
<li>Grid Mask</li>
<li>Auto Augment</li>
</ul>
</td>
</tr>
</tbody>
</table>
#### 模型性能概览
各模型结构和骨干网络的代表模型在COCO数据集上精度mAP和单卡Tesla V100上预测速度(FPS)对比图。
<div align="center">
<img src="docs/images/map_fps.png" />
</div>
**说明:**
- `CBResNet``Cascade-Faster-RCNN-CBResNet200vd-FPN`模型,COCO数据集mAP高达53.3%
- `Cascade-Faster-RCNN``Cascade-Faster-RCNN-ResNet50vd-DCN`,PaddleDetection将其优化到COCO数据mAP为47.8%时推理速度为20FPS
- PaddleDetection增强版`YOLOv3-ResNet50vd-DCN`在COCO数据集mAP高于原作10.6个绝对百分点,推理速度为61.3FPS,快于原作约70%
- 图中模型均可在[模型库](#模型库)中获取
## 文档教程
### 入门教程
- [安装说明](docs/tutorials/INSTALL_cn.md)
- [快速开始](docs/tutorials/QUICK_STARTED_cn.md)
- [如何准备数据](docs/tutorials/PrepareDataSet.md)
- [训练/评估/预测/部署流程](docs/tutorials/DetectionPipeline.md)
- [如何自定义数据集](docs/tutorials/Custom_DataSet.md)
- [常见问题汇总](docs/FAQ.md)
### 进阶教程
- 参数配置
- [配置模块设计和介绍](docs/advanced_tutorials/config_doc/CONFIG_cn.md)
- [RCNN参数说明](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md)
- [YOLOv3参数说明](docs/advanced_tutorials/config_doc/yolov3_mobilenet_v1.md)
- 迁移学习
- [如何加载预训练](docs/advanced_tutorials/TRANSFER_LEARNING_cn.md)
- 模型压缩(基于[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim))
- [压缩benchmark](slim)
- [量化](slim/quantization), [剪枝](slim/prune), [蒸馏](slim/distillation), [搜索](slim/nas)
- 推理部署
- [模型导出教程](docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
- [服务器端Python部署](deploy/python)
- [服务器端C++部署](deploy/cpp)
- [移动端部署](https://github.com/PaddlePaddle/Paddle-Lite-Demo)
- [在线Serving部署](deploy/serving)
- [推理Benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md)
- 进阶开发
- [新增数据预处理](docs/advanced_tutorials/READER.md)
- [新增检测算法](docs/advanced_tutorials/MODEL_TECHNICAL.md)
## 模型库
- 通用目标检测:
- [模型库和基线](docs/MODEL_ZOO_cn.md)
- [移动端模型](configs/mobile/README.md)
- [Anchor Free](configs/anchor_free/README.md)
- [PP-YOLO模型](configs/ppyolo/README_cn.md)
- [676类目标检测](docs/featured_model/LARGE_SCALE_DET_MODEL.md)
- [两阶段实用模型PSS-Det](configs/rcnn_enhance/README.md)
- 通用实例分割:
- [SOLOv2](configs/solov2/README.md)
- 垂类领域
- [人脸检测](docs/featured_model/FACE_DETECTION.md)
- [行人检测](docs/featured_model/CONTRIB_cn.md)
- [车辆检测](docs/featured_model/CONTRIB_cn.md)
- 比赛方案
- [Objects365 2019 Challenge夺冠模型](docs/featured_model/champion_model/CACascadeRCNN.md)
- [Open Images 2019-Object Detection比赛最佳单模型](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
## 应用案例
- [人像圣诞特效自动生成工具](application/christmas)
## 第三方教程推荐
- [PaddleDetection在Windows下的部署(一)](https://zhuanlan.zhihu.com/p/268657833)
- [PaddleDetection在Windows下的部署(二)](https://zhuanlan.zhihu.com/p/280206376)
- [Jetson Nano上部署PaddleDetection经验分享](https://zhuanlan.zhihu.com/p/319371293)
- [安全帽检测YOLOv3模型在树莓派上的部署](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/yolov3_for_raspi.md)
- [使用SSD-MobileNetv1完成一个项目--准备数据集到完成树莓派部署](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/ssd_mobilenet_v1_for_raspi.md)
## 版本更新
v2.0-rc版本已经在`02/2021`发布,新增动态图版本,支持RCNN, YOLOv3, PP-YOLO, SSD/SSDLite, FCOS, TTFNet, SOLOv2等系列模型,支持模型剪裁和量化,支持预测部署及TensorRT推理加速,详细内容请参考[版本更新文档](docs/CHANGELOG.md)
## 许可证书
本项目的发布受[Apache 2.0 license](LICENSE)许可认证。
## 贡献代码
我们非常欢迎你可以为PaddleDetection提供代码,也十分感谢你的反馈。
English | [简体中文](README_cn.md)
Documentation:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
# Introduction
PaddleDetection is an end-to-end object detection development kit based on PaddlePaddle, which aims to help developers build, train, optimize, and deploy detection models in a faster and better way.
PaddleDetection implements a variety of mainstream object detection algorithms in a modular design, provides a wealth of data augmentation strategies, network components (such as backbones), loss functions, etc., and integrates model compression and cross-platform high-performance deployment capabilities.
Refined through long-term industry practice, PaddleDetection offers a smooth and excellent user experience and has been widely used by developers in more than ten industries, such as industrial quality inspection, remote sensing image detection, automatic inspection, new retail, the Internet, and scientific research.
<div align="center">
<img src="docs/images/football.gif" width='800'/>
</div>
### Product Updates
- 2020.11.20: Released the `release/0.5` version. Please refer to the [change log](docs/CHANGELOG.md) for details.
- 2020.11.10: Added the instance segmentation model [SOLOv2](configs/solov2), which reaches 38.6 FPS on a single Tesla V100 and 38.8 mask AP on the COCO val dataset, improving inference speed by 24% and mAP by 2.4 percentage points.
- 2020.10.30: PP-YOLO now supports rectangular image input, and a new PACT quantization strategy was added to the slim module.
- 2020.09.30: Released the [mobile-side detection demo](deploy/android_demo), which can be installed and tried directly by scanning the QR code.
- 2020.09.21-27: [Seven-day object detection course] A hands-on course from beginner to advanced, with an in-depth look at the past and present of object detection algorithms. Join the course QQ group (1136406895) to study together :)
- 2020.07.24: Released [PP-YOLO](https://arxiv.org/abs/2007.12099), **the most practical** object detection model, which balances the industrial demands for accuracy and speed, reaching 45.2% mAP (45.9% in the latest version) on the COCO dataset with an inference speed of 72.9 FPS on a single Tesla V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details.
- 2020.06.11: Released a large-scale practical server-side object detection model covering 676 classes, which is applicable to most application scenarios and can be used directly for prediction or as a pretrained model for fine-tuning on other tasks.
### Features
- **Rich Models**
PaddleDetection provides a rich set of models, including **100+ pre-trained models** for **object detection**, **instance segmentation**, **face detection**, etc., covering a variety of **global competition champion** schemes.
- **Easy to Use**
The modular design decouples network components, so developers can easily build and try various detection models and optimization strategies to quickly obtain high-performance, customized algorithms.
- **End-to-End Pipeline**
The workflow is connected end to end, from data augmentation, model construction and training to compression and deployment, with full support for multi-architecture, multi-device deployment on **cloud and edge devices**.
- **High Performance**
Built on the high-performance core of PaddlePaddle, PaddleDetection has clear advantages in training speed and memory usage, and supports FP16 and multi-machine training.
#### Overview of Kit Structures
<table>
<tbody>
<tr align="center" valign="bottom">
<td>
<b>Architectures</b>
</td>
<td>
<b>Backbones</b>
</td>
<td>
<b>Components</b>
</td>
<td>
<b>Data Augmentation</b>
</td>
</tr>
<tr valign="top">
<td>
<ul><li><b>Two-Stage Detection</b></li>
<ul>
<li>Faster RCNN</li>
<li>FPN</li>
<li>Cascade-RCNN</li>
<li>Libra RCNN</li>
<li>Hybrid Task RCNN</li>
<li>PSS-Det RCNN</li>
</ul>
</ul>
<ul><li><b>One-Stage Detection</b></li>
<ul>
<li>RetinaNet</li>
<li>YOLOv3</li>
<li>YOLOv4</li>
<li>PP-YOLO</li>
<li>SSD</li>
</ul>
</ul>
<ul><li><b>Anchor Free</b></li>
<ul>
<li>CornerNet-Squeeze</li>
<li>FCOS</li>
<li>TTFNet</li>
</ul>
</ul>
<ul>
<li><b>Instance Segmentation</b></li>
<ul>
<li>Mask RCNN</li>
<li>SOLOv2</li>
</ul>
</ul>
<ul>
          <li><b>Face Detection</b></li>
<ul>
<li>FaceBoxes</li>
<li>BlazeFace</li>
<li>BlazeFace-NAS</li>
</ul>
</ul>
</td>
<td>
<ul>
<li>ResNet(&vd)</li>
<li>ResNeXt(&vd)</li>
<li>SENet</li>
<li>Res2Net</li>
<li>HRNet</li>
<li>Hourglass</li>
<li>CBNet</li>
<li>GCNet</li>
<li>DarkNet</li>
<li>CSPDarkNet</li>
<li>VGG</li>
<li>MobileNetv1/v3</li>
<li>GhostNet</li>
<li>Efficientnet</li>
</ul>
</td>
<td>
<ul><li><b>Common</b></li>
<ul>
<li>Sync-BN</li>
<li>Group Norm</li>
<li>DCNv2</li>
<li>Non-local</li>
</ul>
</ul>
<ul><li><b>FPN</b></li>
<ul>
<li>BiFPN</li>
<li>BFP</li>
<li>HRFPN</li>
<li>ACFPN</li>
</ul>
</ul>
<ul><li><b>Loss</b></li>
<ul>
<li>Smooth-L1</li>
<li>GIoU/DIoU/CIoU</li>
<li>IoUAware</li>
</ul>
</ul>
<ul><li><b>Post-processing</b></li>
<ul>
<li>SoftNMS</li>
<li>MatrixNMS</li>
</ul>
</ul>
<ul><li><b>Speed</b></li>
<ul>
<li>FP16 training</li>
<li>Multi-machine training </li>
</ul>
</ul>
</td>
<td>
<ul>
<li>Resize</li>
<li>Flipping</li>
<li>Expand</li>
<li>Crop</li>
<li>Color Distort</li>
<li>Random Erasing</li>
<li>Mixup </li>
<li>Cutmix </li>
<li>Grid Mask</li>
<li>Auto Augment</li>
</ul>
</td>
</tr>
</tbody>
</table>
#### Overview of Model Performance
COCO mAP versus FPS on a Tesla V100 for representative models of each architecture and backbone.
<div align="center">
<img src="docs/images/map_fps.png" />
</div>
**NOTE:**
- `CBResNet` stands for `Cascade-Faster-RCNN-CBResNet200vd-FPN`, which reaches the highest COCO mAP in the chart, 53.3%
- `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which PaddleDetection has optimized to a 20 FPS inference speed at 47.8% COCO mAP
- The enhanced PaddleDetection model `YOLOv3-ResNet50vd-DCN` is 10.6 absolute percentage points higher in COCO mAP than the original paper, with an inference speed of 61.3 FPS, nearly 70% faster than the Darknet framework
- All of these models are available in the [Model Zoo](#ModelZoo)
## Tutorials
### Get Started
- [Installation guide](docs/tutorials/INSTALL_cn.md)
- [Quick start on small dataset](docs/tutorials/QUICK_STARTED_cn.md)
- [Prepare dataset](docs/tutorials/PrepareDataSet.md)
- [Train/Evaluation/Inference/Deploy](docs/tutorials/DetectionPipeline.md)
- [How to train a custom dataset](docs/tutorials/Custom_DataSet.md)
- [FAQ](docs/FAQ.md)
### Advanced Tutorials
- Parameter configuration
- [Introduction to the configuration workflow](docs/advanced_tutorials/config_doc/CONFIG_cn.md)
- [Parameter configuration for RCNN model](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md)
- [Parameter configuration for YOLOv3 model](docs/advanced_tutorials/config_doc/yolov3_mobilenet_v1.md)
- Transfer learning
- [How to load pretrained model](docs/advanced_tutorials/TRANSFER_LEARNING_cn.md)
- Model Compression(Based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim))
- [Model compression benchmark](slim)
- [Quantization](slim/quantization)
- [Model pruning](slim/prune)
- [Model distillation](slim/distillation)
- [Neural Architecture Search](slim/nas)
- Inference and deployment
- [Export model for inference](docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
- [Python inference](deploy/python)
- [C++ inference](deploy/cpp)
- [Mobile](https://github.com/PaddlePaddle/Paddle-Lite-Demo)
- [Serving](deploy/serving)
- [Inference benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md)
- Advanced development
- [New data augmentations](docs/advanced_tutorials/READER.md)
- [New detection algorithms](docs/advanced_tutorials/MODEL_TECHNICAL.md)
## Model Zoo
- Universal object detection
- [Model library and baselines](docs/MODEL_ZOO_cn.md)
- [Mobile models](configs/mobile/README.md)
- [Anchor free models](configs/anchor_free/README.md)
- [PP-YOLO](configs/ppyolo/README_cn.md)
- [676 classes of object detection](docs/featured_model/LARGE_SCALE_DET_MODEL.md)
- [Two-stage practical PSS-Det](configs/rcnn_enhance/README.md)
- Universal instance segmentation
- [SOLOv2](configs/solov2/README.md)
- Vertical field
- [Face detection](docs/featured_model/FACE_DETECTION.md)
- [Pedestrian detection](docs/featured_model/CONTRIB_cn.md)
- [Vehicle detection](docs/featured_model/CONTRIB_cn.md)
- Competition Plan
- [Objects365 2019 Challenge champion model](docs/featured_model/champion_model/CACascadeRCNN.md)
- [Best single model of Open Images 2019-Object Detection](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
## Applications
- [Christmas portrait automatic generation tool](application/christmas)
## Updates
v2.0-rc was released in `02/2021`. It adds a dygraph version that supports RCNN, YOLOv3, PP-YOLO, SSD/SSDLite, FCOS, TTFNet, SOLOv2, and other models, supports model pruning and quantization, and supports deployment with TensorRT acceleration. Please refer to the [change log](docs/CHANGELOG.md) for details.
## License
PaddleDetection is released under the [Apache 2.0 license](LICENSE).
## Contributing
Contributions are highly welcome, and we would really appreciate your feedback!
# 人像圣诞特效自动生成工具
通过SOLOv2实例分割模型分割人像,并通过BlazeFace关键点模型检测人脸关键点,然后根据两个模型输出结果更换圣诞风格背景并为人脸加上圣诞老人胡子、圣诞眼镜及圣诞帽等特效。本项目通过PaddleHub可直接发布Server服务,供本地调试与前端直接调用接口。您可通过以下二维码中微信小程序直接体验:
<div align="center">
<img src="demo_images/wechat_app.jpeg" width='400'/>
</div>
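下面先给出一个极简的调用示意,帮助理解两个模型的配合方式(仅为假设性示例,需先完成下文的环境搭建与模型安装;模块的实际加载方式与结果融合逻辑请以本目录下的 `test_main.py` 及 `solov2_blazeface` 模块源码为准):先用 `solov2` 模块得到人像分割结果,再用 `blazeface` 模块得到人脸检测框与五点关键点,后续的背景替换与特效贴图即基于这两组输出完成。

```python
# 假设性示例:分别调用 solov2 与 blazeface 两个 PaddleHub 模块(需先 hub install)
import cv2
import paddlehub as hub

img = cv2.imread("demo_images/test.jpg")  # BGR 图像

solov2 = hub.Module(name="solov2")        # 人像实例分割
blazeface = hub.Module(name="blazeface")  # 人脸关键点检测

seg_out = solov2.predict(image=img, threshold=0.5)                      # 返回 segm/label/score
face_out = blazeface.predict(image=img, threshold=0.5, with_lmk=True)   # 返回 boxes/landmark

# solov2_blazeface 串联模块会基于人像掩码与人脸关键点完成背景替换和特效贴图
print(seg_out.keys(), face_out.keys())
```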
## 环境搭建
### 环境依赖
- paddlepaddle >= 2.0.0rc0
- paddlehub >= 2.0.0b1
### 模型准备
- 首先要获取模型,可在[模型配置文件](../../configs)里配置`solov2``blazeface_keypoint`,训练模型,并[导出模型](../../docs/advanced_tutorials/deploy/EXPORT_MODEL.md)。也可直接下载我们准备好模型:
[blazeface_keypoint模型](https://paddlemodels.bj.bcebos.com/object_detection/application/blazeface_keypoint.tar)
[solov2模型](https://paddlemodels.bj.bcebos.com/object_detection/application/solov2_r101_vd_fpn_3x.tar)
**注意:** 下载的模型需要解压后使用。
- 然后将两个模型文件夹中的文件(`infer_cfg.yml`、`__model__`、`__params__`)分别拷贝至`blazeface/blazeface_keypoint/`和`solov2/solov2_r101_vd_fpn_3x/`文件夹内,可参考下方的示例脚本。
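下面是一个下载并放置模型文件的参考脚本(仅作示意,假设在当前目录执行;压缩包解压后的目录结构请以实际内容为准):

```python
# 参考脚本(示意):下载两个推理模型,并把推理文件拷贝到指定目录
import os
import shutil
import tarfile
import urllib.request

MODELS = {
    "https://paddlemodels.bj.bcebos.com/object_detection/application/blazeface_keypoint.tar":
    "blazeface/blazeface_keypoint",
    "https://paddlemodels.bj.bcebos.com/object_detection/application/solov2_r101_vd_fpn_3x.tar":
    "solov2/solov2_r101_vd_fpn_3x",
}

for url, target_dir in MODELS.items():
    tar_name = os.path.basename(url)
    if not os.path.exists(tar_name):
        urllib.request.urlretrieve(url, tar_name)  # 下载模型压缩包
    with tarfile.open(tar_name) as tar:
        tar.extractall("tmp_models")               # 解压到临时目录
    # 假设解压后的目录名与压缩包同名,若不同请按实际目录调整
    src_dir = os.path.join("tmp_models", os.path.splitext(tar_name)[0])
    os.makedirs(target_dir, exist_ok=True)
    for fname in ("infer_cfg.yml", "__model__", "__params__"):
        shutil.copy(os.path.join(src_dir, fname), target_dir)
```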
### hub安装blazeface和solov2模型
```shell
hub install solov2
hub install blazeface
```
### hub安装solov2_blazeface圣诞特效自动生成串联模型
```shell
hub install solov2_blazeface
```
## 开始测试
### 本地测试
```shell
python test_main.py
```
运行成功后,预测结果会保存到`chrismas_final.png`
### serving测试
- step1: 启动服务
```shell
export CUDA_VISIBLE_DEVICES=0
hub serving start -m solov2_blazeface -p 8880
```
- step2: 向服务端发送预测请求
```shell
python test_server.py
```
运行成功后,预测结果会保存到`chrismas_final.png`
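`test_server.py` 的请求流程大致如下(仅为假设性示意,实际的请求字段与返回格式请以仓库中的 `test_server.py` 为准):将图片做 base64 编码后,以 JSON 形式 POST 到 PaddleHub Serving 暴露的 `solov2_blazeface` 接口。

```python
# 假设性示例:向 hub serving 启动的 solov2_blazeface 服务发送预测请求
import base64
import json

import cv2
import requests

def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

img = cv2.imread("demo_images/test.jpg")
payload = {"images": [cv2_to_base64(img)]}  # 字段名为假设,以 test_server.py 为准
url = "http://127.0.0.1:8880/predict/solov2_blazeface"
resp = requests.post(url, headers={"Content-Type": "application/json"}, data=json.dumps(payload))
print(resp.status_code)
```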
## 效果展示
<div align="center">
<img src="demo_images/test.jpg" height="600px" ><img src="demo_images/result.png" height="600px" >
</div>
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import base64
import cv2
import numpy as np
from PIL import Image, ImageDraw
import paddle.fluid as fluid
def create_inputs(im, im_info):
"""generate input for different model type
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
inputs (dict): input of model
"""
inputs = {}
inputs['image'] = im
origin_shape = list(im_info['origin_shape'])
resize_shape = list(im_info['resize_shape'])
pad_shape = list(im_info['pad_shape']) if im_info[
'pad_shape'] is not None else list(im_info['resize_shape'])
scale_x, scale_y = im_info['scale']
scale = scale_x
im_info = np.array([resize_shape + [scale]]).astype('float32')
inputs['im_info'] = im_info
return inputs
def visualize_box_mask(im,
results,
labels=None,
mask_resolution=14,
threshold=0.5):
"""
Args:
im (str/np.ndarray): path of image/np.ndarray read by cv2
results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box,
                        matrix element:[class, score, x_min, y_min, x_max, y_max]
MaskRCNN's results include 'masks': np.ndarray:
shape:[N, class_num, mask_resolution, mask_resolution]
labels (list): labels:['class1', ..., 'classn']
mask_resolution (int): shape of a mask is:[mask_resolution, mask_resolution]
threshold (float): Threshold of score.
Returns:
im (PIL.Image.Image): visualized image
"""
if not labels:
labels = ['background', 'person']
if isinstance(im, str):
im = Image.open(im).convert('RGB')
else:
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
im = Image.fromarray(im)
if 'masks' in results and 'boxes' in results:
im = draw_mask(
im,
results['boxes'],
results['masks'],
labels,
resolution=mask_resolution)
if 'boxes' in results:
im = draw_box(im, results['boxes'], labels)
if 'segm' in results:
im = draw_segm(
im,
results['segm'],
results['label'],
results['score'],
labels,
threshold=threshold)
if 'landmark' in results:
im = draw_lmk(im, results['landmark'])
return im
def get_color_map_list(num_classes):
"""
Args:
num_classes (int): number of class
Returns:
color_map (list): RGB color list
"""
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
def expand_boxes(boxes, scale=0.0):
"""
Args:
boxes (np.ndarray): shape:[N,4], N:number of box,
                            matrix element:[x_min, y_min, x_max, y_max]
scale (float): scale of boxes
Returns:
boxes_exp (np.ndarray): expanded boxes
"""
w_half = (boxes[:, 2] - boxes[:, 0]) * .5
h_half = (boxes[:, 3] - boxes[:, 1]) * .5
x_c = (boxes[:, 2] + boxes[:, 0]) * .5
y_c = (boxes[:, 3] + boxes[:, 1]) * .5
w_half *= scale
h_half *= scale
boxes_exp = np.zeros(boxes.shape)
boxes_exp[:, 0] = x_c - w_half
boxes_exp[:, 2] = x_c + w_half
boxes_exp[:, 1] = y_c - h_half
boxes_exp[:, 3] = y_c + h_half
return boxes_exp
def draw_mask(im, np_boxes, np_masks, labels, resolution=14, threshold=0.5):
"""
Args:
im (PIL.Image.Image): PIL image
np_boxes (np.ndarray): shape:[N,6], N: number of box,
                               matrix element:[class, score, x_min, y_min, x_max, y_max]
np_masks (np.ndarray): shape:[N, class_num, resolution, resolution]
labels (list): labels:['class1', ..., 'classn']
resolution (int): shape of a mask is:[resolution, resolution]
threshold (float): threshold of mask
Returns:
im (PIL.Image.Image): visualized image
"""
color_list = get_color_map_list(len(labels))
scale = (resolution + 2.0) / resolution
im_w, im_h = im.size
w_ratio = 0.4
alpha = 0.7
im = np.array(im).astype('float32')
rects = np_boxes[:, 2:]
expand_rects = expand_boxes(rects, scale)
expand_rects = expand_rects.astype(np.int32)
clsid_scores = np_boxes[:, 0:2]
padded_mask = np.zeros((resolution + 2, resolution + 2), dtype=np.float32)
clsid2color = {}
for idx in range(len(np_boxes)):
clsid, score = clsid_scores[idx].tolist()
clsid = int(clsid)
xmin, ymin, xmax, ymax = expand_rects[idx].tolist()
w = xmax - xmin + 1
h = ymax - ymin + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
padded_mask[1:-1, 1:-1] = np_masks[idx, int(clsid), :, :]
resized_mask = cv2.resize(padded_mask, (w, h))
resized_mask = np.array(resized_mask > threshold, dtype=np.uint8)
x0 = min(max(xmin, 0), im_w)
x1 = min(max(xmax + 1, 0), im_w)
y0 = min(max(ymin, 0), im_h)
y1 = min(max(ymax + 1, 0), im_h)
im_mask = np.zeros((im_h, im_w), dtype=np.uint8)
im_mask[y0:y1, x0:x1] = resized_mask[(y0 - ymin):(y1 - ymin), (
x0 - xmin):(x1 - xmin)]
if clsid not in clsid2color:
clsid2color[clsid] = color_list[clsid]
color_mask = clsid2color[clsid]
for c in range(3):
color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
idx = np.nonzero(im_mask)
color_mask = np.array(color_mask)
im[idx[0], idx[1], :] *= 1.0 - alpha
im[idx[0], idx[1], :] += alpha * color_mask
return Image.fromarray(im.astype('uint8'))
def draw_box(im, np_boxes, labels):
"""
Args:
im (PIL.Image.Image): PIL image
np_boxes (np.ndarray): shape:[N,6], N: number of box,
                               matrix element:[class, score, x_min, y_min, x_max, y_max]
labels (list): labels:['class1', ..., 'classn']
Returns:
im (PIL.Image.Image): visualized image
"""
draw_thickness = min(im.size) // 320
draw = ImageDraw.Draw(im)
clsid2color = {}
color_list = get_color_map_list(len(labels))
for dt in np_boxes:
clsid, bbox, score = int(dt[0]), dt[2:], dt[1]
xmin, ymin, xmax, ymax = bbox
w = xmax - xmin
h = ymax - ymin
if clsid not in clsid2color:
clsid2color[clsid] = color_list[clsid]
color = tuple(clsid2color[clsid])
# draw bbox
draw.line(
[(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
(xmin, ymin)],
width=draw_thickness,
fill=color)
# draw label
text = "{} {:.4f}".format(labels[clsid], score)
tw, th = draw.textsize(text)
draw.rectangle(
[(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color)
draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
return im
def draw_segm(im,
np_segms,
np_label,
np_score,
labels,
threshold=0.5,
alpha=0.7):
"""
Draw segmentation on image
"""
mask_color_id = 0
w_ratio = .4
color_list = get_color_map_list(len(labels))
im = np.array(im).astype('float32')
clsid2color = {}
np_segms = np_segms.astype(np.uint8)
index = np.where(np_label == 0)[0]
index = np.where(np_score[index] > threshold)[0]
person_segms = np_segms[index]
person_mask = np.sum(person_segms, axis=0)
person_mask[person_mask > 1] = 1
person_mask = np.expand_dims(person_mask, axis=2)
person_mask = np.repeat(person_mask, 3, axis=2)
im = im * person_mask
return Image.fromarray(im.astype('uint8'))
def load_predictor(model_dir,
run_mode='fluid',
batch_size=1,
use_gpu=False,
min_subgraph_size=3):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
use_gpu (bool): whether use gpu
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need use_gpu == True.
"""
if not use_gpu and not run_mode == 'fluid':
raise ValueError(
"Predict by TensorRT mode: {}, expect use_gpu==True, but use_gpu == {}"
.format(run_mode, use_gpu))
if run_mode == 'trt_int8':
raise ValueError("TensorRT int8 mode is not supported now, "
"please use trt_fp32 or trt_fp16 instead.")
precision_map = {
'trt_int8': fluid.core.AnalysisConfig.Precision.Int8,
'trt_fp32': fluid.core.AnalysisConfig.Precision.Float32,
'trt_fp16': fluid.core.AnalysisConfig.Precision.Half
}
config = fluid.core.AnalysisConfig(
os.path.join(model_dir, '__model__'),
os.path.join(model_dir, '__params__'))
if use_gpu:
# initial GPU memory(M), device ID
config.enable_use_gpu(100, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
else:
config.disable_gpu()
if run_mode in precision_map.keys():
config.enable_tensorrt_engine(
workspace_size=1 << 10,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
use_static=False,
use_calib_mode=False)
# disable print log when predict
config.disable_glog_info()
# enable shared memory
config.enable_memory_optim()
# disable feed, fetch OP, needed by zero_copy_run
config.switch_use_feed_fetch_ops(False)
predictor = fluid.core.create_paddle_predictor(config)
return predictor
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
def lmk2out(bboxes, np_lmk, im_info, threshold=0.5, is_bbox_normalized=True):
image_w, image_h = im_info['origin_shape']
scale = im_info['scale']
face_index, landmark, prior_box = np_lmk[:]
xywh_res = []
    if bboxes is None or bboxes.shape == (1, 1):  # check None first to avoid AttributeError
return np.array([])
prior = np.reshape(prior_box, (-1, 4))
predict_lmk = np.reshape(landmark, (-1, 10))
k = 0
for i in range(bboxes.shape[0]):
score = bboxes[i][1]
if score < threshold:
continue
theindex = face_index[i][0]
me_prior = prior[theindex, :]
lmk_pred = predict_lmk[theindex, :]
prior_h = me_prior[2] - me_prior[0]
prior_w = me_prior[3] - me_prior[1]
prior_h_center = (me_prior[2] + me_prior[0]) / 2
prior_w_center = (me_prior[3] + me_prior[1]) / 2
lmk_decode = np.zeros((10))
        # decode offsets: prediction * variance (0.1) * prior size + prior center
        for j in [0, 2, 4, 6, 8]:
            lmk_decode[j] = lmk_pred[j] * 0.1 * prior_w + prior_h_center
        for j in [1, 3, 5, 7, 9]:
            lmk_decode[j] = lmk_pred[j] * 0.1 * prior_h + prior_w_center
if is_bbox_normalized:
lmk_decode = lmk_decode * np.array([
image_h, image_w, image_h, image_w, image_h, image_w, image_h,
image_w, image_h, image_w
])
xywh_res.append(lmk_decode)
return np.asarray(xywh_res)
def draw_lmk(image, lmk_results):
draw = ImageDraw.Draw(image)
for lmk_decode in lmk_results:
for j in range(5):
x1 = int(round(lmk_decode[2 * j]))
y1 = int(round(lmk_decode[2 * j + 1]))
draw.ellipse(
(x1 - 2, y1 - 2, x1 + 3, y1 + 3), fill='green', outline='green')
return image
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
from functools import reduce
import cv2
import numpy as np
from paddlehub.module.module import moduleinfo
import blazeface.data_feed as D
@moduleinfo(
name="blazeface",
type="CV/image_editing",
author="paddlepaddle",
author_email="",
summary="blazeface is a face key point detection model.",
version="1.0.0")
class Detector(object):
"""
Args:
config (object): config of model, defined by `Config(model_dir)`
model_dir (str): root path of __model__, __params__ and infer_cfg.yml
use_gpu (bool): whether use gpu
run_mode (str): mode of running(fluid/trt_fp32/trt_fp16)
threshold (float): threshold to reserve the result for output.
"""
def __init__(self,
min_subgraph_size=60,
use_gpu=False,
run_mode='fluid',
threshold=0.5):
model_dir = os.path.join(self.directory, 'blazeface_keypoint')
self.predictor = D.load_predictor(
model_dir,
run_mode=run_mode,
min_subgraph_size=min_subgraph_size,
use_gpu=use_gpu)
def face_img_process(self,
image,
mean=[104., 117., 123.],
std=[127.502231, 127.502231, 127.502231]):
image = np.array(image)
# HWC to CHW
if len(image.shape) == 3:
image = np.swapaxes(image, 1, 2)
image = np.swapaxes(image, 1, 0)
        # RGB to BGR
image = image[[2, 1, 0], :, :]
image = image.astype('float32')
image -= np.array(mean)[:, np.newaxis, np.newaxis].astype('float32')
image /= np.array(std)[:, np.newaxis, np.newaxis].astype('float32')
image = [image]
image = np.array(image)
return image
def transform(self, image, shrink):
im_info = {
'scale': [1., 1.],
'origin_shape': None,
'resize_shape': None,
'pad_shape': None,
}
if isinstance(image, str):
with open(image, 'rb') as f:
im_read = f.read()
image = np.frombuffer(im_read, dtype='uint8')
image = cv2.imdecode(image, 1) # BGR mode, but need RGB mode
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
im_info['origin_shape'] = image.shape[:2]
else:
im_info['origin_shape'] = image.shape[:2]
image_shape = [3, image.shape[0], image.shape[1]]
h, w = shrink, shrink
image = cv2.resize(image, (w, h))
im_info['resize_shape'] = image.shape[:2]
image = self.face_img_process(image)
inputs = D.create_inputs(image, im_info)
return inputs, im_info
def postprocess(self, boxes_list, lmks_list, im_info, threshold=0.5):
assert len(boxes_list) == len(lmks_list)
best_np_boxes, best_np_lmk = boxes_list[0], lmks_list[0]
for i in range(1, len(boxes_list)):
            # stop scanning once the current scale already gives a high-confidence detection
if boxes_list[i][0][1] > 0.9:
break
face_width = boxes_list[i][0][4] - boxes_list[i][0][2]
if boxes_list[i][0][1] - best_np_boxes[0][
1] > 0.01 and face_width > 0.2:
best_np_boxes, best_np_lmk = boxes_list[i], lmks_list[i]
# postprocess output of predictor
results = {}
results['landmark'] = D.lmk2out(best_np_boxes, best_np_lmk, im_info,
threshold)
w, h = im_info['origin_shape']
best_np_boxes[:, 2] *= h
best_np_boxes[:, 3] *= w
best_np_boxes[:, 4] *= h
best_np_boxes[:, 5] *= w
expect_boxes = (best_np_boxes[:, 1] > threshold) & (
best_np_boxes[:, 0] > -1)
best_np_boxes = best_np_boxes[expect_boxes, :]
for box in best_np_boxes:
print('class_id:{:d}, confidence:{:.4f},'
'left_top:[{:.2f},{:.2f}],'
' right_bottom:[{:.2f},{:.2f}]'.format(
int(box[0]), box[1], box[2], box[3], box[4], box[5]))
results['boxes'] = best_np_boxes
return results
def predict(self,
image,
threshold=0.5,
repeats=1,
visualization=False,
with_lmk=True,
save_dir='blaze_result'):
'''
Args:
image (str/np.ndarray): path of image/ np.ndarray read by cv2
threshold (float): threshold of predicted box' score
Returns:
results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box,
                            matrix element:[class, score, x_min, y_min, x_max, y_max]
'''
shrink = [960, 640, 480, 320, 180]
boxes_list = []
lmks_list = []
for sh in shrink:
inputs, im_info = self.transform(image, shrink=sh)
np_boxes, np_lmk = None, None
input_names = self.predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = self.predictor.get_input_tensor(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
t1 = time.time()
for i in range(repeats):
self.predictor.zero_copy_run()
output_names = self.predictor.get_output_names()
boxes_tensor = self.predictor.get_output_tensor(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
if with_lmk == True:
face_index = self.predictor.get_output_tensor(output_names[
1])
landmark = self.predictor.get_output_tensor(output_names[2])
prior_boxes = self.predictor.get_output_tensor(output_names[
3])
np_face_index = face_index.copy_to_cpu()
np_prior_boxes = prior_boxes.copy_to_cpu()
np_landmark = landmark.copy_to_cpu()
np_lmk = [np_face_index, np_landmark, np_prior_boxes]
t2 = time.time()
ms = (t2 - t1) * 1000.0 / repeats
print("Inference: {} ms per batch image".format(ms))
# do not perform postprocess in benchmark mode
results = []
if reduce(lambda x, y: x * y, np_boxes.shape) < 6:
            print('[WARNING] No object detected.')
results = {'boxes': np.array([])}
else:
boxes_list.append(np_boxes)
lmks_list.append(np_lmk)
results = self.postprocess(
boxes_list, lmks_list, im_info, threshold=threshold)
if visualization:
if not os.path.exists(save_dir):
os.makedirs(save_dir)
output = D.visualize_box_mask(
im=image, results=results, labels=["background", "face"])
name = str(time.time()) + '.png'
save_path = os.path.join(save_dir, name)
output.save(save_path)
img = cv2.cvtColor(np.array(output), cv2.COLOR_RGB2BGR)
results['image'] = img
return results
{"path":"/Users/yuzhiliang/Downloads/docsmall-2/12.png","outputs":{"object":[{"name":"local","bndbox":{"xmin":282,"ymin":366,"xmax":3451,"ymax":4603}}]},"time_labeled":1608631688933,"labeled":true,"size":{"width":3714,"height":5725,"depth":3}}
{"path":"/Users/yuzhiliang/Downloads/docsmall-2/2.png","outputs":{"object":[{"name":"local","bndbox":{"xmin":336,"ymin":512,"xmax":3416,"ymax":4672}}]},"time_labeled":1608631696021,"labeled":true,"size":{"width":3714,"height":5275,"depth":3}}
{"path":"/Users/yuzhiliang/Downloads/docsmall-2/3.png","outputs":{"object":[{"name":"local","bndbox":{"xmin":376,"ymin":352,"xmax":3448,"ymax":4544}}]},"time_labeled":1608631701740,"labeled":true,"size":{"width":3714,"height":5275,"depth":3}}
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import base64
import cv2
import numpy as np
from PIL import Image, ImageDraw
import paddle.fluid as fluid
def create_inputs(im, im_info):
"""generate input for different model type
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
inputs (dict): input of model
"""
inputs = {}
inputs['image'] = im
origin_shape = list(im_info['origin_shape'])
resize_shape = list(im_info['resize_shape'])
pad_shape = list(im_info['pad_shape']) if im_info[
'pad_shape'] is not None else list(im_info['resize_shape'])
scale_x, scale_y = im_info['scale']
scale = scale_x
im_info = np.array([resize_shape + [scale]]).astype('float32')
inputs['im_info'] = im_info
return inputs
def visualize_box_mask(im,
results,
labels=None,
mask_resolution=14,
threshold=0.5):
"""
Args:
im (str/np.ndarray): path of image/np.ndarray read by cv2
results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box,
                        matrix element:[class, score, x_min, y_min, x_max, y_max]
MaskRCNN's results include 'masks': np.ndarray:
shape:[N, class_num, mask_resolution, mask_resolution]
labels (list): labels:['class1', ..., 'classn']
mask_resolution (int): shape of a mask is:[mask_resolution, mask_resolution]
threshold (float): Threshold of score.
Returns:
im (PIL.Image.Image): visualized image
"""
if not labels:
labels = [
'background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light', 'fire', 'hydrant',
'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
if isinstance(im, str):
im = Image.open(im).convert('RGB')
else:
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
im = Image.fromarray(im)
if 'masks' in results and 'boxes' in results:
im = draw_mask(
im,
results['boxes'],
results['masks'],
labels,
resolution=mask_resolution)
if 'boxes' in results:
im = draw_box(im, results['boxes'], labels)
if 'segm' in results:
im = draw_segm(
im,
results['segm'],
results['label'],
results['score'],
labels,
threshold=threshold)
return im
def get_color_map_list(num_classes):
"""
Args:
num_classes (int): number of class
Returns:
color_map (list): RGB color list
"""
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
def expand_boxes(boxes, scale=0.0):
"""
Args:
boxes (np.ndarray): shape:[N,4], N:number of box,
                            matrix element:[x_min, y_min, x_max, y_max]
scale (float): scale of boxes
Returns:
boxes_exp (np.ndarray): expanded boxes
"""
w_half = (boxes[:, 2] - boxes[:, 0]) * .5
h_half = (boxes[:, 3] - boxes[:, 1]) * .5
x_c = (boxes[:, 2] + boxes[:, 0]) * .5
y_c = (boxes[:, 3] + boxes[:, 1]) * .5
w_half *= scale
h_half *= scale
boxes_exp = np.zeros(boxes.shape)
boxes_exp[:, 0] = x_c - w_half
boxes_exp[:, 2] = x_c + w_half
boxes_exp[:, 1] = y_c - h_half
boxes_exp[:, 3] = y_c + h_half
return boxes_exp
def draw_mask(im, np_boxes, np_masks, labels, resolution=14, threshold=0.5):
"""
Args:
im (PIL.Image.Image): PIL image
np_boxes (np.ndarray): shape:[N,6], N: number of box,
                               matrix element:[class, score, x_min, y_min, x_max, y_max]
np_masks (np.ndarray): shape:[N, class_num, resolution, resolution]
labels (list): labels:['class1', ..., 'classn']
resolution (int): shape of a mask is:[resolution, resolution]
threshold (float): threshold of mask
Returns:
im (PIL.Image.Image): visualized image
"""
color_list = get_color_map_list(len(labels))
scale = (resolution + 2.0) / resolution
im_w, im_h = im.size
w_ratio = 0.4
alpha = 0.7
im = np.array(im).astype('float32')
rects = np_boxes[:, 2:]
expand_rects = expand_boxes(rects, scale)
expand_rects = expand_rects.astype(np.int32)
clsid_scores = np_boxes[:, 0:2]
padded_mask = np.zeros((resolution + 2, resolution + 2), dtype=np.float32)
clsid2color = {}
for idx in range(len(np_boxes)):
clsid, score = clsid_scores[idx].tolist()
clsid = int(clsid)
xmin, ymin, xmax, ymax = expand_rects[idx].tolist()
w = xmax - xmin + 1
h = ymax - ymin + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
padded_mask[1:-1, 1:-1] = np_masks[idx, int(clsid), :, :]
resized_mask = cv2.resize(padded_mask, (w, h))
resized_mask = np.array(resized_mask > threshold, dtype=np.uint8)
x0 = min(max(xmin, 0), im_w)
x1 = min(max(xmax + 1, 0), im_w)
y0 = min(max(ymin, 0), im_h)
y1 = min(max(ymax + 1, 0), im_h)
im_mask = np.zeros((im_h, im_w), dtype=np.uint8)
im_mask[y0:y1, x0:x1] = resized_mask[(y0 - ymin):(y1 - ymin), (
x0 - xmin):(x1 - xmin)]
if clsid not in clsid2color:
clsid2color[clsid] = color_list[clsid]
color_mask = clsid2color[clsid]
for c in range(3):
color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
idx = np.nonzero(im_mask)
color_mask = np.array(color_mask)
im[idx[0], idx[1], :] *= 1.0 - alpha
im[idx[0], idx[1], :] += alpha * color_mask
return Image.fromarray(im.astype('uint8'))
def draw_box(im, np_boxes, labels):
"""
Args:
im (PIL.Image.Image): PIL image
np_boxes (np.ndarray): shape:[N,6], N: number of box,
                               matrix element:[class, score, x_min, y_min, x_max, y_max]
labels (list): labels:['class1', ..., 'classn']
Returns:
im (PIL.Image.Image): visualized image
"""
draw_thickness = min(im.size) // 320
draw = ImageDraw.Draw(im)
clsid2color = {}
color_list = get_color_map_list(len(labels))
for dt in np_boxes:
clsid, bbox, score = int(dt[0]), dt[2:], dt[1]
xmin, ymin, xmax, ymax = bbox
w = xmax - xmin
h = ymax - ymin
if clsid not in clsid2color:
clsid2color[clsid] = color_list[clsid]
color = tuple(clsid2color[clsid])
# draw bbox
draw.line(
[(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
(xmin, ymin)],
width=draw_thickness,
fill=color)
# draw label
text = "{} {:.4f}".format(labels[clsid], score)
tw, th = draw.textsize(text)
draw.rectangle(
[(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color)
draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
return im
def draw_segm(im,
np_segms,
np_label,
np_score,
labels,
threshold=0.5,
alpha=0.7):
"""
Draw segmentation on image
"""
mask_color_id = 0
w_ratio = .4
color_list = get_color_map_list(len(labels))
im = np.array(im).astype('float32')
clsid2color = {}
np_segms = np_segms.astype(np.uint8)
index = np.where(np_label == 0)[0]
index = np.where(np_score[index] > threshold)[0]
person_segms = np_segms[index]
person_mask = np.sum(person_segms, axis=0)
person_mask[person_mask > 1] = 1
person_mask = np.expand_dims(person_mask, axis=2)
person_mask = np.repeat(person_mask, 3, axis=2)
im = im * person_mask
return Image.fromarray(im.astype('uint8'))
def load_predictor(model_dir,
run_mode='fluid',
batch_size=1,
use_gpu=False,
min_subgraph_size=3):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
use_gpu (bool): whether use gpu
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need use_gpu == True.
"""
if not use_gpu and not run_mode == 'fluid':
raise ValueError(
"Predict by TensorRT mode: {}, expect use_gpu==True, but use_gpu == {}"
.format(run_mode, use_gpu))
if run_mode == 'trt_int8':
raise ValueError("TensorRT int8 mode is not supported now, "
"please use trt_fp32 or trt_fp16 instead.")
precision_map = {
'trt_int8': fluid.core.AnalysisConfig.Precision.Int8,
'trt_fp32': fluid.core.AnalysisConfig.Precision.Float32,
'trt_fp16': fluid.core.AnalysisConfig.Precision.Half
}
config = fluid.core.AnalysisConfig(
os.path.join(model_dir, '__model__'),
os.path.join(model_dir, '__params__'))
if use_gpu:
# initial GPU memory(M), device ID
config.enable_use_gpu(100, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
else:
config.disable_gpu()
if run_mode in precision_map.keys():
config.enable_tensorrt_engine(
workspace_size=1 << 10,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
use_static=False,
use_calib_mode=False)
# disable print log when predict
config.disable_glog_info()
# enable shared memory
config.enable_memory_optim()
# disable feed, fetch OP, needed by zero_copy_run
config.switch_use_feed_fetch_ops(False)
predictor = fluid.core.create_paddle_predictor(config)
return predictor
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
from functools import reduce
import cv2
import numpy as np
from paddlehub.module.module import moduleinfo
import solov2.processor as P
import solov2.data_feed as D
class Detector(object):
"""
Args:
model_dir (str): root path of __model__, __params__ and infer_cfg.yml
use_gpu (bool): whether use gpu
run_mode (str): mode of running(fluid/trt_fp32/trt_fp16)
threshold (float): threshold to reserve the result for output.
"""
def __init__(self,
min_subgraph_size=60,
use_gpu=False,
run_mode='fluid',
threshold=0.5):
model_dir = os.path.join(self.directory, 'solov2_r101_vd_fpn_3x')
self.predictor = D.load_predictor(
model_dir,
run_mode=run_mode,
min_subgraph_size=min_subgraph_size,
use_gpu=use_gpu)
self.compose = [
P.Resize(max_size=1333), P.Normalize(
mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
P.Permute(), P.PadStride(stride=32)
]
def transform(self, im):
im, im_info = P.preprocess(im, self.compose)
inputs = D.create_inputs(im, im_info)
return inputs, im_info
def postprocess(self, np_boxes, np_masks, im_info, threshold=0.5):
# postprocess output of predictor
results = {}
expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
np_boxes = np_boxes[expect_boxes, :]
for box in np_boxes:
print('class_id:{:d}, confidence:{:.4f},'
'left_top:[{:.2f},{:.2f}],'
' right_bottom:[{:.2f},{:.2f}]'.format(
int(box[0]), box[1], box[2], box[3], box[4], box[5]))
results['boxes'] = np_boxes
if np_masks is not None:
np_masks = np_masks[expect_boxes, :, :, :]
results['masks'] = np_masks
return results
def predict(self, image, threshold=0.5, warmup=0, repeats=1):
'''
Args:
image (str/np.ndarray): path of image/ np.ndarray read by cv2
threshold (float): threshold of predicted box' score
Returns:
results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box,
                            matrix element:[class, score, x_min, y_min, x_max, y_max]
MaskRCNN's results include 'masks': np.ndarray:
shape:[N, class_num, mask_resolution, mask_resolution]
'''
inputs, im_info = self.transform(image)
np_boxes, np_masks = None, None
input_names = self.predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = self.predictor.get_input_tensor(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
for i in range(warmup):
self.predictor.zero_copy_run()
output_names = self.predictor.get_output_names()
boxes_tensor = self.predictor.get_output_tensor(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
for i in range(repeats):
self.predictor.zero_copy_run()
output_names = self.predictor.get_output_names()
boxes_tensor = self.predictor.get_output_tensor(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
# do not perform postprocess in benchmark mode
results = []
if reduce(lambda x, y: x * y, np_boxes.shape) < 6:
            print('[WARNING] No object detected.')
results = {'boxes': np.array([])}
else:
results = self.postprocess(
np_boxes, np_masks, im_info, threshold=threshold)
return results
@moduleinfo(
name="solov2",
type="CV/image_editing",
author="paddlepaddle",
author_email="",
summary="solov2 is a detection model, this module is trained with COCO dataset.",
version="1.0.0")
class DetectorSOLOv2(Detector):
def __init__(self, use_gpu=False, run_mode='fluid', threshold=0.5):
super(DetectorSOLOv2, self).__init__(
use_gpu=use_gpu, run_mode=run_mode, threshold=threshold)
def predict(self,
image,
threshold=0.5,
warmup=0,
repeats=1,
visualization=False,
save_dir='solov2_result'):
inputs, im_info = self.transform(image)
np_label, np_score, np_segms = None, None, None
input_names = self.predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = self.predictor.get_input_tensor(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
for i in range(warmup):
self.predictor.zero_copy_run()
output_names = self.predictor.get_output_names()
np_label = self.predictor.get_output_tensor(output_names[
0]).copy_to_cpu()
np_score = self.predictor.get_output_tensor(output_names[
1]).copy_to_cpu()
np_segms = self.predictor.get_output_tensor(output_names[
2]).copy_to_cpu()
for i in range(repeats):
self.predictor.zero_copy_run()
output_names = self.predictor.get_output_names()
np_label = self.predictor.get_output_tensor(output_names[
0]).copy_to_cpu()
np_score = self.predictor.get_output_tensor(output_names[
1]).copy_to_cpu()
np_segms = self.predictor.get_output_tensor(output_names[
2]).copy_to_cpu()
output = dict(segm=np_segms, label=np_label, score=np_score)
if visualization:
if not os.path.exists(save_dir):
os.makedirs(save_dir)
image = D.visualize_box_mask(im=image, results=output)
name = str(time.time()) + '.png'
save_path = os.path.join(save_dir, name)
image.save(save_path)
img = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
output['image'] = img
return output
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from PIL import Image
import cv2
import numpy as np
def decode_image(im_file, im_info):
"""read rgb image
Args:
im_file (str/np.ndarray): path of image/ np.ndarray read by cv2
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
if isinstance(im_file, str):
with open(im_file, 'rb') as f:
im_read = f.read()
data = np.frombuffer(im_read, dtype='uint8')
im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
im_info['origin_shape'] = im.shape[:2]
im_info['resize_shape'] = im.shape[:2]
else:
im = im_file
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
im_info['origin_shape'] = im.shape[:2]
im_info['resize_shape'] = im.shape[:2]
return im, im_info
class Resize(object):
"""resize image by target_size and max_size
Args:
arch (str): model type
target_size (int): the target size of image
max_size (int): the max size of image
        use_cv2 (bool): whether to use cv2 for resizing
image_shape (list): input shape of model
interp (int): method of resize
"""
def __init__(self,
target_size=800,
max_size=1333,
use_cv2=True,
image_shape=None,
interp=cv2.INTER_LINEAR,
resize_box=False):
self.target_size = target_size
self.max_size = max_size
self.image_shape = image_shape
self.use_cv2 = use_cv2
self.interp = interp
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
im_channel = im.shape[2]
im_scale_x, im_scale_y = self.generate_scale(im)
im_info['resize_shape'] = [
im_scale_x * float(im.shape[0]), im_scale_y * float(im.shape[1])
]
if self.use_cv2:
im = cv2.resize(
im,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp)
else:
resize_w = int(im_scale_x * float(im.shape[1]))
resize_h = int(im_scale_y * float(im.shape[0]))
if self.max_size != 0:
raise TypeError(
'If you set max_size to cap the maximum size of image,'
'please set use_cv2 to True to resize the image.')
im = im.astype('uint8')
im = Image.fromarray(im)
im = im.resize((int(resize_w), int(resize_h)), self.interp)
im = np.array(im)
# padding im when image_shape fixed by infer_cfg.yml
if self.max_size != 0 and self.image_shape is not None:
padding_im = np.zeros(
(self.max_size, self.max_size, im_channel), dtype=np.float32)
im_h, im_w = im.shape[:2]
padding_im[:im_h, :im_w, :] = im
im = padding_im
im_info['scale'] = [im_scale_x, im_scale_y]
return im, im_info
def generate_scale(self, im):
"""
Args:
im (np.ndarray): image (np.ndarray)
Returns:
im_scale_x: the resize ratio of X
im_scale_y: the resize ratio of Y
"""
origin_shape = im.shape[:2]
im_c = im.shape[2]
if self.max_size != 0:
im_size_min = np.min(origin_shape[0:2])
im_size_max = np.max(origin_shape[0:2])
im_scale = float(self.target_size) / float(im_size_min)
if np.round(im_scale * im_size_max) > self.max_size:
im_scale = float(self.max_size) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
im_scale_x = float(self.target_size) / float(origin_shape[1])
im_scale_y = float(self.target_size) / float(origin_shape[0])
return im_scale_x, im_scale_y
class Normalize(object):
"""normalize image
Args:
mean (list): im - mean
std (list): im / std
is_scale (bool): whether need im / 255
is_channel_first (bool): if True: image shape is CHW, else: HWC
"""
def __init__(self, mean, std, is_scale=True, is_channel_first=False):
self.mean = mean
self.std = std
self.is_scale = is_scale
self.is_channel_first = is_channel_first
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
im = im.astype(np.float32, copy=False)
if self.is_channel_first:
mean = np.array(self.mean)[:, np.newaxis, np.newaxis]
std = np.array(self.std)[:, np.newaxis, np.newaxis]
else:
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
if self.is_scale:
im = im / 255.0
im -= mean
im /= std
return im, im_info
class Permute(object):
"""permute image
Args:
to_bgr (bool): whether convert RGB to BGR
channel_first (bool): whether convert HWC to CHW
"""
def __init__(self, to_bgr=False, channel_first=True):
self.to_bgr = to_bgr
self.channel_first = channel_first
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
if self.channel_first:
im = im.transpose((2, 0, 1)).copy()
if self.to_bgr:
im = im[[2, 1, 0], :, :]
return im, im_info
class PadStride(object):
""" padding image for model with FPN
Args:
        stride (int): a model with FPN needs image shape % stride == 0
"""
def __init__(self, stride=0):
self.coarsest_stride = stride
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
coarsest_stride = self.coarsest_stride
if coarsest_stride == 0:
            return im, im_info  # keep the (im, im_info) contract expected by preprocess
im_c, im_h, im_w = im.shape
pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride)
pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride)
padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32)
padding_im[:, :im_h, :im_w] = im
im_info['pad_shape'] = padding_im.shape[1:]
return padding_im, im_info
def preprocess(im, preprocess_ops):
# process image by preprocess_ops
im_info = {
'scale': [1., 1.],
'origin_shape': None,
'resize_shape': None,
'pad_shape': None,
}
im, im_info = decode_image(im, im_info)
count = 0
for operator in preprocess_ops:
count += 1
im, im_info = operator(im, im_info)
im = np.array((im, )).astype('float32')
return im, im_info
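# ---------------------------------------------------------------------------
# Minimal usage sketch of the preprocessing pipeline defined above (added for
# illustration; it assumes a local image `test.jpg` exists and mirrors the
# operator list built in the SOLOv2 module).
if __name__ == '__main__':
    ops = [
        Resize(max_size=1333),
        Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        Permute(),
        PadStride(stride=32),
    ]
    im, im_info = preprocess('test.jpg', ops)
    # im has shape [1, 3, H, W]; im_info records scale / origin / resize / pad shapes
    print(im.shape, im_info)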
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import math
import numpy as np
HAT_SCALES = {
'1.png': [3.0, 0.9, .0],
'2.png': [3.0, 1.3, .5],
'3.png': [2.2, 1.5, .8],
'4.png': [2.2, 1.8, .0],
'5.png': [1.8, 1.2, .0],
}
GLASSES_SCALES = {
'1.png': [0.65, 2.5],
'2.png': [0.65, 2.5],
}
BEARD_SCALES = {'1.png': [700, 0.3], '2.png': [220, 0.2]}
def rotate(image, angle):
"""
angle is degree, not radian
"""
(h, w) = image.shape[:2]
(cx, cy) = (w / 2, h / 2)
M = cv2.getRotationMatrix2D((cx, cy), -angle, 1.0)
cos = np.abs(M[0, 0])
sin = np.abs(M[0, 1])
nW = int((h * sin) + (w * cos))
nH = int((h * cos) + (w * sin))
M[0, 2] += (nW / 2) - cx
M[1, 2] += (nH / 2) - cy
return cv2.warpAffine(image, M, (nW, nH))
def n_rotate_coord(angle, x, y):
"""
angle is radian, not degree
"""
rotatex = math.cos(angle) * x - math.sin(angle) * y
rotatey = math.cos(angle) * y + math.sin(angle) * x
return rotatex, rotatey
def r_rotate_coord(angle, x, y):
"""
    angle is in radians, not degrees
"""
rotatex = math.cos(angle) * x + math.sin(angle) * y
rotatey = math.cos(angle) * y - math.sin(angle) * x
return rotatex, rotatey
def add_beard(person, kypoint, element_path):
beard_file_name = os.path.split(element_path)[1]
# element_len: top width of beard
# loc_offset_scale: scale relative to nose
element_len, loc_offset_scale = BEARD_SCALES[beard_file_name][:]
x1, y1, x2, y2, x3, y3, x4, y4, x5, y5 = kypoint[:]
mouth_len = np.sqrt(np.square(np.abs(y4 - y5)) + np.square(x4 - x5))
element = cv2.imread(element_path)
h, w, _ = element.shape
resize_scale = mouth_len / float(element_len)
h, w = round(h * resize_scale + 0.5), round(w * resize_scale + 0.5)
resized_element = cv2.resize(element, (w, h))
resized_ele_h, resized_ele_w, _ = resized_element.shape
# First find the keypoint of mouth in front face
m_center_x = (x4 + x5) / 2.
m_center_y = (y4 + y5) / 2.
    # compute the rotation angle from the mouth coordinates only
degree = np.arccos((x4 - x5) / mouth_len)
# coordinate of RoI in front face
half_w = int(resized_ele_w // 2)
scale = loc_offset_scale
roi_top_left_y = int(y3 + (((y5 + y4) // 2) - y3) * scale)
roi_top_left_x = int(x3 - half_w)
roi_top_right_y = roi_top_left_y
roi_top_right_x = int(x3 + half_w)
roi_bottom_left_y = roi_top_left_y + resized_ele_h
roi_bottom_left_x = roi_top_left_x
roi_bottom_right_y = roi_bottom_left_y
roi_bottom_right_x = roi_top_right_x
r_x11, r_y11 = roi_top_left_x - x3, roi_top_left_y - y3
r_x12, r_y12 = roi_top_right_x - x3, roi_top_right_y - y3
r_x21, r_y21 = roi_bottom_left_x - x3, roi_bottom_left_y - y3
r_x22, r_y22 = roi_bottom_right_x - x3, roi_bottom_right_y - y3
# coordinate of RoI in raw face
if m_center_x > x3:
x11, y11 = r_rotate_coord(degree, r_x11, r_y11)
x12, y12 = r_rotate_coord(degree, r_x12, r_y12)
x21, y21 = r_rotate_coord(degree, r_x21, r_y21)
x22, y22 = r_rotate_coord(degree, r_x22, r_y22)
else:
x11, y11 = n_rotate_coord(degree, r_x11, r_y11)
x12, y12 = n_rotate_coord(degree, r_x12, r_y12)
x21, y21 = n_rotate_coord(degree, r_x21, r_y21)
x22, y22 = n_rotate_coord(degree, r_x22, r_y22)
x11, y11 = x11 + x3, y11 + y3
x12, y12 = x12 + x3, y12 + y3
x21, y21 = x21 + x3, y21 + y3
x22, y22 = x22 + x3, y22 + y3
min_x = int(min(x11, x12, x21, x22))
max_x = int(max(x11, x12, x21, x22))
min_y = int(min(y11, y12, y21, y22))
max_y = int(max(y11, y12, y21, y22))
angle = np.degrees(degree)
if y4 < y5:
angle = -angle
rotated_element = rotate(resized_element, angle)
rotated_ele_h, rotated_ele_w, _ = rotated_element.shape
max_x = min_x + int(rotated_ele_w)
max_y = min_y + int(rotated_ele_h)
e2gray = cv2.cvtColor(rotated_element, cv2.COLOR_BGR2GRAY)
ret, mask = cv2.threshold(e2gray, 238, 255, cv2.THRESH_BINARY_INV)
mask_inv = cv2.bitwise_not(mask)
roi = person[min_y:max_y, min_x:max_x]
person_bg = cv2.bitwise_and(roi, roi, mask=mask)
element_fg = cv2.bitwise_and(
rotated_element, rotated_element, mask=mask_inv)
dst = cv2.add(person_bg, element_fg)
person[min_y:max_y, min_x:max_x] = dst
return person
def add_hat(person, kypoint, element_path):
x1, y1, x2, y2, x3, y3, x4, y4, x5, y5 = kypoint[:]
eye_len = np.sqrt(np.square(np.abs(y1 - y2)) + np.square(np.abs(x1 - x2)))
    # compute the rotation angle from the eye coordinates only
degree = np.arccos((x2 - x1) / eye_len)
angle = np.degrees(degree)
if y2 < y1:
angle = -angle
element = cv2.imread(element_path)
hat_file_name = os.path.split(element_path)[1]
# head_scale: size scale of hat
# high_scale: height scale above the eyes
    # offect_scale: horizontal offset of the hat on the face
head_scale, high_scale, offect_scale = HAT_SCALES[hat_file_name][:]
h, w, _ = element.shape
element_len = w
resize_scale = eye_len * head_scale / float(w)
h, w = round(h * resize_scale + 0.5), round(w * resize_scale + 0.5)
resized_element = cv2.resize(element, (w, h))
resized_ele_h, resized_ele_w, _ = resized_element.shape
m_center_x = (x1 + x2) / 2.
m_center_y = (y1 + y2) / 2.
head_len = int(eye_len * high_scale)
    # hat anchor: shift upward from the eye mid-point along the face orientation
    head_center_x = int(m_center_x + head_len * math.sin(degree))
    head_center_y = int(m_center_y - head_len * math.cos(degree))
rotated_element = rotate(resized_element, angle)
rotated_ele_h, rotated_ele_w, _ = rotated_element.shape
max_x = int(head_center_x + (resized_ele_w // 2) * math.cos(degree)) + int(
angle * head_scale) + int(eye_len * offect_scale)
min_y = int(head_center_y - (resized_ele_w // 2) * math.cos(degree))
pad_ele_x0 = 0 if (max_x - int(rotated_ele_w)) > 0 else -(
max_x - int(rotated_ele_w))
pad_ele_y0 = 0 if min_y > 0 else -(min_y)
min_x = int(max(max_x - int(rotated_ele_w), 0))
min_y = int(max(min_y, 0))
max_y = min_y + int(rotated_ele_h)
pad_y1 = max(max_y - int(person.shape[0]), 0)
pad_x1 = max(max_x - int(person.shape[1]), 0)
pad_w = pad_ele_x0 + pad_x1
pad_h = pad_ele_y0 + pad_y1
max_x += pad_w
pad_person = np.zeros(
(person.shape[0] + pad_h, person.shape[1] + pad_w, 3)).astype(np.uint8)
pad_person[pad_ele_y0:pad_ele_y0 + person.shape[0], pad_ele_x0:pad_ele_x0 +
person.shape[1], :] = person
e2gray = cv2.cvtColor(rotated_element, cv2.COLOR_BGR2GRAY)
ret, mask = cv2.threshold(e2gray, 1, 255, cv2.THRESH_BINARY_INV)
mask_inv = cv2.bitwise_not(mask)
roi = pad_person[min_y:max_y, min_x:max_x]
person_bg = cv2.bitwise_and(roi, roi, mask=mask)
element_fg = cv2.bitwise_and(
rotated_element, rotated_element, mask=mask_inv)
dst = cv2.add(person_bg, element_fg)
pad_person[min_y:max_y, min_x:max_x] = dst
return pad_person, pad_ele_x0, pad_x1, pad_ele_y0, pad_y1, min_x, min_y, max_x, max_y
def add_glasses(person, kypoint, element_path):
x1, y1, x2, y2, x3, y3, x4, y4, x5, y5 = kypoint[:]
eye_len = np.sqrt(np.square(np.abs(y1 - y2)) + np.square(np.abs(x1 - x2)))
    # compute the rotation angle from the eye coordinates only
degree = np.arccos((x2 - x1) / eye_len)
angle = np.degrees(degree)
if y2 < y1:
angle = -angle
element = cv2.imread(element_path)
glasses_file_name = os.path.split(element_path)[1]
# height_scale: height scale above the eyes
# glasses_scale: size ratio of glasses
height_scale, glasses_scale = GLASSES_SCALES[glasses_file_name][:]
h, w, _ = element.shape
element_len = w
resize_scale = eye_len * glasses_scale / float(element_len)
h, w = round(h * resize_scale + 0.5), round(w * resize_scale + 0.5)
resized_element = cv2.resize(element, (w, h))
resized_ele_h, resized_ele_w, _ = resized_element.shape
rotated_element = rotate(resized_element, angle)
rotated_ele_h, rotated_ele_w, _ = rotated_element.shape
eye_center_x = (x1 + x2) / 2.
eye_center_y = (y1 + y2) / 2.
min_x = int(eye_center_x) - int(rotated_ele_w * 0.5) + int(
angle * glasses_scale * person.shape[1] / 2000)
min_y = int(eye_center_y) - int(rotated_ele_h * height_scale)
max_x = min_x + rotated_ele_w
max_y = min_y + rotated_ele_h
e2gray = cv2.cvtColor(rotated_element, cv2.COLOR_BGR2GRAY)
ret, mask = cv2.threshold(e2gray, 1, 255, cv2.THRESH_BINARY_INV)
mask_inv = cv2.bitwise_not(mask)
roi = person[min_y:max_y, min_x:max_x]
person_bg = cv2.bitwise_and(roi, roi, mask=mask)
element_fg = cv2.bitwise_and(
rotated_element, rotated_element, mask=mask_inv)
dst = cv2.add(person_bg, element_fg)
person[min_y:max_y, min_x:max_x] = dst
return person
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import base64
import json
import cv2
import numpy as np
import paddle.nn as nn
import paddlehub as hub
from paddlehub.module.module import moduleinfo, serving
import solov2_blazeface.processor as P
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
@moduleinfo(
name="solov2_blazeface",
type="CV/image_editing",
author="paddlepaddle",
author_email="",
summary="solov2_blaceface is a segmentation and face detection model based on solov2 and blaceface.",
version="1.0.0")
class SoloV2BlazeFaceModel(nn.Layer):
"""
SoloV2BlazeFaceModel
"""
def __init__(self, use_gpu=True):
super(SoloV2BlazeFaceModel, self).__init__()
self.solov2 = hub.Module(name='solov2', use_gpu=use_gpu)
self.blaceface = hub.Module(name='blazeface', use_gpu=use_gpu)
def predict(self,
image,
background,
beard_file=None,
glasses_file=None,
hat_file=None,
visualization=False,
threshold=0.5):
# instance segmention
solov2_output = self.solov2.predict(
image=image, threshold=threshold, visualization=visualization)
# Set background pixel to 0
im_segm, x0, x1, y0, y1, _, _, _, _, flag_seg = P.visualize_box_mask(
image, solov2_output, threshold=threshold)
if flag_seg == 0:
return im_segm
h, w = y1 - y0, x1 - x0
back_json = background[:-3] + 'json'
stand_box = json.load(open(back_json))
stand_box = stand_box['outputs']['object'][0]['bndbox']
stand_xmin, stand_xmax, stand_ymin, stand_ymax = stand_box[
'xmin'], stand_box['xmax'], stand_box['ymin'], stand_box['ymax']
im_path = np.asarray(im_segm)
# face detection
blaceface_output = self.blaceface.predict(
image=im_path, threshold=threshold, visualization=visualization)
im_face_kp, p_left, p_right, p_up, p_bottom, h_xmin, h_ymin, h_xmax, h_ymax, flag_face = P.visualize_box_mask(
im_path,
blaceface_output,
threshold=threshold,
beard_file=beard_file,
glasses_file=glasses_file,
hat_file=hat_file)
if flag_face == 1:
if x0 > h_xmin:
shift_x_ = x0 - h_xmin
else:
shift_x_ = 0
if y0 > h_ymin:
shift_y_ = y0 - h_ymin
else:
shift_y_ = 0
h += p_up + p_bottom + shift_y_
w += p_left + p_right + shift_x_
x0 = min(x0, h_xmin)
y0 = min(y0, h_ymin)
x1 = max(x1, h_xmax) + shift_x_ + p_left + p_right
y1 = max(y1, h_ymax) + shift_y_ + p_up + p_bottom
# Fill the background image
cropped = im_face_kp.crop((x0, y0, x1, y1))
resize_scale = min((stand_xmax - stand_xmin) / (x1 - x0),
(stand_ymax - stand_ymin) / (y1 - y0))
h, w = int(h * resize_scale), int(w * resize_scale)
cropped = cropped.resize((w, h), cv2.INTER_LINEAR)
cropped = cv2.cvtColor(np.asarray(cropped), cv2.COLOR_RGB2BGR)
shift_x = int((stand_xmax - stand_xmin - cropped.shape[1]) / 2)
shift_y = int((stand_ymax - stand_ymin - cropped.shape[0]) / 2)
out_image = cv2.imread(background)
e2gray = cv2.cvtColor(cropped, cv2.COLOR_BGR2GRAY)
ret, mask = cv2.threshold(e2gray, 1, 255, cv2.THRESH_BINARY_INV)
mask_inv = cv2.bitwise_not(mask)
roi = out_image[stand_ymin + shift_y:stand_ymin + cropped.shape[
0] + shift_y, stand_xmin + shift_x:stand_xmin + cropped.shape[1] +
shift_x]
person_bg = cv2.bitwise_and(roi, roi, mask=mask)
element_fg = cv2.bitwise_and(cropped, cropped, mask=mask_inv)
dst = cv2.add(person_bg, element_fg)
out_image[stand_ymin + shift_y:stand_ymin + cropped.shape[
0] + shift_y, stand_xmin + shift_x:stand_xmin + cropped.shape[1] +
shift_x] = dst
return out_image
@serving
def serving_method(self, images, background, beard, glasses, hat, **kwargs):
"""
Run as a service.
"""
final = {}
background_path = os.path.join(
self.directory,
'element_source/background/{}.png'.format(background))
beard_path = os.path.join(self.directory,
'element_source/beard/{}.png'.format(beard))
glasses_path = os.path.join(
self.directory, 'element_source/glasses/{}.png'.format(glasses))
hat_path = os.path.join(self.directory,
'element_source/hat/{}.png'.format(hat))
images_decode = base64_to_cv2(images[0])
output = self.predict(
image=images_decode,
background=background_path,
hat_file=hat_path,
beard_file=beard_path,
glasses_file=glasses_path,
**kwargs)
final['image'] = cv2_to_base64(output)
return final
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import division
import cv2
import numpy as np
from PIL import Image
import solov2_blazeface.face_makeup_main as face_makeup_main
def visualize_box_mask(im,
results,
threshold=0.5,
beard_file=None,
glasses_file=None,
hat_file=None):
if isinstance(im, str):
im = Image.open(im).convert('RGB')
else:
im = Image.fromarray(im)
if 'segm' in results:
im, x0, x1, y0, y1, flag_seg = draw_segm(
im,
results['segm'],
results['label'],
results['score'],
threshold=threshold)
return im, x0, x1, y0, y1, 0, 0, 0, 0, flag_seg
if 'landmark' in results:
im, left, right, up, bottom, h_xmin, h_ymin, h_xmax, h_ymax, flag_face = trans_lmk(
im, results['landmark'], beard_file, glasses_file, hat_file)
return im, left, right, up, bottom, h_xmin, h_ymin, h_xmax, h_ymax, flag_face
else:
return im, 0, 0, 0, 0, 0, 0, 0, 0, 0
def draw_segm(im, np_segms, np_label, np_score, threshold=0.5, alpha=0.7):
"""
Draw segmentation on image
"""
im = np.array(im).astype('float32')
np_segms = np_segms.astype(np.uint8)
index_label = np.where(np_label == 0)[0]
index = np.where(np_score[index_label] > threshold)[0]
index = index_label[index]
if index.size == 0:
im = Image.fromarray(im.astype('uint8'))
return im, 0, 0, 0, 0, 0
person_segms = np_segms[index]
person_mask_single_channel = np.sum(person_segms, axis=0)
person_mask_single_channel[person_mask_single_channel > 1] = 1
person_mask = np.expand_dims(person_mask_single_channel, axis=2)
person_mask = np.repeat(person_mask, 3, axis=2)
im = im * person_mask
sum_x = np.sum(person_mask_single_channel, axis=0)
x = np.where(sum_x > 0.5)[0]
sum_y = np.sum(person_mask_single_channel, axis=1)
y = np.where(sum_y > 0.5)[0]
x0, x1, y0, y1 = x[0], x[-1], y[0], y[-1]
return Image.fromarray(im.astype('uint8')), x0, x1, y0, y1, 1
def lmk2out(bboxes, np_lmk, im_info, threshold=0.5, is_bbox_normalized=True):
image_w, image_h = im_info['origin_shape']
scale = im_info['scale']
face_index, landmark, prior_box = np_lmk[:]
xywh_res = []
    if bboxes is None or bboxes.shape == (1, 1):
return np.array([])
prior = np.reshape(prior_box, (-1, 4))
predict_lmk = np.reshape(landmark, (-1, 10))
k = 0
for i in range(bboxes.shape[0]):
score = bboxes[i][1]
if score < threshold:
continue
theindex = face_index[i][0]
me_prior = prior[theindex, :]
lmk_pred = predict_lmk[theindex, :]
prior_h = me_prior[2] - me_prior[0]
prior_w = me_prior[3] - me_prior[1]
prior_h_center = (me_prior[2] + me_prior[0]) / 2
prior_w_center = (me_prior[3] + me_prior[1]) / 2
lmk_decode = np.zeros((10))
for j in [0, 2, 4, 6, 8]:
lmk_decode[j] = lmk_pred[j] * 0.1 * prior_w + prior_h_center
for j in [1, 3, 5, 7, 9]:
lmk_decode[j] = lmk_pred[j] * 0.1 * prior_h + prior_w_center
if is_bbox_normalized:
lmk_decode = lmk_decode * np.array([
image_h, image_w, image_h, image_w, image_h, image_w, image_h,
image_w, image_h, image_w
])
xywh_res.append(lmk_decode)
return np.asarray(xywh_res)
def post_processing(image, lmk_decode, hat_path, beard_path, glasses_path):
image = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
p_left, p_right, p_up, p_bottom, h_xmax, h_ymax = [0] * 6
h_xmin, h_ymin = 10000, 10000
# Add beard on the face
if beard_path is not None:
image = face_makeup_main.add_beard(image, lmk_decode, beard_path)
# Add glasses on the face
if glasses_path is not None:
image = face_makeup_main.add_glasses(image, lmk_decode, glasses_path)
# Add hat on the face
if hat_path is not None:
image, p_left, p_right, p_up, p_bottom, h_xmin, h_ymin, h_xmax, h_ymax = face_makeup_main.add_hat(
image, lmk_decode, hat_path)
image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
print('----------- Post Processing Success -----------')
return image, p_left, p_right, p_up, p_bottom, h_xmin, h_ymin, h_xmax, h_ymax
def trans_lmk(image, lmk_results, beard_file, glasses_file, hat_file):
p_left, p_right, p_up, p_bottom, h_xmax, h_ymax = [0] * 6
h_xmin, h_ymin = 10000, 10000
if lmk_results.shape[0] == 0:
return image, p_left, p_right, p_up, p_bottom, h_xmin, h_ymin, h_xmax, h_ymax, 0
for lmk_decode in lmk_results:
x1, y1, x2, y2 = lmk_decode[0], lmk_decode[1], lmk_decode[
2], lmk_decode[3]
x4, y4, x5, y5 = lmk_decode[6], lmk_decode[7], lmk_decode[
8], lmk_decode[9]
# Refine the order of keypoint
if x1 > x2:
lmk_decode[0], lmk_decode[1], lmk_decode[2], lmk_decode[
3] = lmk_decode[2], lmk_decode[3], lmk_decode[0], lmk_decode[1]
if x4 < x5:
lmk_decode[6], lmk_decode[7], lmk_decode[8], lmk_decode[
9] = lmk_decode[8], lmk_decode[9], lmk_decode[6], lmk_decode[7]
# Add decoration to the face
image, p_left_temp, p_right_temp, p_up_temp, p_bottom_temp, h_xmin_temp, h_ymin_temp, h_xmax_temp, h_ymax_temp = post_processing(
image, lmk_decode, hat_file, beard_file, glasses_file)
p_left = max(p_left, p_left_temp)
p_right = max(p_right, p_right_temp)
p_up = max(p_up, p_up_temp)
p_bottom = max(p_bottom, p_bottom_temp)
h_xmin = min(h_xmin, h_xmin_temp)
h_ymin = min(h_ymin, h_ymin_temp)
h_xmax = max(h_xmax, h_xmax_temp)
h_ymax = max(h_ymax, h_ymax_temp)
return image, p_left, p_right, p_up, p_bottom, h_xmin, h_ymin, h_xmax, h_ymax, 1
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddlehub as hub
import cv2
img_file = 'demo_images/test.jpg'
background = 'element_source/background/1.png'
beard_file = 'element_source/beard/1.png'
glasses_file = 'element_source/glasses/4.png'
hat_file = 'element_source/hat/1.png'
model = hub.Module(name='solov2_blazeface', use_gpu=True)
output = model.predict(
image=img_file,
background=background,
hat_file=hat_file,
beard_file=beard_file,
glasses_file=glasses_file,
visualization=True)
cv2.imwrite("chrismas_final.png", output)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Send HTTP request
org_im = cv2.cvtColor(cv2.imread('demo_images/test.jpg'), cv2.COLOR_BGR2RGB)
h, w, c = org_im.shape
hat_ids = 1
data = {
'images': [cv2_to_base64(org_im)],
'background': 3,
"beard": 2,
"glasses": 3,
"hat": 3
}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8880/predict/solov2_blazeface"
start = time.time()
r = requests.post(url=url, headers=headers, data=json.dumps(data))
end = time.time()
print('cost:', end - start)
result = base64_to_cv2(r.json()["results"]['image'])
cv2.imwrite("chrismas_final.png", result)
# Attention-guided Context Feature Pyramid Network for Object Detection
## Introduction
- Attention-guided Context Feature Pyramid Network for Object Detection: [https://arxiv.org/abs/2005.11475](https://arxiv.org/abs/2005.11475)
```
Cao J, Chen Q, Guo J, et al. Attention-guided Context Feature Pyramid Network for Object Detection[J]. arXiv preprint arXiv:2005.11475, 2020.
```
## Model Zoo
| Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs |
| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: |
| ResNet50-vd-ACFPN | Faster | 2 | 1x | 23.432 | 39.6 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_acfpn_1x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/configs/acfpn/faster_rcnn_r50_vd_acfpn_1x.yml) |
architecture: FasterRCNN
max_iters: 90000
snapshot_iter: 10000
use_gpu: true
log_iter: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar
weights: output/faster_rcnn_r50_vd_acfpn_1x/model_final
metric: COCO
num_classes: 81
FasterRCNN:
backbone: ResNet
fpn: ACFPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: bn
variant: d
ACFPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
norm_groups: 32
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.1
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../faster_fpn_reader.yml'
TrainReader:
batch_size: 2
# Anchor-Free Model Series
## Contents
- [Introduction](#introduction)
- [Model Zoo and Baselines](#model-zoo-and-baselines)
- [Algorithm Details](#algorithm-details)
- [How to Contribute Code](#how-to-contribute-code)
## Introduction
Mainstream detection algorithms fall roughly into two classes: single-stage and two-stage. Classic single-stage algorithms include SSD and YOLO, while two-stage methods include the RCNN family; both classes are available in the [PaddleDetection Model Zoo](../../docs/MODEL_ZOO.md). What they share is that they first define a set of dense anchor regions of various sizes and then classify and regress boxes relative to these priors, which ties their performance strongly to the anchor design itself. Since CornerNet was proposed, a variety of anchor-free methods have emerged, and PaddleDetection integrates a series of anchor-free algorithms as well. For contrast, a small sketch of how such dense anchors are laid out follows.
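The sketch below is illustrative only (not code from this repository); the sizes, ratios, and the ratio = h/w convention are assumptions.
```python
import numpy as np

def anchors_at_location(cx, cy, sizes=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Dense prior boxes (x1, y1, x2, y2) centered at one feature-map location.
    Anchor-based detectors classify and regress relative to boxes like these."""
    boxes = []
    for s in sizes:
        for r in ratios:  # r is interpreted as h/w here
            h, w = s * np.sqrt(r), s / np.sqrt(r)
            boxes.append([cx - w / 2., cy - h / 2., cx + w / 2., cy + h / 2.])
    return np.array(boxes)

# one location of a stride-16 feature map already yields 9 prior boxes
print(anchors_at_location(cx=160, cy=128).shape)  # (9, 4)
```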
## Model Zoo and Baselines
The table below lists the network architectures currently supported by PaddleDetection; see [Algorithm Details](#algorithm-details) for specifics.
| | ResNet50 | ResNet50-vd | Hourglass104 | DarkNet53
|:------------------------:|:--------:|:-------------:|:-------------:|:-------------:|
| [CornerNet-Squeeze](#CornerNet-Squeeze) | x | ✓ | ✓ | x |
| [FCOS](#FCOS) | ✓ | x | x | x |
| [TTFNet](#TTFNet) | x | x | x | ✓ |
### Model Zoo
#### mAP on the COCO dataset
| Architecture | Backbone | Images/GPU | Pretrained model | mAP | FPS | Download | Config |
|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:----------:|
| CornerNet-Squeeze | Hourglass104 | 14 | None | 34.5 | 35.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_hg104.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/cornernet_squeeze_hg104.yml) |
| CornerNet-Squeeze | ResNet50-vd | 14 | [faster\_rcnn\_r50\_vd\_fpn\_2x](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_2x.tar) | 32.7 | 47.01 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_r50_vd_fpn.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/cornernet_squeeze_r50_vd_fpn.yml) |
| CornerNet-Squeeze-dcn | ResNet50-vd | 14 | [faster\_rcnn\_dcn\_r50\_vd\_fpn\_2x](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar) | 34.9 | 40.43 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_dcn_r50_vd_fpn.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/cornernet_squeeze_dcn_r50_vd_fpn.yml) |
| CornerNet-Squeeze-dcn-mixup-cosine* | ResNet50-vd | 14 | [faster\_rcnn\_dcn\_r50\_vd\_fpn\_2x](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar) | 38.2 | 39.70 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine.yml) |
| FCOS | ResNet50 | 2 | [ResNet50\_cos\_pretrained](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar) | 39.8 | 18.85 | [model](https://paddlemodels.bj.bcebos.com/object_detection/fcos_r50_fpn_1x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/fcos_r50_fpn_1x.yml) |
| FCOS+multiscale_train | ResNet50 | 2 | [ResNet50\_cos\_pretrained](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar) | 42.0 | 19.05 | [model](https://paddlemodels.bj.bcebos.com/object_detection/fcos_r50_fpn_multiscale_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/fcos_r50_fpn_multiscale_2x.yml) |
| FCOS+DCN | ResNet50 | 2 | [ResNet50\_cos\_pretrained](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar) | 44.4 | 13.66 | [model](https://paddlemodels.bj.bcebos.com/object_detection/fcos_dcn_r50_fpn_1x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/fcos_dcn_r50_fpn_1x.yml) |
| TTFNet | DarkNet53 | 12 | [DarkNet53_pretrained](https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar) | 32.9 | 85.92 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ttfnet_darknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/ttfnet_darknet.yml) |
**Notes:**
- Model FPS is measured with tools/eval.py on a single Tesla V100 GPU.
- CornerNet-Squeeze requires PaddlePaddle 1.8 or above, or an appropriate develop build.
- When CornerNet-Squeeze uses a ResNet backbone, an FPN is added and the P3 level of the FPN is taken as the backbone output feature map.
- \*CornerNet-Squeeze-dcn-mixup-cosine is the best-performing variant optimized on top of the original CornerNet-Squeeze; it adds mixup preprocessing and cosine_decay on top of the ResNet backbone.
- FCOS uses GIoU loss, predicts centerness with the location branch, normalizes the left/top/right/bottom offsets, and adopts a ground-truth center matching strategy.
- The CornerNet-Squeeze model depends on the corner_pooling op, which is compiled in ```ppdet/ext_op```; see [how to compile custom OPs](../../ppdet/ext_op/README.md) for details.
## Algorithm Details
### CornerNet-Squeeze
**Introduction:** [CornerNet-Squeeze](https://arxiv.org/abs/1904.08900) improves on [CornerNet](https://arxiv.org/abs/1808.01244) by predicting the top-left and bottom-right corners of object boxes. Drawing on ideas from SqueezeNet and MobileNet, it optimizes CornerNet's Hourglass-104 backbone and greatly speeds up inference; compared with the original [YOLO-v3](https://arxiv.org/abs/1804.02767), it offers advantages in both training accuracy and inference speed.
**Features:**
- Uses corner_pooling to locate the top-left and bottom-right corners of candidate boxes
- Replaces the residual blocks in Hourglass-104 with SqueezeNet-style fire modules
- Replaces the second 3x3 convolution with a 3x3 depthwise separable convolution (see the sketch below)
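To make the last two points concrete, the following is a minimal sketch (not the implementation used in this repository) of a SqueezeNet-style fire module in which the 3x3 expand convolution is replaced by a 3x3 depthwise separable convolution; layer names and channel handling are illustrative assumptions.
```python
import paddle
import paddle.nn as nn

class FireModule(nn.Layer):
    """Squeeze with a 1x1 conv, then expand with a 1x1 branch and a
    depthwise-separable 3x3 branch, concatenated along channels."""

    def __init__(self, in_channels, squeeze_channels, expand_channels):
        super().__init__()
        self.squeeze = nn.Conv2D(in_channels, squeeze_channels, 1)
        self.expand1x1 = nn.Conv2D(squeeze_channels, expand_channels, 1)
        # depthwise 3x3 (groups == channels) followed by a 1x1 pointwise conv
        self.expand3x3_dw = nn.Conv2D(
            squeeze_channels, squeeze_channels, 3, padding=1,
            groups=squeeze_channels)
        self.expand3x3_pw = nn.Conv2D(squeeze_channels, expand_channels, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        branch1 = self.expand1x1(x)
        branch2 = self.expand3x3_pw(self.expand3x3_dw(x))
        return self.relu(paddle.concat([branch1, branch2], axis=1))
```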
### FCOS
**Introduction:** [FCOS](https://arxiv.org/abs/1904.01355) is a dense-prediction anchor-free detection algorithm. It uses the RetinaNet skeleton and regresses the object's extent directly on the feature map, while predicting the object class and a centerness score (how far a feature-map pixel is from the object center); centerness is ultimately used as a weight to adjust the object score.
**Features:**
- Uses the FPN structure to predict boxes of different scales at different levels, avoiding multiple boxes overlapping at the same feature-map location
- Predicts whether the current point is an object center through a single center-ness branch, suppressing low-quality false detections (the centerness formula is sketched below)
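As a small sketch of the centerness described above (the helper name is ours): given the distances l, t, r, b from a feature-map location to the four sides of its ground-truth box, the FCOS paper defines centerness as sqrt(min(l,r)/max(l,r) * min(t,b)/max(t,b)).
```python
import numpy as np

def centerness_target(l, t, r, b):
    """Centerness in [0, 1]: 1 at the box center, close to 0 near the border.
    l, t, r, b are distances from a location to the box sides (same shape)."""
    lr = np.minimum(l, r) / np.maximum(l, r)
    tb = np.minimum(t, b) / np.maximum(t, b)
    return np.sqrt(lr * tb)

# a point 10px from the left and 30px from the right, vertically centered:
print(centerness_target(10., 20., 30., 20.))  # ~0.577
```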
### TTFNet
**Introduction:** [TTFNet](https://arxiv.org/abs/1909.00700) is a real-time object detection network that is friendly to training time. It addresses the slow convergence of CenterNet by proposing a new way of generating training samples with Gaussian kernels, which effectively removes the ambiguity in the anchor-free head. Its simple, lightweight architecture also makes it easy to extend to other tasks.
**Features:**
- Simple structure: only two heads are needed to detect object location and size, and time-consuming post-processing is removed
- Short training time: with a DarkNet53 backbone, good results are reached after only 2 hours of training on 8 V100 GPUs (the Gaussian-kernel sample generation is sketched below)
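A minimal sketch, under our own naming and with boundary clipping omitted, of how a box-sized Gaussian can be splatted onto a heatmap to generate soft training samples in the style TTFNet describes; the alpha factor and sigma scaling are illustrative assumptions.
```python
import numpy as np

def gaussian2d(shape, sigma_x, sigma_y):
    """Unnormalized 2D Gaussian kernel with the given (h, w) shape."""
    h, w = shape
    y = np.arange(h, dtype=np.float32) - (h - 1) / 2.
    x = np.arange(w, dtype=np.float32) - (w - 1) / 2.
    yy, xx = np.meshgrid(y, x, indexing='ij')
    return np.exp(-(xx ** 2 / (2 * sigma_x ** 2) + yy ** 2 / (2 * sigma_y ** 2)))

def draw_gaussian(heatmap, center, box_h, box_w, alpha=0.54):
    """Splat a box-shaped Gaussian around center=(cx, cy) on heatmap (H x W);
    pixels with larger Gaussian values act as stronger positive samples."""
    h_radius, w_radius = int(box_h / 2), int(box_w / 2)
    kernel = gaussian2d((2 * h_radius + 1, 2 * w_radius + 1),
                        sigma_x=alpha * box_w / 6., sigma_y=alpha * box_h / 6.)
    cx, cy = center
    y0, y1 = cy - h_radius, cy + h_radius + 1
    x0, x1 = cx - w_radius, cx + w_radius + 1
    # keep the maximum where Gaussians from different objects overlap
    heatmap[y0:y1, x0:x1] = np.maximum(heatmap[y0:y1, x0:x1], kernel)
    return heatmap

heatmap = draw_gaussian(np.zeros((128, 128), np.float32), (64, 64), 40, 24)
```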
## How to Contribute Code
Contributions to the anchor-free detection models in PaddleDetection are very welcome: you can submit a PR for us to review. Feedback is also appreciated; feel free to open an issue and we will respond promptly.
architecture: CornerNetSqueeze
use_gpu: true
max_iters: 500000
log_iter: 20
save_dir: output
snapshot_iter: 10000
metric: COCO
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar
weights: output/cornernet_squeeze_dcn_r50_vd_fpn/model_final
num_classes: 80
stack: 1
CornerNetSqueeze:
backbone: ResNet
fpn: FPN
corner_head: CornerHead
ResNet:
norm_type: bn
depth: 50
feature_maps: [3, 4, 5]
freeze_at: 2
variant: d
dcn_v2_stages: [3, 4, 5]
FPN:
min_level: 3
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125]
CornerHead:
test_batch_size: 1
ae_threshold: 0.5
num_dets: 100
top_k: 20
PostProcess:
use_soft_nms: true
detections_per_im: 100
nms_thresh: 0.001
sigma: 0.5
LearningRate:
base_lr: 0.0005
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
TrainReader:
inputs_def:
image_shape: [3, 511, 511]
fields: ['image', 'im_id', 'gt_bbox', 'gt_class', 'tl_heatmaps', 'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags', 'tag_masks']
output_size: 64
dataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: False
- !CornerCrop
input_size: 511
- !Resize
target_dim: 511
- !RandomFlipImage
prob: 0.5
- !CornerRandColor
saturation: 0.4
contrast: 0.4
brightness: 0.4
- !Lighting
eigval: [0.2141788, 0.01817699, 0.00341571]
eigvec: [[-0.58752847, -0.69563484, 0.41340352],
[-0.5832747, 0.00994535, -0.81221408],
[-0.56089297, 0.71832671, 0.41158938]]
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: False
is_channel_first: False
- !Permute
to_bgr: False
- !CornerTarget
output_size: [64, 64]
num_classes: 80
batch_size: 14
shuffle: true
drop_last: true
worker_num: 2
use_process: true
drop_empty: false
EvalReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
use_process: true
batch_size: 1
drop_empty: false
worker_num: 2
TestReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!ImageFolder
anno_path: annotations/instances_val2017.json
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
batch_size: 1
architecture: CornerNetSqueeze
use_gpu: true
max_iters: 500000
log_iter: 20
save_dir: output
snapshot_iter: 10000
metric: COCO
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar
weights: output/cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine/model_final
num_classes: 80
stack: 1
CornerNetSqueeze:
backbone: ResNet
fpn: FPN
corner_head: CornerHead
ResNet:
norm_type: bn
depth: 50
feature_maps: [3, 4, 5]
freeze_at: 2
variant: d
dcn_v2_stages: [3, 4, 5]
FPN:
min_level: 3
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125]
CornerHead:
test_batch_size: 1
ae_threshold: 0.5
num_dets: 100
top_k: 20
PostProcess:
use_soft_nms: true
detections_per_im: 100
nms_thresh: 0.001
sigma: 0.5
LearningRate:
base_lr: 0.005
schedulers:
- !CosineDecay
max_iters: 500000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
TrainReader:
inputs_def:
image_shape: [3, 511, 511]
fields: ['image', 'im_id', 'gt_bbox', 'gt_class', 'tl_heatmaps', 'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags', 'tag_masks']
output_size: 64
max_tag_len: 256
dataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: True
- !MixupImage
alpha: 1.5
beta: 1.5
- !CornerCrop
input_size: 511
- !Resize
target_dim: 511
- !RandomFlipImage
prob: 0.5
- !CornerRandColor
saturation: 0.4
contrast: 0.4
brightness: 0.4
- !Lighting
eigval: [0.2141788, 0.01817699, 0.00341571]
eigvec: [[-0.58752847, -0.69563484, 0.41340352],
[-0.5832747, 0.00994535, -0.81221408],
[-0.56089297, 0.71832671, 0.41158938]]
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: False
is_channel_first: False
- !Permute
to_bgr: False
- !CornerTarget
output_size: [64, 64]
num_classes: 80
max_tag_len: 256
batch_size: 14
shuffle: true
drop_last: true
worker_num: 2
use_process: true
drop_empty: false
mixup_epoch: 200
EvalReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
use_process: true
batch_size: 1
drop_empty: false
worker_num: 2
TestReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!ImageFolder
anno_path: annotations/instances_val2017.json
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
batch_size: 1
architecture: CornerNetSqueeze
use_gpu: true
max_iters: 500000
log_iter: 20
save_dir: output
snapshot_iter: 10000
metric: COCO
pretrain_weights: NULL
weights: output/cornernet_squeeze_hg104/model_final
num_classes: 80
stack: 2
CornerNetSqueeze:
backbone: Hourglass
corner_head: CornerHead
Hourglass:
dims: [256, 256, 384, 384, 512]
modules: [2, 2, 2, 2, 4]
CornerHead:
test_batch_size: 1
ae_threshold: 0.5
num_dets: 100
top_k: 20
PostProcess:
use_soft_nms: true
detections_per_im: 100
nms_thresh: 0.001
sigma: 0.5
LearningRate:
base_lr: 0.00025
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 450000
OptimizerBuilder:
optimizer:
type: Adam
regularizer: NULL
TrainReader:
inputs_def:
image_shape: [3, 511, 511]
fields: ['image', 'im_id', 'gt_bbox', 'gt_class', 'tl_heatmaps', 'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags', 'tag_masks']
output_size: 64
dataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: False
- !CornerCrop
input_size: 511
- !Resize
target_dim: 511
- !RandomFlipImage
prob: 0.5
- !CornerRandColor
saturation: 0.4
contrast: 0.4
brightness: 0.4
- !Lighting
eigval: [0.2141788, 0.01817699, 0.00341571]
eigvec: [[-0.58752847, -0.69563484, 0.41340352],
[-0.5832747, 0.00994535, -0.81221408],
[-0.56089297, 0.71832671, 0.41158938]]
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: False
is_channel_first: False
- !Permute
to_bgr: False
- !CornerTarget
output_size: [64, 64]
num_classes: 80
batch_size: 14
shuffle: true
drop_last: true
worker_num: 2
use_process: true
drop_empty: false
EvalReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
batch_size: 1
drop_empty: false
worker_num: 2
use_process: true
TestReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!ImageFolder
anno_path: annotations/instances_val2017.json
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
batch_size: 1
# Learning Data Augmentation Strategies for Object Detection
## Introduction
- Learning Data Augmentation Strategies for Object Detection: [https://arxiv.org/abs/1906.11172](https://arxiv.org/abs/1906.11172)
```
@article{Zoph2019LearningDA,
title={Learning Data Augmentation Strategies for Object Detection},
author={Barret Zoph and Ekin Dogus Cubuk and Golnaz Ghiasi and Tsung-Yi Lin and Jonathon Shlens and Quoc V. Le},
journal={ArXiv},
year={2019},
volume={abs/1906.11172}
}
```
## Model Zoo
| Backbone | Type | AutoAug policy | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs |
| :---------------------- | :-------------:| :-------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: |
| ResNet50-vd-FPN | Faster | v1 | 2 | 3x | 22.800 | 39.9 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_aa_3x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/autoaugment/faster_rcnn_r50_vd_fpn_aa_3x.yml) |
| ResNet101-vd-FPN | Faster | v1 | 2 | 3x | 17.652 | 42.5 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_vd_fpn_aa_3x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/autoaugment/faster_rcnn_r101_vd_fpn_aa_3x.yml) |
architecture: YOLOv3
use_gpu: true
max_iters: 500000
log_iter: 20
save_dir: output
snapshot_iter: 20000
metric: COCO
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar
weights: output/yolov3_r50vd_dcn/model_final
num_classes: 80
use_fine_grained_loss: false
YOLOv3:
backbone: ResNet
yolo_head: YOLOv3Head
ResNet:
norm_type: sync_bn
freeze_at: 0
freeze_norm: false
norm_decay: 0.
depth: 50
feature_maps: [3, 4, 5]
variant: d
dcn_v2_stages: [5]
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
yolo_loss: YOLOv3Loss
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
YOLOv3Loss:
ignore_thresh: 0.7
label_smooth: false
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
_READER_: '../yolov3_reader.yml'
**For the documentation tutorial, please refer to:** [FACE_DETECTION.md](../../docs/featured_model/FACE_DETECTION.md) <br/>
**For the English document, please refer to:** [FACE_DETECTION_en.md](../../docs/featured_model/FACE_DETECTION_en.md)