Unverified commit c577e93d, authored by Feng Ni, committed by GitHub

[cherry-pick][MOT] fit some detectors for deepsort (#4437)

* add score cls_id for deepsort, fit some detectors and pplcnet reid

* fix configs/mot/README
Parent 0778b466
......@@ -20,7 +20,7 @@ The current mainstream multi-objective tracking (MOT) algorithm is mainly compos
PaddleDetection implements three MOT algorithms of these two series.
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model to extract features from the image region of the person bounded by a detector. It integrates appearance information based on a deep appearance descriptor and assigns and updates the detected targets to the existing corresponding trajectories, as in a ReID task. The detection bboxes required by DeepSORT can be generated by any detection model, and the saved detection result file can then be loaded for tracking. Here we select the `PCB + Pyramid ResNet101` model provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) as the ReID model.
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model to extract features from the image region of the person bounded by a detector. It integrates appearance information based on a deep appearance descriptor and assigns and updates the detected targets to the existing corresponding trajectories, as in a ReID task. The detection bboxes required by DeepSORT can be generated by any detection model, and the saved detection result file can then be loaded for tracking. Here we select the `PCB + Pyramid ResNet101` and `PPLCNet` models provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) as the ReID models.
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) learns the object detection task and the appearance embedding task simultaneously in a shared neural network, outputting the detection results and the corresponding embeddings at the same time. The original JDE paper builds on the anchor-based detector YOLOv3, adding a ReID branch to learn embeddings. The training process is constructed as a multi-task learning problem, taking both accuracy and speed into account.
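As a rough sketch of the appearance association step that DeepSORT adds on top of SORT, the snippet below is a minimal illustration with our own function name, not PaddleDetection's implementation; the real tracker additionally gates matches with Kalman-filter motion information:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_by_appearance(track_feats, det_feats, matching_threshold=0.2):
    # track_feats: (T, D), det_feats: (N, D); rows are L2-normalized embeddings.
    cost = 1.0 - track_feats @ det_feats.T          # cosine distance matrix (T, N)
    rows, cols = linear_sum_assignment(cost)        # minimum-cost assignment
    matches = [(r, c) for r, c in zip(rows, cols)
               if cost[r, c] <= matching_threshold]  # reject weak matches
    return matches
```

The `matching_threshold=0.2` default mirrors the `DeepSORTTracker` config shown later in this diff.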
......@@ -50,19 +50,22 @@ pip install -r requirements.txt
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model |ReID model| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 72.2 | 59.5 | 1087 | 8034 | 21481 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 68.1 | 53.6 | 1979 | 17446 | 15766 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pplcnet.yml) |
### DeepSORT Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model |ReID model| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - |[det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 64.0 | 51.3 | 1208 | 12697 | 51784 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 61.1 | 48.8 | 2010 | 25401 | 43432 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pplcnet.yml) |
**Notes:**
DeepSORT does not need training on the MOT dataset; it is only used for evaluation. Two evaluation methods are supported.
- 1. Load the result file and the ReID model. Before DeepSORT evaluation, you should first get detection results from a detection model, and then prepare them like this:
```
det_results_dir
......@@ -80,34 +83,34 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
If you use a stronger detection model, you can get better results. Each txt file holds the detection results of all frames extracted from one video, and each line describes a bounding box in the following format:
```
[frame_id],[bb_left],[bb_top],[width],[height],[conf]
[frame_id],[x0],[y0],[w],[h],[score],[class_id]
```
- `frame_id` is the frame number of the image
- `bb_left` is the X coordinate of the left bound of the object box
- `bb_top` is the Y coordinate of the top bound of the object box
- `width,height` are the pixel width and height of the object box
- `conf` is the object score, with default value `1` (the results have already been filtered by the detection score threshold)
- `frame_id` is the frame number of the image.
- `x0,y0` are the X and Y coordinates of the top-left corner of the object box.
- `w,h` are the pixel width and height of the object box.
- `score` is the confidence score of the object box.
- `class_id` is the category of the object box, set to `0` if there is only one category (see the parsing sketch after this list).
- 2. Load the detection model and the ReID model at the same time. Here, the JDE version of YOLOv3 is selected. For configuration details, see `configs/mot/deepsort/_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml`.
- 2. Load the detection model and the ReID model at the same time. Here, the JDE version of YOLOv3 is selected. For configuration details, see `configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml`, and refer to `configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml` for other general detectors.
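The following is a minimal parsing sketch for the seven-field format described above (`frame_id,x0,y0,w,h,score,class_id`), assuming one txt file per video; the helper name `load_det_results` is our own illustration, not a PaddleDetection API:

```python
from collections import defaultdict

def load_det_results(txt_path):
    # Group detection boxes by frame_id; coordinates stay in pixels.
    dets_per_frame = defaultdict(list)
    with open(txt_path) as f:
        for line in f:
            fields = line.strip().split(',')
            if len(fields) < 7:
                continue  # skip empty or malformed lines
            frame_id = int(float(fields[0]))
            x0, y0, w, h, score = map(float, fields[1:6])
            class_id = int(float(fields[6]))
            dets_per_frame[frame_id].append((x0, y0, w, h, score, class_id))
    return dets_per_frame

# e.g. dets = load_det_results('det_results_dir/MOT16-02.txt')
```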
### JDE Results on MOT-16 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](./jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](./jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](./jde/jde_darknet53_30e_576x320.yml) |
### JDE Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](./jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](./jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](./jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE used 8 GPUs for training, with a mini-batch size of 4 on each GPU, and trained for 30 epochs.
......@@ -178,19 +181,21 @@ If you use a stronger detection model, you can get better results. Each txt is t
### FairMOT Results on HT-21 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
| DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
| DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
### FairMOT Results on HT-21 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
| DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
| DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
### [Pedestrian Tracking](./pedestrian/README.md)
### FairMOT Results on each val-set of Pedestrian category
| Dataset | input shape | MOTA | IDF1 | FPS | download | config |
| :-------------| :------- | :----: | :----: | :----: | :-----: |:------: |
| PathTrack | 1088x608 | 44.9 | 59.3 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml) |
| VisDrone | 1088x608 | 49.2 | 63.1 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
| PathTrack | 1088x608 | 44.9 | 59.3 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [config](./pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml) |
| VisDrone | 1088x608 | 49.2 | 63.1 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [config](./pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
### [Vehicle Tracking](./vehicle/README.md)
### FairMOT Results on each val-set of Vehicle category
......
......@@ -20,7 +20,7 @@
- JDE (Joint Detection and Embedding) algorithms learn Detection and Embedding simultaneously in a single shared neural network, setting up the loss function in a multi-task learning fashion. Representative algorithms are **JDE** and **FairMOT**. This design balances accuracy and speed and achieves accurate real-time multi-object tracking.
PaddleDetection implements three multi-object tracking algorithms of these two series.
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model to extract features from the image region of the person bounded by a detector; it integrates appearance information based on a deep appearance descriptor and assigns and updates the detected targets to the existing corresponding trajectories, i.e. a ReID task. The detection boxes required by DeepSORT can be generated by any detector; tracking prediction then only needs the saved detection results and the video frames. Here the `PCB+Pyramid ResNet101` model provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) is selected as the ReID model.
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model to extract features from the image region of the person bounded by a detector; it integrates appearance information based on a deep appearance descriptor and assigns and updates the detected targets to the existing corresponding trajectories, i.e. a ReID task. The detection boxes required by DeepSORT can be generated by any detector; tracking prediction then only needs the saved detection results and the video frames. Here the `PCB+Pyramid ResNet101` and `PPLCNet` models provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) are selected as the ReID models.
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) learns the object detection task and the embedding task simultaneously in a single shared neural network and outputs the detection results together with the matching appearance embeddings. The original JDE paper adds a ReID branch on top of the anchor-based YOLOv3 detector to learn embeddings; the training process is constructed as a multi-task joint learning problem, balancing accuracy and speed.
......@@ -49,21 +49,23 @@ pip install -r requirements.txt
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model | ReID model | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----:| :-----: | :-----: |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 72.2 | 59.5 | 1087 | 8034 | 21481 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 68.1 | 53.6 | 1979 | 17446 | 15766 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pplcnet.yml) |
### DeepSORT Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model | ReID model | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----: | :-----: |:-----: |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 64.0 | 51.3 | 1208 | 12697 | 51784 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 61.1 | 48.8 | 2010 | 25401 | 43432 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pplcnet.yml) |
**Notes:**
DeepSORT does not need training on the MOT dataset; it is only used for evaluation. Two evaluation methods are supported.
- Method 1: load the detection result file and the ReID model. Before evaluating DeepSORT, first obtain detection results from a detection model, then prepare the result files like this:
- **Method 1**: load the detection result file and the ReID model. Before evaluating DeepSORT, first obtain detection results from a detection model, then prepare the result files like this:
```
det_results_dir
|——————MOT16-02.txt
......@@ -80,24 +82,24 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
A stronger detection model yields better results. Each txt file holds the detection results of all frames of one video, and each line describes a bounding box in the following format:
```
[frame_id],[bb_left],[bb_top],[width],[height],[conf]
[frame_id],[x0],[y0],[w],[h],[score],[class_id]
```
- `frame_id` is the frame number of the image
- `bb_left` is the X coordinate of the left bound of the object box
- `bb_top` is the Y coordinate of the top bound of the object box
- `width,height` are the pixel width and height of the object box
- `conf` is the object score, set to `1` by default (the results have already been filtered by the detection score threshold)
- `x0,y0` are the X and Y coordinates of the top-left corner of the object box
- `w,h` are the pixel width and height of the object box
- `score` is the confidence score of the object box
- `class_id` is the category of the object box, `0` if there is only one category
- Method 2: load the detection model and the ReID model at the same time; here the JDE version of YOLOv3 is selected, see `configs/mot/deepsort/_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml` for the configuration.
- **Method 2**: load the detection model and the ReID model at the same time; here the JDE version of YOLOv3 is selected, see `configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml` for the configuration. To load other general detection models, adapt `configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml`.
### JDE Results on MOT-16 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](./jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](./jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](./jde/jde_darknet53_30e_576x320.yml) |
### JDE Results on MOT-16 Test Set
......@@ -105,10 +107,10 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](./jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](./jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](./jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE used 8 GPUs for training, with a mini-batch size of 4 on each GPU, and trained for 30 epochs.
......@@ -177,19 +179,21 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
### FairMOT Results on HT-21 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
| DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
| DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
### FairMOT Results on HT-21 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
| DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
| DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
### [Pedestrian Tracking](./pedestrian/README.md)
### FairMOT Results on val-sets of the Pedestrian category
| Dataset | input shape | MOTA | IDF1 | FPS | download | config |
| :-------------| :------- | :----: | :----: | :----: | :-----: |:------: |
| PathTrack | 1088x608 | 44.9 | 59.3 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml) |
| VisDrone | 1088x608 | 49.2 | 63.1 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
| PathTrack | 1088x608 | 44.9 | 59.3 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [config](./pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml) |
| VisDrone | 1088x608 | 49.2 | 63.1 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [config](./pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
### [Vehicle Tracking](./vehicle/README.md)
### FairMOT Results on val-sets of the Vehicle category
......
English | [简体中文](README_cn.md)
# DeepSORT (Deep Cosine Metric Learning for Person Re-identification)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model to extract features from the image region of the person bounded by a detector. It integrates appearance information based on a deep appearance descriptor and assigns and updates the detected targets to the existing corresponding trajectories, as in a ReID task. The detection bboxes required by DeepSORT can be generated by any detection model, and the saved detection result file can then be loaded for tracking. Here we select the `PCB + Pyramid ResNet101` model provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) as the ReID model.
## Model Zoo
### DeepSORT Results on MOT-16 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model |ReID model| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
### DeepSORT Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model |ReID model| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - |[det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:**
DeepSORT does not need training on the MOT dataset; it is only used for evaluation. Two evaluation methods are supported.
- 1. Load the result file and the ReID model. Before DeepSORT evaluation, you should first get detection results from a detection model, and then prepare them like this:
```
det_results_dir
|——————MOT16-02.txt
|——————MOT16-04.txt
|——————MOT16-05.txt
|——————MOT16-09.txt
|——————MOT16-10.txt
|——————MOT16-11.txt
|——————MOT16-13.txt
```
For the MOT16 dataset, you can download the matched detection results `det_results_dir.zip` provided by PaddleDetection:
```
wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
If you use a stronger detection model, you can get better results. Each txt file holds the detection results of all frames extracted from one video, and each line describes a bounding box in the following format:
```
[frame_id],[bb_left],[bb_top],[width],[height],[conf]
```
- `frame_id` is the frame number of the image
- `bb_left` is the X coordinate of the left bound of the object box
- `bb_top` is the Y coordinate of the top bound of the object box
- `width,height` are the pixel width and height of the object box
- `conf` is the object score, with default value `1` (the results have already been filtered by the detection score threshold)
- 2. Load the detection model and the ReID model at the same time. Here, the JDE version of YOLOv3 is selected. For configuration details, see `configs/mot/deepsort/_base_/deepsort_jde_yolov3_darknet53_pcb_pyramid_r101.yml`. To load other general detection models, refer to `configs/mot/deepsort/_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml`.
## Getting Started
### 1. Evaluation
```bash
# Load the result file and ReID model to get the tracking result
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml --det_results_dir {your detection results}
# Load JDE YOLOv3 detector and ReID model to get the tracking results
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid_r101.yml
# or load a general YOLOv3 detector and ReID model to get the tracking results
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_yolov3_pcb_pyramid_r101.yml --scaled=True
```
**Notes:**
The JDE YOLOv3 pedestrian detector is trained on the same MOT dataset as JDE and FairMOT. The biggest difference from the general YOLOv3 model is that it uses JDEBBoxPostProcess post-processing, so the output coordinates are not scaled back to the original image.
The general YOLOv3 pedestrian detector is not trained on the MOT dataset, so its accuracy is lower, but its output coordinates are scaled back to the original image.
`--scaled` indicates whether the detector's output coordinates have already been scaled back to the original image: False for JDE YOLOv3, True for general detectors.
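As a rough illustration of what the flag controls, here is a minimal sketch, assuming the detector input was produced by resizing the original image by a factor `scale` and padding it by `(pad_w, pad_h)`; the helper name `rescale_boxes` is our own, and PaddleDetection's actual pre/post-processing may differ in details such as the padding strategy:

```python
def rescale_boxes(boxes, scale, pad_w, pad_h):
    # boxes: [x0, y0, x1, y1] in network-input coordinates (what --scaled=False implies).
    # Returns boxes in original-image coordinates (what --scaled=True implies).
    return [[(x0 - pad_w) / scale, (y0 - pad_h) / scale,
             (x1 - pad_w) / scale, (y1 - pad_h) / scale]
            for x0, y0, x1, y1 in boxes]
```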
### 2. Inference
Run inference on a video with a single GPU using the following command:
```bash
# load JDE YOLOv3 pedestrian detector and ReID model to get tracking results
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid_r101.yml --video_file={your video name}.mp4 --save_videos
# or load general YOLOv3 pedestrian detector and ReID model to get tracking results
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_yolov3_pcb_pyramid_r101.yml --video_file={your video name}.mp4 --scaled=True --save_videos
```
**Notes:**
Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first; on the Linux (Ubuntu) platform you can install it directly with the following command: `apt-get update && apt-get install -y ffmpeg`.
`--scaled` indicates whether the detector's output coordinates have already been scaled back to the original image: False for JDE YOLOv3, True for general detectors.
### 3. Export models
```bash
# 1.export detection model
# export JDE YOLOv3 pedestrian detector
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/jde_yolov3_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams
# or export general YOLOv3 pedestrian detector
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/pedestrian/pedestrian_yolov3_darknet.yml -o weights=https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams
# 2. export ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
```
### 4. Using exported model for python inference
```bash
# using exported JDE YOLOv3 pedestrian detector
python deploy/python/mot_sde_infer.py --model_dir=output_inference/jde_yolov3_darknet53_30e_1088x608/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --save_mot_txts
# or using exported general YOLOv3 pedestrian detector
python deploy/python/mot_sde_infer.py --model_dir=output_inference/pedestrian_yolov3_darknet/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts
```
**Notes:**
The tracking model predicts videos and does not support single-image prediction. The visualized video of the tracking results is saved by default. You can add `--save_mot_txts` (save one txt per video) or `--save_mot_txt_per_img` (save one txt per image) to save the txt result files, or `--save_images` to save the visualized images.
`--scaled` indicates whether the detector's output coordinates have already been scaled back to the original image: False for JDE YOLOv3, True for general detectors.
## Citations
```
@inproceedings{Wojke2017simple,
title={Simple Online and Realtime Tracking with a Deep Association Metric},
author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
year={2017},
pages={3645--3649},
organization={IEEE},
doi={10.1109/ICIP.2017.8296962}
}
@inproceedings{Wojke2018deep,
title={Deep Cosine Metric Learning for Person Re-identification},
author={Wojke, Nicolai and Bewley, Alex},
booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
year={2018},
pages={748--756},
organization={IEEE},
doi={10.1109/WACV.2018.00087}
}
```
README_cn.md
\ No newline at end of file
......@@ -6,10 +6,11 @@
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Adapting Other Detectors](#Adapting_Other_Detectors)
- [Citations](#Citations)
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model to extract features from the image region of the person bounded by a detector; it integrates appearance information based on a deep appearance descriptor and assigns and updates the detected targets to the existing corresponding trajectories, i.e. a ReID task. The detection boxes required by DeepSORT can be generated by any detector; tracking prediction then only needs the saved detection results and the video frames. Here the `PCB+Pyramid ResNet101` model provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) is selected as the ReID model.
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model to extract features from the image region of the person bounded by a detector; it integrates appearance information based on a deep appearance descriptor and assigns and updates the detected targets to the existing corresponding trajectories, i.e. a ReID task. The detection boxes required by DeepSORT can be generated by any detector; tracking prediction then only needs the saved detection results and the video frames. Here the `PCB+Pyramid ResNet101` and `PPLCNet` models provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) are selected as the ReID models.
## Model Zoo
......@@ -17,21 +18,33 @@
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model | ReID model | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----:| :-----: | :-----: |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 72.2 | 59.5 | 1087 | 8034 | 21481 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 68.1 | 53.6 | 1979 | 17446 | 15766 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort_jde_yolov3_pplcnet.yml) |
### DeepSORT Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model | ReID model | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----: | :-----: |:-----: |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 64.0 | 51.3 | 1208 | 12697 | 51784 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 61.1 | 48.8 | 2010 | 25401 | 43432 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort_jde_yolov3_pplcnet.yml) |
**Notes:**
DeepSORT does not need training on the MOT dataset; it is only used for evaluation. Two evaluation methods are supported.
### DeepSORT Results on MOT-17 half Val Set
- Method 1: load the detection result file and the ReID model. Before evaluating DeepSORT, first obtain detection results from a detection model, then prepare the result files like this:
| Detector training dataset | Detector | ReID | Det mAP | MOTA | IDF1 | FPS | config |
| :-------- | :----- | :----: |:------: | :----: |:-----: |:----:|:----: |
| MIX | JDE YOLOv3 | PCB Pyramid | - | 66.9 | 62.7 | - |[config](./deepsort_jde_yolov3_pcb_pyramid.yml) |
| MIX | JDE YOLOv3 | PPLCNet | - | 66.3 | 62.1 | - |[config](./deepsort_jde_yolov3_pplcnet.yml) |
| pedestrian (not released) | YOLOv3 | PPLCNet | 45.4 | 45.8 | 54.3 | - |[config](./deepsort_yolov3_pplcnet.yml) |
| MOT-17 half train | PPYOLOv2 | PPLCNet | 46.8 | 48.7 | 54.5 | - |[config](./deepsort_ppyolov2_pplcnet.yml) |
**Notes:**
DeepSORT does not need training on the MOT dataset; it is only used for evaluation. Two evaluation methods are supported.
- **Method 1**: load the detection result file and the ReID model. Before evaluating DeepSORT, first obtain detection results from a detection model, then prepare the result files like this:
```
det_results_dir
|——————MOT16-02.txt
......@@ -48,45 +61,50 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
A stronger detection model yields better results. Each txt file holds the detection results of all frames of one video, and each line describes a bounding box in the following format:
```
[frame_id],[bb_left],[bb_top],[width],[height],[conf]
[frame_id],[x0],[y0],[w],[h],[score],[class_id]
```
- `frame_id` is the frame number of the image
- `bb_left` is the X coordinate of the left bound of the object box
- `bb_top` is the Y coordinate of the top bound of the object box
- `width,height` are the pixel width and height of the object box
- `conf` is the object score, set to `1` by default (the results have already been filtered by the detection score threshold)
- `x0,y0` are the X and Y coordinates of the top-left corner of the object box
- `w,h` are the pixel width and height of the object box
- `score` is the confidence score of the object box
- `class_id` is the category of the object box, `0` if there is only one category
- Method 2: load the detection model and the ReID model at the same time; here the JDE version of YOLOv3 is selected, see `configs/mot/deepsort/_base_/deepsort_jde_yolov3_darknet53_pcb_pyramid_r101.yml` for the configuration. To load other general detection models, adapt `configs/mot/deepsort/_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml`.
- **Method 2**: load the detection model and the ReID model at the same time; here the JDE version of YOLOv3 is selected, see `configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml` for the configuration. To load other general detection models, adapt `configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml`.
## Getting Started
### 1. Evaluation
**Method 1**: load the detection result file and the ReID model to get tracking results
```bash
# Load the detection result file and the ReID model to get tracking results
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml --det_results_dir {your detection results}
# Load the JDE YOLOv3 pedestrian detection model and the ReID model to get tracking results
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid_r101.yml
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml --det_results_dir {your detection results}
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml --det_results_dir {your detection results}
```
# or load a general YOLOv3 pedestrian detection model and the ReID model to get tracking results
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_yolov3_pcb_pyramid_r101.yml --scaled=True
**Method 2**: load the pedestrian detection model and the ReID model to get tracking results
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pplcnet.yml
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml --scaled=True
```
**Notes:**
The JDE YOLOv3 pedestrian detection model is trained on the same MOT dataset as JDE and FairMOT; the biggest difference from the general YOLOv3 model is that it uses JDEBBoxPostProcess post-processing, so the output coordinates are not scaled back to the original image.
The general YOLOv3 pedestrian detection model is not trained on the MOT dataset, so its accuracy is lower, but its output coordinates are scaled back to the original image.
`--scaled` indicates whether the detector's output coordinates have already been scaled back to the original image: False for JDE YOLOv3, True for general detectors.
- The JDE YOLOv3 pedestrian detection model is trained on the same MOT dataset as JDE and FairMOT, hence its higher MOTA, while other general detection models such as PPYOLOv2 are trained only on the MOT17 half dataset.
- The biggest difference between the JDE YOLOv3 model and general detection models such as YOLOv3 and PPYOLOv2 is the JDEBBoxPostProcess post-processing: its output coordinates are not scaled back to the original image, whereas general detection models output coordinates scaled back to the original image.
- `--scaled` indicates whether the detector's output coordinates have already been scaled back to the original image: False for JDE YOLOv3, True for general detectors; the default is False.
### 2. Inference
Use the following command to predict a video on a single GPU and save the result as a video:
```bash
# Load the JDE YOLOv3 pedestrian detection model and the ReID model, and save the result as a video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid_r101.yml --video_file={your video name}.mp4 --save_videos
# Load the JDE YOLOv3 pedestrian detection model and the PCB Pyramid ReID model, and save the result as a video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml --video_file={your video name}.mp4 --save_videos
# or load a general YOLOv3 pedestrian detection model and the ReID model, and save the result as a video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_yolov3_pcb_pyramid_r101.yml --video_file={your video name}.mp4 --scaled=True --save_videos
# or load the PPYOLOv2 pedestrian detection model and the PPLCNet ReID model, and save the result as a video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml --video_file={your video name}.mp4 --scaled=True --save_videos
```
**Notes:**
......@@ -96,32 +114,77 @@ CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsor
### 3. Export models
Step 1: Export the detection model
```bash
# 1. Export the detection model first
# Export the JDE YOLOv3 pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/jde_yolov3_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams
# or export a general YOLOv3 pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/pedestrian/pedestrian_yolov3_darknet.yml -o weights=https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
# or export the PPYOLOv2 pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams
```
# 2. Then export the ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
Step 2: Export the ReID model
```bash
# Export the PCB Pyramid ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams
# or export the PPLCNet ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
```
### 4. Python inference with the exported models
```bash
# Use the exported JDE YOLOv3 pedestrian detection model
python deploy/python/mot_sde_infer.py --model_dir=output_inference/jde_yolov3_darknet53_30e_1088x608/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --save_mot_txts
# Use the exported JDE YOLOv3 pedestrian detection model and the PCB Pyramid ReID model
python deploy/python/mot_sde_infer.py --model_dir=output_inference/jde_yolov3_darknet53_30e_1088x608_mix/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --save_mot_txts
# or use the exported general YOLOv3 pedestrian detection model
python deploy/python/mot_sde_infer.py --model_dir=output_inference/pedestrian_yolov3_darknet/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts
# or use the exported PPYOLOv2 pedestrian detection model and the PPLCNet ReID model
python deploy/python/mot_sde_infer.py --model_dir=output_inference/ppyolov2_r50vd_dcn_365e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts
```
**Notes:**
The tracking model predicts videos and does not support single-image prediction. The visualized video of the tracking results is saved by default; add `--save_mot_txts` (save one txt per video) or `--save_mot_txt_per_img` (save one txt per image) to save the txt result files, or `--save_images` to save the visualized images.
`--scaled` indicates whether the detector's output coordinates have already been scaled back to the original image: False for JDE YOLOv3, True for general detectors.
## Adapting Other Detectors
### 1. Config directory layout
- `detector/xxx.yml` is a pure detection model config, e.g. `detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml`, supporting the full detection pipeline (train/eval/infer/export/deploy). DeepSORT tracking eval/infer does not use this detection yml, but export does: this yml is needed to export the detection model separately. DeepSORT export is split into detector and reid parts, so users can define and assemble detector+reid into a complete DeepSORT tracking system on their own.
- For the detector configs under `detector/`, users need to convert their datasets to COCO format. Since ground-truth IDs are not involved, any detection model can be configured here, as long as its output contains the class, coordinates and score of each box.
- The files under `reid/deepsort_yyy.yml` hold the ReID model and tracker configs, e.g. `reid/deepsort_pplcnet.yml`. The ReID models here, `deepsort_pcb_pyramid_r101.yml` and `deepsort_pplcnet.yml`, are provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) and were trained on the Market1501 (751 identities) person ReID dataset; the training details are to be published by PaddleClas.
- `deepsort_xxx_yyy.yml` is a complete DeepSORT tracking config, e.g. `deepsort_ppyolov2_pplcnet.yml`, where the detector part `xxx` comes from `detector/` and the reid and tracker part `yyy` comes from `reid/`.
- DeepSORT tracking eval/infer supports two methods: Method 1 uses only `reid/deepsort_yyy.yml` to load a detection result file and the `yyy` ReID model; Method 2 uses `deepsort_xxx_yyy.yml` to load the `xxx` detection model and the `yyy` ReID model. DeepSORT deployment, however, must use `deepsort_xxx_yyy.yml`.
- Detector eval/infer/deploy only uses `detector/xxx.yml`. The ReID model is generally not used alone; if it must be, a detection result file has to be loaded first and only `reid/deepsort_yyy.yml` is used.
### 2. Adaptation steps
1. Convert your dataset to COCO format and train it as a general detection model; following the model configs in the `detector/` folder, create `detector/xxx.yml`. Faster R-CNN, YOLOv3, PPYOLOv2, JDE YOLOv3 and PicoDet are already supported. A minimal COCO-format sketch follows these steps.
2. Create `deepsort_xxx_yyy.yml`, where the `DeepSORT.detector` config comes from `detector/xxx.yml`, and `EvalMOTDataset` and `det_weights` can be set as needed. `yyy` comes from `reid/deepsort_yyy.yml`, e.g. `reid/deepsort_pplcnet.yml`.
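As a minimal illustration of step 1, the sketch below assembles a COCO-format annotation file for a one-class pedestrian dataset; the field names follow the COCO detection convention, while the file names and values are hypothetical placeholders:

```python
import json

# Minimal COCO-format skeleton for a one-class detection dataset.
# ReID identity labels are intentionally absent: the detector under
# detector/ only needs the class, coordinates and score of each box.
coco = {
    "images": [
        {"id": 1, "file_name": "img00001.jpg", "width": 1920, "height": 1080},
    ],
    "annotations": [
        # bbox follows the COCO convention: [x, y, width, height] in pixels
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 200.0, 50.0, 120.0],
         "area": 50.0 * 120.0, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "pedestrian"}],
}

with open("train.json", "w") as f:
    json.dump(coco, f)
```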
### 3. Usage steps
#### 1. Load the detection model and ReID model for evaluation:
```
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_xxx_yyy.yml --scaled=True
```
#### 2. Load the detection model and ReID model for inference:
```
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_xxx_yyy.yml --video_file={your video name}.mp4 --scaled=True --save_videos
```
#### 3. Export the detection model and ReID model:
```bash
# Export the detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/xxx.yml
# Export the ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_yyy.yml
```
#### 4. Deploy with the exported detection and ReID models:
```
python deploy/python/mot_sde_infer.py --model_dir=output_inference/xxx/ --reid_model_dir=output_inference/deepsort_yyy/ --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts
```
**Notes:**
`--scaled` indicates whether the detector's output coordinates have already been scaled back to the original image: False for JDE YOLOv3, True for general detectors.
## Citations
```
@inproceedings{Wojke2017simple,
......
_BASE_: [
'detector/faster_rcnn_r50_fpn_2x_1333x800_mot17half.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT17/images/half
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/faster_rcnn_r50_fpn_2x_1333x800_mot17half.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: FasterRCNN
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0
vertical_ratio: -1
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# detector configuration
# see 'configs/mot/deepsort/detector/faster_rcnn_r50_fpn_2x_1333x800_mot17half.yml'
FasterRCNN:
backbone: ResNet
neck: FPN
rpn_head: RPNHead
bbox_head: BBoxHead
bbox_post_process: BBoxPostProcess
# Tracking requires higher quality boxes, so nms.score_threshold will be higher
BBoxPostProcess:
decode: RCNNBox
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.2 # 0.05 in original detector
nms_threshold: 0.5
_BASE_: [
'detector/jde_yolov3_darknet53_30e_1088x608_mix.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: YOLOv3 # JDE version
detector: YOLOv3 # JDE version YOLOv3
reid: PCBPyramid
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml'
PCBPyramid:
num_conv_out_channels: 128
num_classes: 751
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0
vertical_ratio: -1
budget: 100
max_age: 70
n_init: 3
......@@ -20,30 +46,16 @@ DeepSORTTracker:
motion: KalmanFilter
# JDE version YOLOv3 detector for MOT dataset.
# The most obvious difference is JDEBBoxPostProcess and the bboxes coordinates
# output are not scaled to the original image.
# detector configuration: JDE version YOLOv3
# see 'configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml'
# The most obvious difference from general YOLOv3 is the JDEBBoxPostProcess and the bboxes coordinates output are not scaled to the original image.
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: JDEBBoxPostProcess
DarkNet:
depth: 53
return_idx: [2, 3, 4]
freeze_norm: True
YOLOv3FPN:
freeze_norm: True
YOLOv3Head:
anchors: [[128,384], [180,540], [256,640], [512,640],
[32,96], [45,135], [64,192], [90,271],
[8,24], [11,34], [16,48], [23,68]]
anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
loss: JDEDetectionLoss
# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
JDEBBoxPostProcess:
decode:
name: JDEBox
......
_BASE_: [
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_jde_yolov3_darknet53_pcb_pyramid_r101.yml',
'_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
DeepSORT:
detector: YOLOv3
reid: PCBPyramid
tracker: DeepSORTTracker
# JDE version YOLOv3 detector for MOT dataset.
# The most obvious difference is JDEBBoxPostProcess and the bboxes coordinates
# output are not scaled to the original image.
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: JDEBBoxPostProcess
_BASE_: [
'detector/jde_yolov3_darknet53_30e_1088x608_mix.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: YOLOv3 # JDE version YOLOv3
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0
vertical_ratio: -1
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# detector configuration: JDE version YOLOv3
# see 'configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml'
# The most obvious difference from general YOLOv3 is the JDEBBoxPostProcess and the bboxes coordinates output are not scaled to the original image.
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: JDEBBoxPostProcess
# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
JDEBBoxPostProcess:
decode:
name: JDEBox
conf_thresh: 0.3
downsample_ratio: 32
nms:
name: MultiClassNMS
keep_top_k: 500
score_threshold: 0.01
nms_threshold: 0.5
nms_top_k: 2000
normalized: true
return_idx: false
_BASE_: [
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml',
'_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: None
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
DeepSORT:
detector: None
reid: PCBPyramid
tracker: DeepSORTTracker
_BASE_: [
'detector/picodet_l_esnet_300e_896x896_mot17half.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT17/images/half
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_esnet_300e_896x896_mot17half.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: PicoDet
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0
vertical_ratio: -1
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# detector configuration
# see 'configs/mot/deepsort/detector/picodet_l_esnet_300e_640x640_mot17half.yml'
PicoDet:
backbone: ESNet
neck: CSPPAN
head: PicoHead
PicoHead:
conv_feat:
name: PicoFeat
feat_in: 128
feat_out: 128
num_convs: 4
num_fpn_stride: 4
norm_type: bn
share_cls_reg: False
fpn_stride: [8, 16, 32, 64]
feat_in_chan: 128
prior_prob: 0.01
reg_max: 7
cell_offset: 0.5
loss_class:
name: VarifocalLoss
use_sigmoid: True
iou_weighted: True
loss_weight: 1.0
loss_dfl:
name: DistributionFocalLoss
loss_weight: 0.25
loss_bbox:
name: GIoULoss
loss_weight: 2.0
assigner:
name: SimOTAAssigner
candidate_topk: 10
iou_weight: 6
nms:
name: MultiClassNMS
nms_top_k: 1000
keep_top_k: 100
score_threshold: 0.25 # 0.025 in original detector
nms_threshold: 0.6
_BASE_: [
'detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT17/images/half
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: YOLOv3 # PPYOLOv2 version
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0
vertical_ratio: -1
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# detector configuration: PPYOLOv2 version
# see 'configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml'
YOLOv3:
backbone: ResNet
neck: PPYOLOPAN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
ResNet:
depth: 50
variant: d
return_idx: [1, 2, 3]
dcn_v2_stages: [3]
freeze_at: -1
freeze_norm: false
norm_decay: 0.
# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
BBoxPostProcess:
decode:
name: YOLOBox
conf_thresh: 0.25 # 0.01 in original detector
downsample_ratio: 32
clip_bbox: true
scale_x_y: 1.05
nms:
name: MatrixNMS
keep_top_k: 100
score_threshold: 0.25 # 0.01 in original detector
post_threshold: 0.25 # 0.01 in original detector
nms_top_k: -1
background_label: -1
_BASE_: [
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml',
'_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
DeepSORT:
detector: YOLOv3
reid: PCBPyramid
tracker: DeepSORTTracker
# General version YOLOv3
# Using BBoxPostProcess and the bboxes output are scaled to the original image.
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
_BASE_: [
'detector/yolov3_darknet53_270e_608x608_pedestrian.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT17/images/half
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_270e_608x608_pedestrian.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: YOLOv3 # General YOLOv3 version
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0
vertical_ratio: -1
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# detector configuration: General YOLOv3 version
# see 'configs/mot/deepsort/detector/yolov3_darknet53_270e_608x608_pedestrian.yml'
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
BBoxPostProcess:
decode:
name: YOLOBox
conf_thresh: 0.1 # 0.005 in original detector
downsample_ratio: 32
clip_bbox: true
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.01
nms_threshold: 0.45
nms_top_k: 1000
English | [简体中文](README_cn.md)
# Detector for DeepSORT
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series. The configs of several common detectors are provided here as a reference. Note that differences in the training dataset, backbone, input size, number of training epochs, and NMS threshold will all lead to differences in model accuracy and performance; please adapt the configs to your needs.
## Model Zoo
### Results on MOT17-half dataset
| Backbone | Model | input size | lr schedule | FPS | Box AP | download | config |
| :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :----------: | :-----: |
| ResNet50-vd | PPYOLOv2 | 640x640 | 365e | ---- | 46.8 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams) | [config](./ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml) |
| ResNet50-FPN | Faster R-CNN | 1333x800 | 1x | ---- | 44.2 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/faster_rcnn_r50_fpn_2x_1333x800_mot17half.pdparams) | [config](./faster_rcnn_r50_fpn_2x_1333x800_mot17half.yml) |
| DarkNet-53 | YOLOv3 | 608x608 | 270e | ---- | 45.4 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_270e_608x608_pedestrian.pdparams) | [config](./yolov3_darknet53_270e_608x608_pedestrian.yml) |
| ESNet | PicoDet | 896x896 | 300e | ---- | 40.9 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_esnet_300e_896x896_mot17half.pdparams) | [config](./picodet_l_esnet_300e_896x896_mot17half.yml) |
**Notes:**
- The above models, except for YOLOv3, are trained with the **MOT17-half train** set.
- The **MOT17-half train** set is composed of the images and annotations of the first half frames of each video in the MOT17 train set (7 sequences in total). The **MOT17-half val** set, composed of the second half frames of each video, is used for evaluation. They can be downloaded from this [link](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip); download and unzip it into the `dataset/mot/MOT17/images/` folder.
- YOLOv3 is trained with the same pedestrian dataset as `configs/pedestrian/pedestrian_yolov3_darknet.yml`, which is not publicly available yet.
- For pedestrian tracking, please use a pedestrian detector combined with a pedestrian ReID model. For vehicle tracking, please use a vehicle detector combined with a vehicle ReID model.
- DeepSORT tracking requires high-quality detection boxes, so post-processing settings such as the NMS score threshold of these models differ from those used in pure detection tasks, as sketched below.
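For example (values taken from the detector configs in this directory; a partial sketch, not a complete config), PPYOLOv2's post-processing thresholds are raised when it serves as the DeepSORT detector:
```yaml
# PPYOLOv2 post-processing when used as the DeepSORT detector
BBoxPostProcess:
  decode:
    name: YOLOBox
    conf_thresh: 0.25      # 0.01 in the original detector
  nms:
    name: MatrixNMS
    score_threshold: 0.25  # 0.01 in the original detector
    post_threshold: 0.25   # 0.01 in the original detector
```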
## Quick Start
Start the training and evaluation with the following command
```bash
job_name=ppyolov2_r50vd_dcn_365e_640x640_mot17half
config=configs/mot/deepsort/detector/${job_name}.yml
log_dir=log_dir/${job_name}
# 1. training
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config}
# 2. evaluation
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/${job_name}.pdparams
```
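After training and evaluation, the detector can be paired with a ReID model for tracking. A minimal sketch, assuming the `tools/eval_mot.py` entry point and its `--scaled` flag (`${mot_config}` is a placeholder for a combined DeepSORT config; set `--scaled=True` for general detectors whose outputs are already scaled back to the original image, `False` for the JDE version YOLOv3):
```bash
# run MOT evaluation with a combined detector + ReID DeepSORT config (assumed entry point)
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c ${mot_config} --scaled=True
```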
Simplified Chinese | [English](README.md)
# Detector for DeepSORT
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series. The configs of several common detectors are provided here as a reference. Since differences in the training dataset, input size, number of training epochs, NMS threshold settings, etc. will all lead to differences in model accuracy and performance, please adapt the configs to your needs.
## Model Zoo
### Detection results on the MOT17-half val dataset
| Backbone | Model | input size | lr schedule | inference time (fps) | Box AP | download | config |
| :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :------: | :-----: |
| ResNet50-vd | PPYOLOv2 | 640x640 | 365e | ---- | 46.8 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams) | [config](./ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml) |
| ResNet50-FPN | Faster R-CNN | 1333x800 | 1x | ---- | 44.2 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/faster_rcnn_r50_fpn_2x_1333x800_mot17half.pdparams) | [config](./faster_rcnn_r50_fpn_2x_1333x800_mot17half.yml) |
| DarkNet-53 | YOLOv3 | 608x608 | 270e | ---- | 45.4 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_270e_608x608_pedestrian.pdparams) | [config](./yolov3_darknet53_270e_608x608_pedestrian.yml) |
| ESNet | PicoDet | 896x896 | 300e | ---- | 40.9 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_esnet_300e_896x896_mot17half.pdparams) | [config](./picodet_l_esnet_300e_896x896_mot17half.yml) |
**Notes:**
- All models above except YOLOv3 are trained with the **MOT17-half train** set.
- **MOT17-half train** is a dataset composed of the images and annotations of the first half frames of each video in the MOT17 train sequences (7 in total); the **MOT17-half val** set, composed of the second half frames of each video, is used for evaluation. It can be downloaded from this [link](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip); download and unzip it into the `dataset/mot/MOT17/images/` folder.
- YOLOv3 is trained on the same pedestrian dataset as `configs/pedestrian/pedestrian_yolov3_darknet.yml`, which is not publicly available yet.
- For pedestrian tracking, please use a pedestrian detector combined with a pedestrian ReID model. For vehicle tracking, please use a vehicle detector combined with a vehicle ReID model.
- DeepSORT tracking requires high-quality detection boxes, so post-processing settings such as the NMS threshold of these models differ from those used in pure detection tasks.
## Quick Start
Start training and evaluation with the following commands:
```bash
job_name=ppyolov2_r50vd_dcn_365e_640x640_mot17half
config=configs/mot/deepsort/detector/${job_name}.yml
log_dir=log_dir/${job_name}
# 1. training
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config}
# 2. evaluation
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/${job_name}.pdparams
```
_BASE_: [
'../../../faster_rcnn/faster_rcnn_r50_fpn_2x_coco.yml',
]
weights: output/faster_rcnn_r50_fpn_2x_1333x800_mot17half/model_final
num_classes: 1
TrainDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/train_half.json
image_dir: train
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/val_half.json
image_dir: train
# detector configuration
architecture: FasterRCNN
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams
FasterRCNN:
backbone: ResNet
neck: FPN
rpn_head: RPNHead
bbox_head: BBoxHead
bbox_post_process: BBoxPostProcess
ResNet:
depth: 50
norm_type: bn
freeze_at: 0
return_idx: [0,1,2,3]
num_stages: 4
FPN:
out_channel: 256
RPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
anchor_sizes: [[32], [64], [128], [256], [512]]
strides: [4, 8, 16, 32, 64]
rpn_target_assign:
batch_size_per_im: 256
fg_fraction: 0.5
negative_overlap: 0.3
positive_overlap: 0.7
use_random: True
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 1000
topk_after_collect: True
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
BBoxHead:
head: TwoFCHead
roi_extractor:
resolution: 7
sampling_ratio: 0
aligned: True
bbox_assigner: BBoxAssigner
BBoxAssigner:
batch_size_per_im: 512
bg_thresh: 0.5
fg_thresh: 0.5
fg_fraction: 0.25
use_random: True
TwoFCHead:
out_channel: 1024
BBoxPostProcess:
decode: RCNNBox
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.05
nms_threshold: 0.5
_BASE_: [
'../../datasets/mot.yml',
'../../runtime.yml',
'../jde/_base_/optimizer_30e.yml',
'../jde/_base_/jde_reader_1088x608.yml',
'../../../datasets/mot.yml',
'../../../runtime.yml',
'../../jde/_base_/optimizer_30e.yml',
'../../jde/_base_/jde_reader_1088x608.yml',
]
weights: output/jde_yolov3_darknet53_30e_1088x608/model_final
weights: output/jde_yolov3_darknet53_30e_1088x608_mix/model_final
metric: MOTDet
num_classes: 1
EvalReader:
inputs_def:
num_max_boxes: 50
......@@ -31,13 +31,15 @@ TestReader:
EvalDataset:
!MOTDataSet
dataset_dir: dataset/mot
image_lists: ['mot16.train']
image_lists: ['mot17.half']
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
TestDataset:
!ImageFolder
anno_path: None
# detector configuration
architecture: YOLOv3
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
......
_BASE_: [
'../../../picodet/picodet_l_640_coco.yml',
]
weights: output/picodet_l_esnet_300e_896x896_mot17half/model_final
num_classes: 1
TrainDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/train_half.json
image_dir: train
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/val_half.json
image_dir: train
worker_num: 6
TrainReader:
sample_transforms:
- Decode: {}
- RandomCrop: {}
- RandomFlip: {prob: 0.5}
- RandomDistort: {}
batch_transforms:
- BatchRandomResize: {target_size: [832, 864, 896, 928, 960], random_size: True, random_interp: True, keep_ratio: False}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_size: 32
shuffle: true
drop_last: true
collate_batch: false
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [896, 896], keep_ratio: False}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 8
shuffle: false
# detector configuration
architecture: PicoDet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ESNet_x1_25_pretrained.pdparams
find_unused_parameters: True
use_ema: true
cycle_epoch: 40
snapshot_epoch: 10
epoch: 250
PicoDet:
backbone: ESNet
neck: CSPPAN
head: PicoHead
ESNet:
scale: 1.25
feature_maps: [4, 11, 14]
act: hard_swish
channel_ratio: [0.875, 0.5, 1.0, 0.625, 0.5, 0.75, 0.625, 0.625, 0.5, 0.625, 1.0, 0.625, 0.75]
CSPPAN:
out_channels: 128
use_depthwise: True
num_csp_blocks: 1
num_features: 4
PicoHead:
conv_feat:
name: PicoFeat
feat_in: 128
feat_out: 128
num_convs: 4
num_fpn_stride: 4
norm_type: bn
share_cls_reg: False
fpn_stride: [8, 16, 32, 64]
feat_in_chan: 128
prior_prob: 0.01
reg_max: 7
cell_offset: 0.5
loss_class:
name: VarifocalLoss
use_sigmoid: True
iou_weighted: True
loss_weight: 1.0
loss_dfl:
name: DistributionFocalLoss
loss_weight: 0.25
loss_bbox:
name: GIoULoss
loss_weight: 2.0
assigner:
name: SimOTAAssigner
candidate_topk: 10
iou_weight: 6
nms:
name: MultiClassNMS
nms_top_k: 1000
keep_top_k: 100
score_threshold: 0.025
nms_threshold: 0.6
_BASE_: [
'../../../ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml',
]
weights: output/ppyolov2_r50vd_dcn_365e_640x640_mot17half/model_final
num_classes: 1
TrainDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/train_half.json
image_dir: train
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/val_half.json
image_dir: train
# detector configuration
architecture: YOLOv3
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
YOLOv3:
backbone: ResNet
neck: PPYOLOPAN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
ResNet:
depth: 50
variant: d
return_idx: [1, 2, 3]
dcn_v2_stages: [3]
freeze_at: -1
freeze_norm: false
norm_decay: 0.
PPYOLOPAN:
drop_block: true
block_size: 3
keep_prob: 0.9
spp: true
YOLOv3Head:
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
loss: YOLOv3Loss
iou_aware: true
iou_aware_factor: 0.5
YOLOv3Loss:
ignore_thresh: 0.7
downsample: [32, 16, 8]
label_smooth: false
scale_x_y: 1.05
iou_loss: IouLoss
iou_aware_loss: IouAwareLoss
IouLoss:
loss_weight: 2.5
loss_square: true
IouAwareLoss:
loss_weight: 1.0
BBoxPostProcess:
decode:
name: YOLOBox
conf_thresh: 0.01
downsample_ratio: 32
clip_bbox: true
scale_x_y: 1.05
nms:
name: MatrixNMS
keep_top_k: 100
score_threshold: 0.01
post_threshold: 0.01
nms_top_k: -1
background_label: -1
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: YOLOv3 # General version
reid: PCBPyramid
tracker: DeepSORTTracker
PCBPyramid:
num_conv_out_channels: 128
num_classes: 751
DeepSORTTracker:
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# General version YOLOv3
# Using BBoxPostProcess and the bboxes output are scaled to the original image.
# This config is the same as '../../../pedestrian/pedestrian_yolov3_darknet.yml'.
_BASE_: [
'../../../yolov3/yolov3_darknet53_270e_coco.yml',
]
weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/pedestrian_yolov3_darknet.pdparams
num_classes: 1
# The pedestrian training dataset used here is not publicly available yet.
# Only the trained YOLOv3 model is provided, but you can evaluate it on the MOT17-half val dataset.
TrainDataset:
!COCODataSet
dataset_dir: dataset/pedestrian
anno_path: annotations/instances_train2017.json
image_dir: train2017
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/val_half.json
image_dir: train
# detector configuration
architecture: YOLOv3
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
norm_type: sync_bn
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
norm_type: sync_bn
DarkNet:
depth: 53
return_idx: [2, 3, 4]
......@@ -44,6 +46,11 @@ YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
loss: YOLOv3Loss
YOLOv3Loss:
ignore_thresh: 0.7
downsample: [32, 16, 8]
label_smooth: false
BBoxPostProcess:
decode:
name: YOLOBox
......
English | [简体中文](README_cn.md)
# ReID of DeepSORT
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series. The configs of several common ReID models are provided here as a reference for use with DeepSORT.
## Model Zoo
### Results on Market1501 pedestrian ReID dataset
| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | download | config |
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
| ResNet-101 | PCB Pyramid Embedding | 289M | --- | 86.31 | 94.95 | 98.28 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams) | [config](./deepsort_pcb_pyramid_r101.yml) |
| PPLCNet-2.5x | PPLCNet Embedding | 36M | --- | 71.59 | 87.38 | 95.49 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams) | [config](./deepsort_pplcnet.yml) |
### Results on VERI-Wild vehicle ReID dataset
| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | download | config |
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
| PPLCNet-2.5x | PPLCNet Embedding | 93M | --- | 82.44 | 93.54 | 98.53 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.pdparams) | [config](./deepsort_pplcnet_vehicle.yml) |
**Notes:**
- ReID models are provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas); the specific training process and code will be published by PaddleClas.
- For pedestrian tracking, please use the **Market1501** pedestrian ReID model in combination with a pedestrian detector.
- For vehicle tracking, please use the **VERI-Wild** vehicle ReID model in combination with a vehicle detector.
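For deployment, a ReID model can be exported roughly as follows; this is a sketch assuming the standard `tools/export_model.py` workflow and its `reid_weights` override (any of the weight URLs above can be substituted):
```bash
# export the PPLCNet ReID model for deploy inference (assumed workflow)
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py \
  -c configs/mot/deepsort/reid/deepsort_pplcnet.yml \
  -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
```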
Simplified Chinese | [English](README.md)
# ReID models for DeepSORT
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series. The configs of several common ReID models are provided here as a reference for use with DeepSORT.
## Model Zoo
### Results on the Market1501 pedestrian ReID dataset
| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | download | config |
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
| ResNet-101 | PCB Pyramid Embedding | 289M | --- | 86.31 | 94.95 | 98.28 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams) | [config](./deepsort_pcb_pyramid_r101.yml) |
| PPLCNet-2.5x | PPLCNet Embedding | 36M | --- | 71.59 | 87.38 | 95.49 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams) | [config](./deepsort_pplcnet.yml) |
### Results on the VERI-Wild vehicle ReID dataset
| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | download | config |
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
| PPLCNet-2.5x | PPLCNet Embedding | 93M | --- | 82.44 | 93.54 | 98.53 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.pdparams) | [config](./deepsort_pplcnet_vehicle.yml) |
**Notes:**
- The ReID models are provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas); the specific training process and code will be published by PaddleClas.
- For pedestrian tracking, please use a ReID model trained on the **Market1501** pedestrian dataset together with a pedestrian detector.
- For vehicle tracking, please use a ReID model trained on the **VERI-Wild** vehicle dataset together with a vehicle detector.
# This config is a ReID-only configuration of DeepSORT; it has two uses.
# One is loading saved detection results and the ReID model to get tracking results;
# the other is exporting the ReID model for deployment inference.
_BASE_: [
'../../../datasets/mot.yml',
'../../../runtime.yml',
'../_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: None
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
# A ReID only configuration of DeepSORT, detector should be None.
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: None
reid: PCBPyramid
tracker: DeepSORTTracker
PCBPyramid:
num_conv_out_channels: 128
num_classes: 751 # default 751 classes in Market-1501 dataset.
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0 # 0 means no need to filter out too small boxes
vertical_ratio: -1 # -1 means no need to filter out bboxes, usually set 1.6 for pedestrian
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# This config is a ReID-only configuration of DeepSORT; it has two uses.
# One is loading saved detection results and the ReID model to get tracking results;
# the other is exporting the ReID model for deployment inference.
_BASE_: [
'../../../datasets/mot.yml',
'../../../runtime.yml',
'../_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: None
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pplcnet.pdparams
# A ReID only configuration of DeepSORT, detector should be None.
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: None
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0 # 0 means no need to filter out too small boxes
vertical_ratio: -1 # -1 means no need to filter out bboxes, usually set 1.6 for pedestrian
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# This config is a ReID-only configuration of DeepSORT; it has two uses.
# One is loading saved detection results and the ReID model to get tracking results;
# the other is exporting the ReID model for deployment inference.
_BASE_: [
'../../../datasets/mot.yml',
'../../../runtime.yml',
'../_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: kitti_vehicle/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: None
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pplcnet_vehicle.pdparams
# A ReID only configuration of DeepSORT, detector should be None.
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: None
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0 # 0 means no need to filter out too small boxes
vertical_ratio: -1 # -1 means no need to filter out bboxes, usually set 1.6 for pedestrian
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
......@@ -23,7 +23,6 @@ from preprocess import preprocess
from tracker import DeepSORTTracker
from ppdet.modeling.mot import visualization as mot_vis
from ppdet.modeling.mot.utils import Timer as MOTTimer
from ppdet.modeling.mot.utils import Detection
from paddle.inference import Config
from paddle.inference import create_predictor
......@@ -71,7 +70,11 @@ def clip_box(xyxy, input_shape, im_shape, scale_factor):
img0_shape = [int(im_shape[0] / ratio), int(im_shape[1] / ratio)]
xyxy[:, 0::2] = np.clip(xyxy[:, 0::2], a_min=0, a_max=img0_shape[1])
xyxy[:, 1::2] = np.clip(xyxy[:, 1::2], a_min=0, a_max=img0_shape[0])
return xyxy
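# keep only boxes with positive width and height after clipping, and return the kept indices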
w = xyxy[:, 2:3] - xyxy[:, 0:1]
h = xyxy[:, 3:4] - xyxy[:, 1:2]
mask = np.logical_and(h > 0, w > 0)
keep_idx = np.nonzero(mask)
return xyxy[keep_idx[0]], keep_idx
def preprocess_reid(imgs,
......@@ -137,19 +140,33 @@ class SDE_Detector(Detector):
def postprocess(self, boxes, input_shape, im_shape, scale_factor, threshold,
scaled):
over_thres_idx = np.nonzero(boxes[:, 1:2] >= threshold)[0]
if len(over_thres_idx) == 0:
pred_dets = np.zeros((1, 6), dtype=np.float32)
pred_xyxys = np.zeros((1, 4), dtype=np.float32)
return pred_dets, pred_xyxys
if not scaled:
# postprocess output of the JDE YOLOv3 detector
# 'scaled' means whether the coords output by the detector have already
# been scaled back to the original image: set True for a general
# detector, False for the JDE version YOLOv3.
pred_bboxes = scale_coords(boxes[:, 2:], input_shape, im_shape,
scale_factor)
pred_bboxes = clip_box(pred_bboxes, input_shape, im_shape,
scale_factor)
else:
# postprocess output of general detector
pred_bboxes = boxes[:, 2:]
pred_scores = boxes[:, 1:2]
keep_mask = pred_scores[:, 0] >= threshold
return pred_bboxes[keep_mask], pred_scores[keep_mask]
pred_xyxys, keep_idx = clip_box(pred_bboxes, input_shape, im_shape,
scale_factor)
pred_scores = boxes[:, 1:2][keep_idx[0]]
pred_cls_ids = boxes[:, 0:1][keep_idx[0]]
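# assemble pred_dets rows as [x0, y0, w, h, score, class_id]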
pred_tlwhs = np.concatenate(
(pred_xyxys[:, 0:2], pred_xyxys[:, 2:4] - pred_xyxys[:, 0:2] + 1),
axis=1)
pred_dets = np.concatenate(
(pred_tlwhs, pred_scores, pred_cls_ids), axis=1)
return pred_dets[over_thres_idx], pred_xyxys[over_thres_idx]
def predict(self, image, scaled, threshold=0.5, warmup=0, repeats=1):
'''
......@@ -159,13 +176,12 @@ class SDE_Detector(Detector):
scaled (bool): whether the detector output coords are already scaled back
to the original image; False for JDE YOLOv3, True for a general detector.
Returns:
pred_bboxes, pred_scores (np.ndarray)
pred_dets (np.ndarray, [N, 6])
'''
self.det_times.preprocess_time_s.start()
inputs = self.preprocess(image)
self.det_times.preprocess_time_s.end()
pred_bboxes, pred_scores = None, None
input_names = self.predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = self.predictor.get_input_handle(input_names[i])
......@@ -186,14 +202,20 @@ class SDE_Detector(Detector):
self.det_times.inference_time_s.end(repeats=repeats)
self.det_times.postprocess_time_s.start()
input_shape = inputs['image'].shape[2:]
im_shape = inputs['im_shape']
scale_factor = inputs['scale_factor']
pred_bboxes, pred_scores = self.postprocess(
boxes, input_shape, im_shape, scale_factor, threshold, scaled)
if len(boxes) == 0:
pred_dets = np.zeros((1, 6), dtype=np.float32)
pred_xyxys = np.zeros((1, 4), dtype=np.float32)
else:
input_shape = inputs['image'].shape[2:]
im_shape = inputs['im_shape']
scale_factor = inputs['scale_factor']
pred_dets, pred_xyxys = self.postprocess(
boxes, input_shape, im_shape, scale_factor, threshold, scaled)
self.det_times.postprocess_time_s.end()
self.det_times.img_num += 1
return pred_bboxes, pred_scores
return pred_dets, pred_xyxys
class SDE_ReID(object):
......@@ -227,34 +249,57 @@ class SDE_ReID(object):
self.cpu_mem, self.gpu_mem, self.gpu_util = 0, 0, 0
self.batch_size = batch_size
assert pred_config.tracker, "Tracking model should have tracker"
self.tracker = DeepSORTTracker()
pt = pred_config.tracker
max_age = pt['max_age'] if 'max_age' in pt else 30
max_iou_distance = pt[
'max_iou_distance'] if 'max_iou_distance' in pt else 0.7
self.tracker = DeepSORTTracker(
max_age=max_age, max_iou_distance=max_iou_distance)
def get_crops(self, xyxy, ori_img):
w, h = self.tracker.input_size
self.det_times.preprocess_time_s.start()
crops = []
xyxy = xyxy.astype(np.int64)
ori_img = ori_img.transpose(1, 0, 2) # [h,w,3]->[w,h,3]
for i, bbox in enumerate(xyxy):
crop = ori_img[bbox[0]:bbox[2], bbox[1]:bbox[3], :]
crops.append(crop)
crops = preprocess_reid(crops, w, h)
self.det_times.preprocess_time_s.end()
return crops
def preprocess(self, crops):
# to keep inference fast, only use at most batch_size crops
crops = crops[:self.batch_size]
inputs = {}
inputs['crops'] = np.array(crops).astype('float32')
return inputs
def postprocess(self, bbox_tlwh, pred_scores, features):
detections = [
Detection(tlwh, score, feat)
for tlwh, score, feat in zip(bbox_tlwh, pred_scores, features)
]
self.tracker.predict()
online_targets = self.tracker.update(detections)
online_tlwhs = []
online_scores = []
online_ids = []
for track in online_targets:
if not track.is_confirmed() or track.time_since_update > 1:
def postprocess(self, pred_dets, pred_embs):
tracker = self.tracker
tracker.predict()
online_targets = tracker.update(pred_dets, pred_embs)
online_tlwhs, online_scores, online_ids = [], [], []
for t in online_targets:
if not t.is_confirmed() or t.time_since_update > 1:
continue
online_tlwhs.append(track.to_tlwh())
online_scores.append(1.0)
online_ids.append(track.track_id)
tlwh = t.to_tlwh()
tscore = t.score
tid = t.track_id
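# filter out boxes that are too small or too wide (w/h) to be valid targets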
if tlwh[2] * tlwh[3] <= tracker.min_box_area: continue
if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
3] > tracker.vertical_ratio:
continue
online_tlwhs.append(tlwh)
online_scores.append(tscore)
online_ids.append(tid)
return online_tlwhs, online_scores, online_ids
def predict(self, crops, bbox_tlwh, pred_scores, warmup=0, repeats=1):
def predict(self, crops, pred_dets, warmup=0, repeats=1):
self.det_times.preprocess_time_s.start()
inputs = self.preprocess(crops)
self.det_times.preprocess_time_s.end()
......@@ -268,49 +313,31 @@ class SDE_ReID(object):
self.predictor.run()
output_names = self.predictor.get_output_names()
feature_tensor = self.predictor.get_output_handle(output_names[0])
features = feature_tensor.copy_to_cpu()
pred_embs = feature_tensor.copy_to_cpu()
self.det_times.inference_time_s.start()
for i in range(repeats):
self.predictor.run()
output_names = self.predictor.get_output_names()
feature_tensor = self.predictor.get_output_handle(output_names[0])
features = feature_tensor.copy_to_cpu()
pred_embs = feature_tensor.copy_to_cpu()
self.det_times.inference_time_s.end(repeats=repeats)
self.det_times.postprocess_time_s.start()
online_tlwhs, online_scores, online_ids = self.postprocess(
bbox_tlwh, pred_scores, features)
online_tlwhs, online_scores, online_ids = self.postprocess(pred_dets,
pred_embs)
self.det_times.postprocess_time_s.end()
self.det_times.img_num += 1
return online_tlwhs, online_scores, online_ids
def get_crops(self, xyxy, ori_img, pred_scores, w, h):
self.det_times.preprocess_time_s.start()
crops = []
keep_scores = []
xyxy = xyxy.astype(np.int64)
ori_img = ori_img.transpose(1, 0, 2) # [h,w,3]->[w,h,3]
for i, bbox in enumerate(xyxy):
if bbox[2] <= bbox[0] or bbox[3] <= bbox[1]:
continue
crop = ori_img[bbox[0]:bbox[2], bbox[1]:bbox[3], :]
crops.append(crop)
keep_scores.append(pred_scores[i])
if len(crops) == 0:
return [], []
crops = preprocess_reid(crops, w, h)
self.det_times.preprocess_time_s.end()
return crops, keep_scores
return online_tlwhs, online_scores, online_ids
def predict_image(detector, reid_model, image_list):
results = []
image_list.sort()
for i, img_file in enumerate(image_list):
frame = cv2.imread(img_file)
if FLAGS.run_benchmark:
pred_bboxes, pred_scores = detector.predict(
pred_dets, pred_xyxys = detector.predict(
[frame], FLAGS.scaled, FLAGS.threshold, warmup=10, repeats=10)
cm, gm, gu = get_current_memory_mb()
detector.cpu_mem += cm
......@@ -318,34 +345,33 @@ def predict_image(detector, reid_model, image_list):
detector.gpu_util += gu
print('Test iter {}, file name:{}'.format(i, img_file))
else:
pred_bboxes, pred_scores = detector.predict([frame], FLAGS.scaled,
FLAGS.threshold)
pred_dets, pred_xyxys = detector.predict([frame], FLAGS.scaled,
FLAGS.threshold)
# process
bbox_tlwh = np.concatenate(
(pred_bboxes[:, 0:2],
pred_bboxes[:, 2:4] - pred_bboxes[:, 0:2] + 1),
axis=1)
crops, pred_scores = reid_model.get_crops(
pred_bboxes, frame, pred_scores, w=64, h=192)
if len(crops) == 0:
continue
if FLAGS.run_benchmark:
online_tlwhs, online_scores, online_ids = reid_model.predict(
crops, bbox_tlwh, pred_scores, warmup=10, repeats=10)
if len(pred_dets) == 1 and np.sum(pred_dets) == 0:
print('Frame {} has no object, try to modify score threshold.'.
format(i))
online_im = frame
else:
online_tlwhs, online_scores, online_ids = reid_model.predict(
crops, bbox_tlwh, pred_scores)
online_im = mot_vis.plot_tracking(
frame, online_tlwhs, online_ids, online_scores, frame_id=i)
# reid process
crops = reid_model.get_crops(pred_xyxys, frame)
if FLAGS.run_benchmark:
online_tlwhs, online_scores, online_ids = reid_model.predict(
crops, pred_dets, warmup=10, repeats=10)
else:
online_tlwhs, online_scores, online_ids = reid_model.predict(
crops, pred_dets)
online_im = mot_vis.plot_tracking(
frame, online_tlwhs, online_ids, online_scores, frame_id=i)
if FLAGS.save_images:
if not os.path.exists(FLAGS.output_dir):
os.makedirs(FLAGS.output_dir)
img_name = os.path.split(img_file)[-1]
out_path = os.path.join(FLAGS.output_dir, img_name)
cv2.imwrite(out_path, online_im)
print("save result to: " + out_path)
if FLAGS.save_images:
if not os.path.exists(FLAGS.output_dir):
os.makedirs(FLAGS.output_dir)
img_name = os.path.split(img_file)[-1]
out_path = os.path.join(FLAGS.output_dir, img_name)
cv2.imwrite(out_path, online_im)
print("save result to: " + out_path)
def predict_video(detector, reid_model, camera_id):
......@@ -376,29 +402,32 @@ def predict_video(detector, reid_model, camera_id):
if not ret:
break
timer.tic()
pred_bboxes, pred_scores = detector.predict([frame], FLAGS.scaled,
FLAGS.threshold)
timer.toc()
bbox_tlwh = np.concatenate(
(pred_bboxes[:, 0:2],
pred_bboxes[:, 2:4] - pred_bboxes[:, 0:2] + 1),
axis=1)
crops, pred_scores = reid_model.get_crops(
pred_bboxes, frame, pred_scores, w=64, h=192)
if len(crops) == 0:
continue
online_tlwhs, online_scores, online_ids = reid_model.predict(
crops, bbox_tlwh, pred_scores)
results.append((frame_id + 1, online_tlwhs, online_scores, online_ids))
fps = 1. / timer.average_time
im = mot_vis.plot_tracking(
frame,
online_tlwhs,
online_ids,
online_scores,
frame_id=frame_id,
fps=fps)
pred_dets, pred_xyxys = detector.predict([frame], FLAGS.scaled,
FLAGS.threshold)
if len(pred_dets) == 1 and np.sum(pred_dets) == 0:
print('Frame {} has no object, try to modify score threshold.'.
format(frame_id))
timer.toc()
im = frame
else:
# reid process
crops = reid_model.get_crops(pred_xyxys, frame)
online_tlwhs, online_scores, online_ids = reid_model.predict(
crops, pred_dets)
results.append(
(frame_id + 1, online_tlwhs, online_scores, online_ids))
timer.toc()
fps = 1. / timer.average_time
im = mot_vis.plot_tracking(
frame,
online_tlwhs,
online_ids,
online_scores,
frame_id=frame_id,
fps=fps)
if FLAGS.save_images:
save_dir = os.path.join(FLAGS.output_dir, video_name.split('.')[-2])
if not os.path.exists(save_dir):
......@@ -417,19 +446,22 @@ def predict_video(detector, reid_model, camera_id):
# In the first few frames, the model may have detection results but no
# tracking results; use the detection results instead and set id to -1.
if results[-1][2] == []:
tlwhs = [tlwh for tlwh in bbox_tlwh]
scores = [score[0] for score in pred_scores]
result = (frame_id + 1, tlwhs, scores, [-1] * len(tlwhs))
tlwhs = [tlwh for tlwh in pred_dets[:, :4]]
scores = [score[0] for score in pred_dets[:, 4:5]]
ids = [-1] * len(tlwhs)
result = (frame_id + 1, tlwhs, scores, ids)
else:
result = results[-1]
write_mot_results(result_filename, [result])
frame_id += 1
print('detect frame: %d' % (frame_id))
print('detect frame:%d' % (frame_id))
if camera_id != -1:
cv2.imshow('Tracking Detection', im)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
if FLAGS.save_mot_txts:
result_filename = os.path.join(FLAGS.output_dir,
video_name.split('.')[-2] + '.txt')
......
......@@ -20,6 +20,7 @@ from ppdet.modeling.mot.motion import KalmanFilter
from ppdet.modeling.mot.matching.deepsort_matching import NearestNeighborDistanceMetric
from ppdet.modeling.mot.matching.deepsort_matching import iou_cost, min_cost_matching, matching_cascade, gate_cost_matrix
from ppdet.modeling.mot.tracker.base_sde_tracker import Track
from ppdet.modeling.mot.utils import Detection
__all__ = ['DeepSORTTracker']
......@@ -29,7 +30,12 @@ class DeepSORTTracker(object):
DeepSORT tracker
Args:
img_size (list): input image size, [h, w]
input_size (list): size of the cropped image fed to the reid model, in
[w, h] format, [64, 192] by default.
min_box_area (int): min box area used to filter out low-quality boxes
vertical_ratio (float): w/h ratio of the bbox used to filter out bad
results, typically 1.6 for pedestrian tracking; a value <= 0 means
no filtering is applied.
budget (int): If not None, fix samples per class to at most this number.
Removes the oldest samples when the budget is reached.
max_age (int): maximum number of consecutive misses before a track is deleted
......@@ -46,15 +52,19 @@ class DeepSORTTracker(object):
"""
def __init__(self,
img_size=[608, 1088],
input_size=[64, 192],
min_box_area=0,
vertical_ratio=-1,
budget=100,
max_age=30,
max_age=70,
n_init=3,
metric_type='cosine',
matching_threshold=0.2,
max_iou_distance=0.7,
max_iou_distance=0.9,
motion='KalmanFilter'):
self.img_size = img_size
self.input_size = input_size
self.min_box_area = min_box_area
self.vertical_ratio = vertical_ratio
self.max_age = max_age
self.n_init = n_init
self.metric = NearestNeighborDistanceMetric(metric_type,
......@@ -73,13 +83,23 @@ class DeepSORTTracker(object):
for track in self.tracks:
track.predict(self.motion)
def update(self, detections):
def update(self, pred_dets, pred_embs):
"""
Perform measurement update and track management.
Args:
detections (list): List[ppdet.modeling.mot.utils.Detection]
A list of detections at the current time step.
pred_dets (Tensor): Detection results of the image, shape is [N, 6],
each row is [x0, y0, w, h, score, class_id].
pred_embs (Tensor): Embedding results of the image, shape is [N, D];
D is usually a multiple of 128, e.g. 128*21 for the PCB Pyramid model.
"""
pred_tlwhs = pred_dets[:, :4]
pred_scores = pred_dets[:, 4:5]
pred_cls_ids = pred_dets[:, 5:]
detections = [
Detection(tlwh, score, feat, cls_id)
for tlwh, score, feat, cls_id in zip(pred_tlwhs, pred_scores,
pred_embs, pred_cls_ids)
]
# Run matching cascade.
matches, unmatched_tracks, unmatched_detections = \
self._match(detections)
......@@ -154,5 +174,5 @@ class DeepSORTTracker(object):
mean, covariance = self.motion.initiate(detection.to_xyah())
self.tracks.append(
Track(mean, covariance, self._next_id, self.n_init, self.max_age,
detection.feature))
detection.cls_id, detection.score, detection.feature))
self._next_id += 1
......@@ -137,8 +137,7 @@ class Tracker(object):
pred_dets, pred_embs = self.model(data)
online_targets = self.model.tracker.update(pred_dets, pred_embs)
online_tlwhs, online_ids = [], []
online_scores = []
online_tlwhs, online_scores, online_ids = [], [], []
for t in online_targets:
tlwh = t.tlwh
tid = t.track_id
......@@ -173,7 +172,6 @@ class Tracker(object):
draw_threshold=0):
if save_dir:
if not os.path.exists(save_dir): os.makedirs(save_dir)
tracker = self.model.tracker
use_detector = False if not self.model.detector else True
timer = Timer()
......@@ -197,65 +195,90 @@ class Tracker(object):
input_shape = data['image'].shape[2:]
im_shape = data['im_shape']
scale_factor = data['scale_factor']
# forward
timer.tic()
if not use_detector:
dets = dets_list[frame_id]
bbox_tlwh = paddle.to_tensor(dets['bbox'], dtype='float32')
pred_scores = paddle.to_tensor(dets['score'], dtype='float32')
if pred_scores < draw_threshold: continue
if bbox_tlwh.shape[0] > 0:
# detector outputs: pred_cls_ids, pred_scores, pred_bboxes
pred_cls_ids = paddle.to_tensor(
dets['cls_id'], dtype='float32').unsqueeze(1)
pred_scores = paddle.to_tensor(
dets['score'], dtype='float32').unsqueeze(1)
pred_bboxes = paddle.concat(
(bbox_tlwh[:, 0:2],
bbox_tlwh[:, 2:4] + bbox_tlwh[:, 0:2]),
axis=1)
else:
pred_bboxes = []
pred_scores = []
logger.warning(
'Frame {} has no object, try to modify score threshold.'.
format(frame_id))
frame_id += 1
continue
else:
outs = self.model.detector(data)
if outs['bbox_num'] > 0:
# detector outputs: pred_cls_ids, pred_scores, pred_bboxes
pred_cls_ids = outs['bbox'][:, 0:1]
pred_scores = outs['bbox'][:, 1:2]
if not scaled:
# 'scaled' means whether the coords output by the detector have already
# been scaled back to the original image: set True for a general
# detector, False for the JDE version YOLOv3.
pred_bboxes = scale_coords(outs['bbox'][:, 2:],
input_shape, im_shape,
scale_factor)
else:
pred_bboxes = outs['bbox'][:, 2:]
pred_scores = outs['bbox'][:, 1:2]
else:
pred_bboxes = []
pred_scores = []
pred_bboxes = clip_box(pred_bboxes, input_shape, im_shape,
scale_factor)
bbox_tlwh = paddle.concat(
(pred_bboxes[:, 0:2],
pred_bboxes[:, 2:4] - pred_bboxes[:, 0:2] + 1),
axis=1)
logger.warning(
'Frame {} has no object, try to modify score threshold.'.
format(frame_id))
frame_id += 1
continue
crops, pred_scores = get_crops(
pred_bboxes, ori_image, pred_scores, w=64, h=192)
pred_xyxys, keep_idx = clip_box(pred_bboxes, input_shape, im_shape,
scale_factor)
pred_scores = paddle.gather_nd(pred_scores, keep_idx).unsqueeze(1)
pred_cls_ids = paddle.gather_nd(pred_cls_ids, keep_idx).unsqueeze(1)
pred_tlwhs = paddle.concat(
(pred_xyxys[:, 0:2],
pred_xyxys[:, 2:4] - pred_xyxys[:, 0:2] + 1),
axis=1)
pred_dets = paddle.concat(
(pred_tlwhs, pred_scores, pred_cls_ids), axis=1)
tracker = self.model.tracker
crops = get_crops(
pred_xyxys,
ori_image,
w=tracker.input_size[0],
h=tracker.input_size[1])
crops = paddle.to_tensor(crops)
pred_scores = paddle.to_tensor(pred_scores)
data.update({'crops': crops})
features = self.model(data)
features = features.numpy()
detections = [
Detection(tlwh, score, feat)
for tlwh, score, feat in zip(bbox_tlwh, pred_scores, features)
]
self.model.tracker.predict()
online_targets = self.model.tracker.update(detections)
online_tlwhs = []
online_scores = []
online_ids = []
for track in online_targets:
if not track.is_confirmed() or track.time_since_update > 1:
pred_embs = self.model(data)
tracker.predict()
online_targets = tracker.update(pred_dets, pred_embs)
online_tlwhs, online_scores, online_ids = [], [], []
for t in online_targets:
if not t.is_confirmed() or t.time_since_update > 1:
continue
online_tlwhs.append(track.to_tlwh())
online_scores.append(1.0)
online_ids.append(track.track_id)
tlwh = t.to_tlwh()
tscore = t.score
tid = t.track_id
if tscore < draw_threshold: continue
if tlwh[2] * tlwh[3] <= tracker.min_box_area: continue
if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
3] > tracker.vertical_ratio:
continue
online_tlwhs.append(tlwh)
online_scores.append(tscore)
online_ids.append(tid)
timer.toc()
# save results
......
......@@ -15,6 +15,7 @@
This code is borrowed from https://github.com/nwojke/deep_sort/blob/master/deep_sort/track.py
"""
import datetime
from ppdet.core.workspace import register, serializable
__all__ = ['TrackState', 'Track']
......@@ -50,6 +51,8 @@ class Track(object):
`n_init` frames.
max_age (int): The maximum number of consecutive misses before the track
state is set to `Deleted`.
cls_id (int): The category id of the tracked box.
score (float): The confidence score of the tracked box.
feature (Optional[ndarray]): Feature vector of the detection this track
originates from. If not None, this feature is added to the `features` cache.
......@@ -69,6 +72,8 @@ class Track(object):
track_id,
n_init,
max_age,
cls_id,
score,
feature=None):
self.mean = mean
self.covariance = covariance
......@@ -76,6 +81,9 @@ class Track(object):
self.hits = 1
self.age = 1
self.time_since_update = 0
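# category id, confidence score, and creation time of this track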
self.cls_id = cls_id
self.score = score
self.start_time = datetime.datetime.now()
self.state = TrackState.Tentative
self.features = []
......@@ -117,6 +125,8 @@ class Track(object):
self.covariance,
detection.to_xyah())
self.features.append(detection.feature)
self.cls_id = detection.cls_id
self.score = detection.score
self.hits += 1
self.time_since_update = 0
......
......@@ -20,6 +20,7 @@ import numpy as np
from ..matching.deepsort_matching import NearestNeighborDistanceMetric
from ..matching.deepsort_matching import iou_cost, min_cost_matching, matching_cascade, gate_cost_matrix
from .base_sde_tracker import Track
from ..utils import Detection
from ppdet.core.workspace import register, serializable
from ppdet.utils.logger import setup_logger
......@@ -36,7 +37,12 @@ class DeepSORTTracker(object):
DeepSORT tracker
Args:
img_size (list): input image size, [h, w]
input_size (list): size of the cropped image fed to the reid model, in
[w, h] format, [64, 192] by default.
min_box_area (int): min box area used to filter out low-quality boxes
vertical_ratio (float): w/h ratio of the bbox used to filter out bad
results, typically 1.6 for pedestrian tracking; a value <= 0 means
no filtering is applied.
budget (int): If not None, fix samples per class to at most this number.
Removes the oldest samples when the budget is reached.
max_age (int): maximum number of consecutive misses before a track is deleted
......@@ -53,15 +59,19 @@ class DeepSORTTracker(object):
"""
def __init__(self,
img_size=[608, 1088],
input_size=[64, 192],
min_box_area=0,
vertical_ratio=-1,
budget=100,
max_age=30,
max_age=70,
n_init=3,
metric_type='cosine',
matching_threshold=0.2,
max_iou_distance=0.7,
max_iou_distance=0.9,
motion='KalmanFilter'):
self.img_size = img_size
self.input_size = input_size
self.min_box_area = min_box_area
self.vertical_ratio = vertical_ratio
self.max_age = max_age
self.n_init = n_init
self.metric = NearestNeighborDistanceMetric(metric_type,
......@@ -80,13 +90,25 @@ class DeepSORTTracker(object):
for track in self.tracks:
track.predict(self.motion)
def update(self, detections):
def update(self, pred_dets, pred_embs):
"""
Perform measurement update and track management.
Args:
detections (list): List[ppdet.modeling.mot.utils.Detection]
A list of detections at the current time step.
pred_dets (Tensor): Detection results of the image, shape is [N, 6],
each row is [x0, y0, w, h, score, class_id].
pred_embs (Tensor): Embedding results of the image, shape is [N, D];
D is usually a multiple of 128, e.g. 128*21 for the PCB Pyramid model.
"""
pred_tlwhs = pred_dets[:, :4]
pred_scores = pred_dets[:, 4:5].squeeze(1)
pred_cls_ids = pred_dets[:, 5:].squeeze(1)
detections = [
Detection(tlwh, score, feat, cls_id)
for tlwh, score, feat, cls_id in zip(pred_tlwhs, pred_scores,
pred_embs, pred_cls_ids)
]
# Run matching cascade.
matches, unmatched_tracks, unmatched_detections = \
self._match(detections)
......@@ -161,5 +183,5 @@ class DeepSORTTracker(object):
mean, covariance = self.motion.initiate(detection.to_xyah())
self.tracks.append(
Track(mean, covariance, self._next_id, self.n_init, self.max_age,
detection.feature))
detection.cls_id, detection.score, detection.feature))
self._next_id += 1
......@@ -72,17 +72,19 @@ class Detection(object):
This class represents a bounding box detection in a single image.
Args:
tlwh (ndarray): Bounding box in format `(top left x, top left y,
tlwh (Tensor): Bounding box in format `(top left x, top left y,
width, height)`.
confidence (ndarray): Detector confidence score.
score (Tensor): Bounding box confidence score.
feature (Tensor): A feature vector that describes the object
contained in this image.
cls_id (Tensor): Bounding box category id.
"""
def __init__(self, tlwh, confidence, feature):
def __init__(self, tlwh, score, feature, cls_id):
self.tlwh = np.asarray(tlwh, dtype=np.float32)
self.confidence = np.asarray(confidence, dtype=np.float32)
self.feature = feature
self.score = float(score)
self.feature = np.asarray(feature, dtype=np.float32)
self.cls_id = int(cls_id)
def to_tlbr(self):
"""
......@@ -106,15 +108,20 @@ class Detection(object):
def load_det_results(det_file, num_frames):
assert os.path.exists(det_file) and os.path.isfile(det_file), \
'Error: det_file: {} not exist or not a file.'.format(det_file)
'{} does not exist or is not a file.'.format(det_file)
labels = np.loadtxt(det_file, dtype='float32', delimiter=',')
assert labels.shape[1] == 7, \
"Each line of {} should have 7 items: '[frame_id],[x0],[y0],[w],[h],[score],[class_id]'.".format(det_file)
results_list = []
for frame_i in range(0, num_frames):
results = {'bbox': [], 'score': []}
for frame_i in range(num_frames):
results = {'bbox': [], 'score': [], 'cls_id': []}
labels_with_frame = labels[labels[:, 0] == frame_i + 1]
# each line of labels_with_frame:
# [frame_id],[x0],[y0],[w],[h],[score],[class_id]
for l in labels_with_frame:
results['bbox'].append(l[1:5])
results['score'].append(l[5])
results['cls_id'].append(l[6])
results_list.append(results)
return results_list
......@@ -139,26 +146,24 @@ def clip_box(xyxy, input_shape, im_shape, scale_factor):
xyxy[:, 0::2] = paddle.clip(xyxy[:, 0::2], min=0, max=img0_shape[1])
xyxy[:, 1::2] = paddle.clip(xyxy[:, 1::2], min=0, max=img0_shape[0])
return xyxy
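# keep only boxes with positive width and height after clipping, and return the kept indices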
w = xyxy[:, 2:3] - xyxy[:, 0:1]
h = xyxy[:, 3:4] - xyxy[:, 1:2]
mask = paddle.logical_and(h > 0, w > 0)
keep_idx = paddle.nonzero(mask)
xyxy = paddle.gather_nd(xyxy, keep_idx[:, :1])
return xyxy, keep_idx
def get_crops(xyxy, ori_img, pred_scores, w, h):
def get_crops(xyxy, ori_img, w, h):
crops = []
keep_scores = []
xyxy = xyxy.numpy().astype(np.int64)
ori_img = ori_img.numpy()
ori_img = np.squeeze(ori_img, axis=0).transpose(1, 0, 2)
pred_scores = pred_scores.numpy()
for i, bbox in enumerate(xyxy):
if bbox[2] <= bbox[0] or bbox[3] <= bbox[1]:
continue
crop = ori_img[bbox[0]:bbox[2], bbox[1]:bbox[3], :]
crops.append(crop)
keep_scores.append(pred_scores[i])
if len(crops) == 0:
return [], []
crops = preprocess_reid(crops, w, h)
return crops, keep_scores
return crops
def preprocess_reid(imgs,
......
......@@ -13,11 +13,13 @@
# limitations under the License.
from . import jde_embedding_head
from . import pyramidal_embedding
from . import resnet
from . import fairmot_embedding_head
from . import resnet
from . import pyramidal_embedding
from . import pplcnet_embedding
from .fairmot_embedding_head import *
from .jde_embedding_head import *
from .pyramidal_embedding import *
from .resnet import *
from .fairmot_embedding_head import *
from .pyramidal_embedding import *
from .pplcnet_embedding import *
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.initializer import Normal, Constant
from paddle import ParamAttr
from paddle.nn import AdaptiveAvgPool2D, BatchNorm, Conv2D, Dropout, Linear
from paddle.regularizer import L2Decay
from paddle.nn.initializer import KaimingNormal
from ppdet.core.workspace import register
__all__ = ['PPLCNetEmbedding']
# Each element (list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
# k: kernel_size
# in_c: input channel number in depthwise block
# out_c: output channel number in depthwise block
# s: stride in depthwise block
# use_se: whether to use SE block
NET_CONFIG = {
"blocks2":
#k, in_c, out_c, s, use_se
[[3, 16, 32, 1, False]],
"blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
"blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
"blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False],
[5, 256, 256, 1, False], [5, 256, 256, 1, False],
[5, 256, 256, 1, False], [5, 256, 256, 1, False]],
"blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
}
def make_divisible(v, divisor=8, min_value=None):
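# round v to the nearest multiple of divisor (at least min_value); bump up one step if the result falls below 90% of v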
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class ConvBNLayer(nn.Layer):
def __init__(self,
num_channels,
filter_size,
num_filters,
stride,
num_groups=1):
super().__init__()
self.conv = Conv2D(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=num_groups,
weight_attr=ParamAttr(initializer=KaimingNormal()),
bias_attr=False)
self.bn = BatchNorm(
num_filters,
param_attr=ParamAttr(regularizer=L2Decay(0.0)),
bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
self.hardswish = nn.Hardswish()
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.hardswish(x)
return x
class DepthwiseSeparable(nn.Layer):
def __init__(self,
num_channels,
num_filters,
stride,
dw_size=3,
use_se=False):
super().__init__()
self.use_se = use_se
self.dw_conv = ConvBNLayer(
num_channels=num_channels,
num_filters=num_channels,
filter_size=dw_size,
stride=stride,
num_groups=num_channels)
if use_se:
self.se = SEModule(num_channels)
self.pw_conv = ConvBNLayer(
num_channels=num_channels,
filter_size=1,
num_filters=num_filters,
stride=1)
def forward(self, x):
x = self.dw_conv(x)
if self.use_se:
x = self.se(x)
x = self.pw_conv(x)
return x
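# Depthwise-separable factorization in brief: a KxK depthwise conv (groups ==
# channels) followed by a 1x1 pointwise conv costs roughly K*K*C + C*C'
# multiply-accumulates per output pixel, versus K*K*C*C' for a dense KxK conv.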
class SEModule(nn.Layer):
def __init__(self, channel, reduction=4):
super().__init__()
self.avg_pool = AdaptiveAvgPool2D(1)
self.conv1 = Conv2D(
in_channels=channel,
out_channels=channel // reduction,
kernel_size=1,
stride=1,
padding=0)
self.relu = nn.ReLU()
self.conv2 = Conv2D(
in_channels=channel // reduction,
out_channels=channel,
kernel_size=1,
stride=1,
padding=0)
self.hardsigmoid = nn.Hardsigmoid()
def forward(self, x):
identity = x
x = self.avg_pool(x)
x = self.conv1(x)
x = self.relu(x)
x = self.conv2(x)
x = self.hardsigmoid(x)
x = paddle.multiply(x=identity, y=x)
return x
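# Shape sketch for an input of (N, 512, H, W) with the default reduction=4:
#   avg_pool -> (N, 512, 1, 1); conv1 -> (N, 128, 1, 1); conv2 -> (N, 512, 1, 1)
#   and the hardsigmoid gate broadcast-multiplies back onto the (N, 512, H, W) input.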
class PPLCNet(nn.Layer):
"""
PP-LCNet, see https://arxiv.org/abs/2109.15099.
    This code is different from the PPLCNet in ppdet/modeling/backbones/lcnet.py
    and in PaddleClas, because the output is the flattened feature of last_conv.
Args:
scale (float): Scale ratio of channels.
class_expand (int): Number of channels of conv feature.
"""
def __init__(self, scale=1.0, class_expand=1280):
super(PPLCNet, self).__init__()
self.scale = scale
self.class_expand = class_expand
self.conv1 = ConvBNLayer(
num_channels=3,
filter_size=3,
num_filters=make_divisible(16 * scale),
stride=2)
self.blocks2 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
])
self.blocks3 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
])
self.blocks4 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
])
self.blocks5 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
])
self.blocks6 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
])
self.avg_pool = AdaptiveAvgPool2D(1)
self.last_conv = Conv2D(
in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
out_channels=self.class_expand,
kernel_size=1,
stride=1,
padding=0,
bias_attr=False)
self.hardswish = nn.Hardswish()
self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
def forward(self, x):
x = self.conv1(x)
x = self.blocks2(x)
x = self.blocks3(x)
x = self.blocks4(x)
x = self.blocks5(x)
x = self.blocks6(x)
x = self.avg_pool(x)
x = self.last_conv(x)
x = self.hardswish(x)
x = self.flatten(x)
return x
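# Shape sketch (a hypothetical 192x64 DeepSORT crop, scale=2.5): conv1 and
# blocks3-6 each halve the spatial size (blocks2 keeps stride 1), i.e. 32x
# downsampling, so (N, 3, 192, 64) -> (N, 1280, 6, 2) after blocks6; then
# avg_pool -> (N, 1280, 1, 1), last_conv -> (N, 1280, 1, 1), flatten -> (N, 1280).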
@register
class PPLCNetEmbedding(nn.Layer):
"""
PPLCNet Embedding
    Args:
        scale (float): Channel scale ratio of the PPLCNet backbone.
        input_ch (int): Number of channels of the input feature, i.e. the
            flattened last_conv feature of the backbone (class_expand).
        output_ch (int): Dimension of the output embedding.
    """
def __init__(self, scale=2.5, input_ch=1280, output_ch=512):
super(PPLCNetEmbedding, self).__init__()
self.backbone = PPLCNet(scale=scale)
self.neck = nn.Linear(input_ch, output_ch)
def forward(self, x):
feat = self.backbone(x)
feat_out = self.neck(feat)
return feat_out
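# A minimal usage sketch (dummy input, not part of this module):
#   model = PPLCNetEmbedding(scale=2.5, input_ch=1280, output_ch=512)
#   feat = model(paddle.rand([8, 3, 192, 64]))  # -> shape [8, 512]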