未验证 提交 52006a4b 编写于 作者: G George Ni 提交者: GitHub

add mot16test doc, test=document_fix (#3222)

上级 86720c1f
...@@ -11,12 +11,19 @@ English | [简体中文](README_cn.md) ...@@ -11,12 +11,19 @@ English | [简体中文](README_cn.md)
- [Citations](#Citations) - [Citations](#Citations)
## Introduction ## Introduction
PaddleDetection implements three multi-object tracking methods. The current mainstream multi-objective tracking (MOT) algorithm is mainly composed of two parts: detection and embedding. Detection aims to detect the potential targets in each frame of the video. Embedding assigns and updates the detected target to the corresponding track (named ReID task). According to the different implementation of these two parts, it can be divided into **SDE** series and **JDE** series algorithm.
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm to integrate appearance information based on a deep appearance descriptor. It adds a CNN model to extract features in image of human part bounded by a detector. Here we use `JDE` as detection model to generate boxes, and select `PCBPyramid` as the ReID model. We also support loading the boxes from saved detection result files.
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) is a fast and high-performance multiple-object tracker that learns the object detection task and appearance embedding task simutaneously in a shared neural network. - **SDE** (Separate Detection and Embedding) is a kind of algorithm which completely separates Detection and Embedding. The most representative is **DeepSORT** algorithm. This design can make the system fit any kind of detectors without difference, and can be improved for each part separately. However, due to the series process, the speed is slow. Time-consuming is a great challenge in the construction of real-time MOT system.
- [FairMOT](https://arxiv.org/abs/2004.01888) focuses on accomplishing the detection and re-identification in a single network to improve the inference speed, presents a simple baseline which consists of two homogeneous branches to predict pixel-wise objectness scores and re-ID features. The achieved fairness between the two tasks allows FairMOT to obtain high levels of detection and tracking accuracy. - **JDE** (Joint Detection and Embedding) is to learn detection and embedding simultaneously in a shared neural network, and set the loss function with a multi task learning approach. The representative algorithms are **JDE** and **FairMOT**. This design can achieve high-precision real-time MOT performance.
Paddledetection implements three MOT algorithms of these two series.
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm, it adds a CNN model to extract features in image of human part bounded by a detector. It integrates appearance information based on a deep appearance descriptor, and assigns and updates the detected targets to the existing corresponding trajectories like ReID task. The detection bboxes result required by DeepSORT can be generated by any detection model, and then the saved detection result file can be loaded for tracking. Here we select the `PCB + Pyramid ResNet101` model provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) as the ReID model.
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) learns the object detection task and appearance embedding task simutaneously in a shared neural network. And the detection results and the corresponding embeddings are also outputed at the same time. JDE original paper is based on an Anchor Base detector YOLOv3 , adding a new ReID branch to learn embeddings. The training process is constructed as a multi-task learning problem, taking into account both accuracy and speed.
- [FairMOT](https://arxiv.org/abs/2004.01888) is based on an Anchor Free detector Centernet, which overcomes the problem of anchor and feature misalignment in anchor based detection framework. The fusion of deep and shallow features enables the detection and ReID tasks to obtain the required features respectively. It also uses low dimensional ReID features. FairMOT is a simple baseline composed of two homogeneous branches propose to predict the pixel level target score and ReID features. It achieves the fairness between the two tasks and obtains a higher level of real-time MOT performance.
<div align="center"> <div align="center">
<img src="../../docs/images/mot16_jde.gif" width=500 /> <img src="../../docs/images/mot16_jde.gif" width=500 />
...@@ -38,25 +45,20 @@ pip install -r requirements.txt ...@@ -38,25 +45,20 @@ pip install -r requirements.txt
## Model Zoo ## Model Zoo
### JDE on MOT-16 Training Set ### DeepSORT on MOT-16 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.3 | 1351 | 6591 | 21625 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.2 | 1328 | 6441 | 25187 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.2 | 64.5 | 1308 | 7011 | 32252 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:** | backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download| config |
JDE used 8 GPUs for training and mini-batch size as 4 on each GPU, and trained for 30 epoches. | :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
### DeepSORT on MOT-16 Training Set ### DeepSORT on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config | | backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-----: | :-----: | :-----: | | :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) | | ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:** **Notes:**
DeepSORT does not need to train, only used for evaluation. Before DeepSORT evaluation, you should get detection results by a detection model first, here we use JDE, and then prepare them like this: DeepSORT does not need to train on MOT dataset, only used for evaluation. Before DeepSORT evaluation, you should get detection results by a detection model first, and then prepare them like this:
``` ```
det_results_dir det_results_dir
|——————MOT16-02.txt |——————MOT16-02.txt
...@@ -67,18 +69,44 @@ det_results_dir ...@@ -67,18 +69,44 @@ det_results_dir
|——————MOT16-11.txt |——————MOT16-11.txt
|——————MOT16-13.txt |——————MOT16-13.txt
``` ```
For MOT16 dataset, you can download the det_results_dir.zip of one detection model provided by PaddleDetection:
```
wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
Each txt is the detection result of all the pictures extracted from each video, and each line describes a bounding box with the following format: Each txt is the detection result of all the pictures extracted from each video, and each line describes a bounding box with the following format:
``` ```
[frame_id][identity][bb_left][bb_top][width][height][conf][x][y][z] [frame_id],[identity],[bb_left],[bb_top],[width],[height],[conf]
``` ```
**Notes:** **Notes:**
- `frame_id` is the frame number of the image - `frame_id` is the frame number of the image
- `identity` is the object id using default value `-1` - `identity` is the object id using default value `-1`
- `bb_left` is the X coordinate of the left bound of the object box - `bb_left` is the X coordinate of the left bound of the object box
- `bb_top` is the Y coordinate of the upper bound of the object box - `bb_top` is the Y coordinate of the upper bound of the object box
- `width, height` is the pixel width and height - `width,height` is the pixel width and height
- `conf` is the object score with default value `1` (the results had been filtered out according to the detection score threshold) - `conf` is the object score with default value `1` (the results had been filtered out according to the detection score threshold)
- `x,y,z` are used in 3D, default to `-1` in 2D.
### JDE on MOT-16 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
### JDE on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE used 8 GPUs for training and mini-batch size as 4 on each GPU, and trained for 30 epoches.
### FairMOT Results on MOT-16 Training Set ### FairMOT Results on MOT-16 Training Set
...@@ -98,10 +126,11 @@ Each txt is the detection result of all the pictures extracted from each video, ...@@ -98,10 +126,11 @@ Each txt is the detection result of all the pictures extracted from each video,
**Notes:** **Notes:**
FairMOT used 8 GPUs for training and mini-batch size as 6 on each GPU, and trained for 30 epoches. FairMOT used 8 GPUs for training and mini-batch size as 6 on each GPU, and trained for 30 epoches.
## Dataset Preparation ## Dataset Preparation
### MOT Dataset ### MOT Dataset
PaddleDetection use the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please refer to [PrepareMOTDataSet](../../docs/tutorials/PrepareMOTDataSet.md) to download and prepare all the training data including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. **MOT15 and MOT20** can also be downloaded from the official webpage of MOT challenge. If you want to use these datasets, please **follow their licenses**. PaddleDetection use the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please refer to [PrepareMOTDataSet](../../docs/tutorials/PrepareMOTDataSet.md) to download and prepare all the training data including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The former six are used as the mixed dataset for training, and MOT16 are used as the evaluation dataset. In addition, you can use **MOT15 and MOT20** for finetune. All pedestrians in these datasets have detection bbox labels and some have ID labels. If you want to use these datasets, please **follow their licenses**.
### Data Format ### Data Format
These several relevant datasets have the following structure: These several relevant datasets have the following structure:
...@@ -190,6 +219,16 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_d ...@@ -190,6 +219,16 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_d
# use saved checkpoint in training # use saved checkpoint in training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams
``` ```
**Notes:**
The default evaluation dataset is MOT-16 Train Set. If you want to change the evaluation dataset, please refer to the following code and modify `configs/datasets/mot.yml`
```
EvalMOTDataset:
!MOTImageFolder
task: MOT17_train
dataset_dir: dataset/mot
data_root: MOT17/images/train
keep_ori_im: False # set True if save visualization images or video
```
### 3. Inference ### 3. Inference
......
...@@ -12,12 +12,18 @@ ...@@ -12,12 +12,18 @@
## 简介 ## 简介
PaddleDetection实现了3种多目标跟踪方法。 当前主流的多目标追踪(MOT)算法主要由两部分组成:Detection+Embedding。Detection部分即针对视频,检测出每一帧中的潜在目标。Embedding部分则将检出的目标分配和更新到已有的对应轨迹上(即ReID重识别任务)。根据这两部分实现的不同,又可以划分为**SDE**系列和**JDE**系列算法。
- [DeepSORT](https://arxiv.org/abs/1812.00442)(Deep Cosine Metric Learning SORT) 扩展了原有的[SORT](https://arxiv.org/abs/1703.07402)(Simple Online and Realtime Tracking)算法,增加了一个CNN模型用于在检测器限定的人体部分图像中提取特征,在深度外观描述的基础上整合外观信息。
- [JDE](https://arxiv.org/abs/1909.12605)(Joint Detection and Embedding)是一个快速高性能多目标跟踪器,它是在共享神经网络中同时学习目标检测任务和外观嵌入任务的 - SDE(Separate Detection and Embedding)这类算法完全分离Detection和Embedding两个环节,最具代表性的就是**DeepSORT**算法。这样的设计可以使系统无差别的适配各类检测器,可以针对两个部分分别调优,但由于流程上是串联的导致速度慢耗时较长,在构建实时MOT系统中面临较大挑战
- [FairMOT](https://arxiv.org/abs/2004.01888)着重研究在单个网络中实现检测和ReID以提高推理速度,提出了一种由两个同质分支组成的简单基线来预测像素级目标得分和ReID特征,实现了两个任务之间的公平性,并获得了高水平的检测和跟踪精度。 - JDE(Joint Detection and Embedding)这类算法完是在一个共享神经网络中同时学习Detection和Embedding,使用一个多任务学习的思路设置损失函数。代表性的算法有**JDE****FairMOT**。这样的设计兼顾精度和速度,可以实现高精度的实时多目标跟踪。
PaddleDetection实现了这两个系列的3种多目标跟踪算法。
- [DeepSORT](https://arxiv.org/abs/1812.00442)(Deep Cosine Metric Learning SORT) 扩展了原有的[SORT](https://arxiv.org/abs/1703.07402)(Simple Online and Realtime Tracking)算法,增加了一个CNN模型用于在检测器限定的人体部分图像中提取特征,在深度外观描述的基础上整合外观信息,将检出的目标分配和更新到已有的对应轨迹上即进行一个ReID重识别任务。DeepSORT所需的检测框可以由任意一个检测器来生成,然后读入保存的检测结果和视频图片即可进行跟踪预测。ReID模型此处选择[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)提供的`PCB+Pyramid ResNet101`模型。
- [JDE](https://arxiv.org/abs/1909.12605)(Joint Detection and Embedding)是在一个单一的共享神经网络中同时学习目标检测任务和embedding任务,并同时输出检测结果和对应的外观embedding匹配的算法。JDE原论文是基于Anchor Base的YOLOv3检测器新增加一个ReID分支学习embedding,训练过程被构建为一个多任务联合学习问题,兼顾精度和速度。
- [FairMOT](https://arxiv.org/abs/2004.01888)以Anchor Free的CenterNet检测器为基础,克服了Anchor-Based的检测框架中anchor和特征不对齐问题,深浅层特征融合使得检测和ReID任务各自获得所需要的特征,并且使用低维度ReID特征,提出了一种由两个同质分支组成的简单baseline来预测像素级目标得分和ReID特征,实现了两个任务之间的公平性,并获得了更高水平的实时多目标跟踪精度。
<div align="center"> <div align="center">
<img src="../../docs/images/mot16_jde.gif" width=500 /> <img src="../../docs/images/mot16_jde.gif" width=500 />
...@@ -38,25 +44,20 @@ pip install -r requirements.txt ...@@ -38,25 +44,20 @@ pip install -r requirements.txt
## 模型库 ## 模型库
### JDE在MOT-16 Training Set上结果 ### DeepSORT在MOT-16 Training Set上结果
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 检测模型 | 配置文件 |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.3 | 1351 | 6591 | 21625 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.2 | 1328 | 6441 | 25187 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.2 | 64.5 | 1308 | 7011 | 32252 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**注意:** | 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
JDE使用8个GPU进行训练,每个GPU上batch size为4,训练30个epoch。 | :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----: | :-----: |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
### DeepSORT在MOT-16 Training Set上结果 ### DeepSORT在MOT-16 Test Set上结果
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 检测模型 | ReID模型 | 配置文件 | | 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-----: | :-----: | :-----: | | :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----: | :-----: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) | | ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**注意:** **注意:**
DeepSORT此处不需要训练MOT数据集,只用于评估。在使用DeepSORT模型评估之前,应该首先通过一个检测模型得到检测结果,此处使用JDE,然后像这样准备好结果文件: DeepSORT不需要训练MOT数据集,只用于评估。在使用DeepSORT模型评估之前,应该首先通过一个检测模型得到检测结果,然后像这样准备好结果文件:
``` ```
det_results_dir det_results_dir
|——————MOT16-02.txt |——————MOT16-02.txt
...@@ -67,18 +68,45 @@ det_results_dir ...@@ -67,18 +68,45 @@ det_results_dir
|——————MOT16-11.txt |——————MOT16-11.txt
|——————MOT16-13.txt |——————MOT16-13.txt
``` ```
对于MOT16数据集,可以下载PaddleDetection提供的一个检测结果det_results_dir.zip并解压:
```
wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
其中每个txt是每个视频中所有图片的检测结果,每行都描述一个边界框,格式如下: 其中每个txt是每个视频中所有图片的检测结果,每行都描述一个边界框,格式如下:
``` ```
[frame_id][identity][bb_left][bb_top][width][height][conf][x][y][z] [frame_id],[identity],[bb_left],[bb_top],[width],[height],[conf]
``` ```
**注意**: **注意**:
- `frame_id`是图片帧的序号 - `frame_id`是图片帧的序号
- `identity`是目标id采用默认值为`-1` - `identity`是目标id采用默认值为`-1`
- `bb_left`是目标框的左边界的x坐标 - `bb_left`是目标框的左边界的x坐标
- `bb_top`是目标框的上边界的y坐标 - `bb_top`是目标框的上边界的y坐标
- `widthheight`是真实的像素宽高 - `width,height`是真实的像素宽高
- `conf`是目标得分设置为`1`(已经按检测的得分阈值筛选出的检测结果) - `conf`是目标得分设置为`1`(已经按检测的得分阈值筛选出的检测结果)
- `x,y,z`是3D中用到的,在2D中默认为`-1`
### JDE在MOT-16 Training Set上结果
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
### JDE在MOT-16 Test Set上结果
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**注意:**
JDE使用8个GPU进行训练,每个GPU上batch size为4,训练了30个epoch。
### FairMOT在MOT-16 Training Set上结果 ### FairMOT在MOT-16 Training Set上结果
...@@ -101,7 +129,7 @@ det_results_dir ...@@ -101,7 +129,7 @@ det_results_dir
## 数据集准备 ## 数据集准备
### MOT数据集 ### MOT数据集
PaddleDetection使用和[JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) 还有[FairMOT](https://github.com/ifzhang/FairMOT)相同的数据集。请参照[数据准备文档](../../docs/tutorials/PrepareMOTDataSet_cn.md)去下载并准备好所有的数据集包括**Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17和MOT16**此外还可以下载**MOT15和MOT20**数据集,如果您想使用这些数据集,请**遵循他们的License** PaddleDetection使用和[JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) 还有[FairMOT](https://github.com/ifzhang/FairMOT)相同的数据集。请参照[数据准备文档](../../docs/tutorials/PrepareMOTDataSet_cn.md)去下载并准备好所有的数据集包括**Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17和MOT16**使用前6者作为联合数据集参与训练,MOT16作为评测数据集。此外还可以使用**MOT15和MOT20**进行finetune。所有的行人都有检测框标签,部分有ID标签。如果您想使用这些数据集,请**遵循他们的License**
### 数据格式 ### 数据格式
这几个相关数据集都遵循以下结构: 这几个相关数据集都遵循以下结构:
...@@ -188,6 +216,16 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_d ...@@ -188,6 +216,16 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_d
# 使用训练保存的checkpoint # 使用训练保存的checkpoint
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams
``` ```
**注意:**
默认评估的是MOT-16 Train Set数据集, 如需换评估数据集可参照以下代码修改`configs/datasets/mot.yml`
```
EvalMOTDataset:
!MOTImageFolder
task: MOT17_train
dataset_dir: dataset/mot
data_root: MOT17/images/train
keep_ori_im: False # set True if save visualization images or video
```
### 3. 预测 ### 3. 预测
......
...@@ -9,18 +9,24 @@ English | [简体中文](README_cn.md) ...@@ -9,18 +9,24 @@ English | [简体中文](README_cn.md)
- [Citations](#Citations) - [Citations](#Citations)
## Introduction ## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm to integrate appearance information based on a deep appearance descriptor. It adds a CNN model to extract features in image of human part bounded by a detector. Here we use `JDE` as detection model to generate boxes, and select `PCBPyramid` as the ReID model. We also support loading the boxes from saved detection result files. [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm, it adds a CNN model to extract features in image of human part bounded by a detector. It integrates appearance information based on a deep appearance descriptor, and assigns and updates the detected targets to the existing corresponding trajectories like ReID task. The detection bboxes result required by DeepSORT can be generated by any detection model, and then the saved detection result file can be loaded for tracking. Here we select the `PCB + Pyramid ResNet101` model provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) as the ReID model.
## Model Zoo ## Model Zoo
### DeepSORT on MOT-16 Training Set ### DeepSORT on MOT-16 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config | | backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-------: | :---: | :---: | | :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) | | ResNet101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
### DeepSORT on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: |
| ResNet101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:** **Notes:**
DeepSORT does not need to train on MOT dataset, only used for evaluation. Before DeepSORT evaluation, you should get detection results by a detection model first, here we use JDE, and then prepare them like this: DeepSORT does not need to train on MOT dataset, only used for evaluation. Before DeepSORT evaluation, you should get detection results by a detection model first, and then prepare them like this:
``` ```
det_results_dir det_results_dir
|——————MOT16-02.txt |——————MOT16-02.txt
...@@ -31,9 +37,13 @@ det_results_dir ...@@ -31,9 +37,13 @@ det_results_dir
|——————MOT16-11.txt |——————MOT16-11.txt
|——————MOT16-13.txt |——————MOT16-13.txt
``` ```
For MOT16 dataset, you can download the det_results_dir.zip provided by PaddleDetection:
```
wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
Each txt is the detection result of all the pictures extracted from each video, and each line describes a bounding box with the following format: Each txt is the detection result of all the pictures extracted from each video, and each line describes a bounding box with the following format:
``` ```
[frame_id],[identity],[bb_left],[bb_top],[width],[height],[conf],[x],[y],[z] [frame_id],[identity],[bb_left],[bb_top],[width],[height],[conf]
``` ```
**Notes:** **Notes:**
- `frame_id` is the frame number of the image - `frame_id` is the frame number of the image
...@@ -42,21 +52,10 @@ Each txt is the detection result of all the pictures extracted from each video, ...@@ -42,21 +52,10 @@ Each txt is the detection result of all the pictures extracted from each video,
- `bb_top` is the Y coordinate of the upper bound of the object box - `bb_top` is the Y coordinate of the upper bound of the object box
- `width,height` is the pixel width and height - `width,height` is the pixel width and height
- `conf` is the object score with default value `1` (the results had been filtered out according to the detection score threshold) - `conf` is the object score with default value `1` (the results had been filtered out according to the detection score threshold)
- `x,y,z` are used in 3D, default to `-1` in 2D.
## Getting Start ## Getting Start
### 1. Evaluate a detector to get detection results ### 1. Evaluation
```bash
# use weights released in PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
# use saved checkpoint after training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o weights=output/jde_darknet53_30e_1088x608/model_final.pdparams
```
### 2. Tracking
```bash ```bash
# track the objects by loading detected result files # track the objects by loading detected result files
......
...@@ -9,18 +9,24 @@ ...@@ -9,18 +9,24 @@
- [引用](#引用) - [引用](#引用)
## 简介 ## 简介
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) 扩展了原有的[SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking)算法,增加了一个CNN模型用于在检测器限定的人体部分图像中提取特征,在深度外观描述的基础上整合外观信息。我们使用`JDE`作为检测模型来生成检测框,并选择`PCBPyramid`作为ReID模型。我们还支持加载保存的检测结果文件来进行预测跟踪 [DeepSORT](https://arxiv.org/abs/1812.00442)(Deep Cosine Metric Learning SORT) 扩展了原有的[SORT](https://arxiv.org/abs/1703.07402)(Simple Online and Realtime Tracking)算法,增加了一个CNN模型用于在检测器限定的人体部分图像中提取特征,在深度外观描述的基础上整合外观信息,将检出的目标分配和更新到已有的对应轨迹上即进行一个ReID重识别任务。DeepSORT所需的检测框可以由任意一个检测器来生成,然后读入保存的检测结果和视频图片即可进行跟踪预测。ReID模型此处选择[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)提供的`PCB+Pyramid ResNet101`模型
## 模型库 ## 模型库
### DeepSORT在MOT-16 Training Set上结果 ### DeepSORT在MOT-16 Training Set上结果
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 检测模型 | ReID模型 | 配置文件 | | 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-----: | :-----: | :-----: | | :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----: | :-----: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) | | ResNet101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
### DeepSORT在MOT-16 Test Set上结果
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----: | :-----: |
| ResNet101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**注意:** **注意:**
DeepSORT此处不需要训练MOT数据集,只用于评估。在使用DeepSORT模型评估之前,应该首先通过一个检测模型得到检测结果,此处使用JDE,然后像这样准备好结果文件: DeepSORT不需要训练MOT数据集,只用于评估。在使用DeepSORT模型评估之前,应该首先通过一个检测模型得到检测结果,然后像这样准备好结果文件:
``` ```
det_results_dir det_results_dir
|——————MOT16-02.txt |——————MOT16-02.txt
...@@ -31,9 +37,13 @@ det_results_dir ...@@ -31,9 +37,13 @@ det_results_dir
|——————MOT16-11.txt |——————MOT16-11.txt
|——————MOT16-13.txt |——————MOT16-13.txt
``` ```
对于MOT16数据集,可以下载PaddleDetection提供的det_results_dir.zip并解压:
```
wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
其中每个txt是每个视频中所有图片的检测结果,每行都描述一个边界框,格式如下: 其中每个txt是每个视频中所有图片的检测结果,每行都描述一个边界框,格式如下:
``` ```
[frame_id],[identity],[bb_left],[bb_top],[width],[height],[conf],[x],[y],[z] [frame_id],[identity],[bb_left],[bb_top],[width],[height],[conf]
``` ```
**注意**: **注意**:
- `frame_id`是图片帧的序号 - `frame_id`是图片帧的序号
...@@ -42,21 +52,10 @@ det_results_dir ...@@ -42,21 +52,10 @@ det_results_dir
- `bb_top`是目标框的上边界的y坐标 - `bb_top`是目标框的上边界的y坐标
- `width,height`是真实的像素宽高 - `width,height`是真实的像素宽高
- `conf`是目标得分设置为`1`(已经按检测的得分阈值筛选出的检测结果) - `conf`是目标得分设置为`1`(已经按检测的得分阈值筛选出的检测结果)
- `x,y,z`是3D中用到的,在2D中默认为`-1`
## 快速开始 ## 快速开始
### 1. 验证检测模型得到检测结果文件 ### 1. 评估
```bash
# 使用PaddleDetection发布的权重
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
# 使用训练保存的checkpoint
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o weights=output/jde_darknet53_30e_1088x608/model_final.pdparams
```
### 2. 跟踪预测
```bash ```bash
# 加载检测结果文件得到跟踪结果 # 加载检测结果文件得到跟踪结果
......
...@@ -10,8 +10,7 @@ English | [简体中文](README_cn.md) ...@@ -10,8 +10,7 @@ English | [简体中文](README_cn.md)
## Introduction ## Introduction
[FairMOT](https://arxiv.org/abs/2004.01888) focuses on accomplishing the detection and re-identification in a single network to improve the inference speed, presents a simple baseline which consists of two homogeneous branches to predict pixel-wise objectness scores and re-ID features. The achieved fairness between the two tasks allows FairMOT to obtain high levels of detection and tracking accuracy. [FairMOT](https://arxiv.org/abs/2004.01888) is based on an Anchor Free detector Centernet, which overcomes the problem of anchor and feature misalignment in anchor based detection framework. The fusion of deep and shallow features enables the detection and ReID tasks to obtain the required features respectively. It also uses low dimensional ReID features. FairMOT is a simple baseline composed of two homogeneous branches propose to predict the pixel level target score and ReID features. It achieves the fairness between the two tasks and obtains a higher level of real-time MOT performance.
## Model Zoo ## Model Zoo
...@@ -55,6 +54,28 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_d ...@@ -55,6 +54,28 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_d
# use saved checkpoint in training # use saved checkpoint in training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams
``` ```
**Notes:**
The default evaluation dataset is MOT-16 Train Set. If you want to change the evaluation dataset, please refer to the following code and modify `configs/datasets/mot.yml`
```
EvalMOTDataset:
!MOTImageFolder
task: MOT17_train
dataset_dir: dataset/mot
data_root: MOT17/images/train
keep_ori_im: False # set True if save visualization images or video
```
### 3. Inference
Inference a vidoe on single GPU with following command:
```bash
# inference on video and save a video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
```
**Notes:**
Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first, on Linux(Ubuntu) platform you can directly install it by the following command:`apt-get update && apt-get install -y ffmpeg`.
## Citations ## Citations
``` ```
......
...@@ -10,7 +10,7 @@ ...@@ -10,7 +10,7 @@
## 内容 ## 内容
[FairMOT](https://arxiv.org/abs/2004.01888)着重研究在单个网络中实现检测和ReID以提高推理速度,提出了一种由两个同质分支组成的简单基线来预测像素级目标得分和ReID特征, 实现了两个任务之间的公平性,并获得高水平的检测和跟踪精度。 [FairMOT](https://arxiv.org/abs/2004.01888)以Anchor Free的CenterNet检测器为基础,克服了Anchor-Based的检测框架中anchor和特征不对齐问题,深浅层特征融合使得检测和ReID任务各自获得所需要的特征,并且使用低维度ReID特征,提出了一种由两个同质分支组成的简单baseline来预测像素级目标得分和ReID特征,实现了两个任务之间的公平性,并获得了更高水平的实时多目标跟踪精度。
## 模型库 ## 模型库
...@@ -52,6 +52,28 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_d ...@@ -52,6 +52,28 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_d
# 使用训练保存的checkpoint # 使用训练保存的checkpoint
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams
``` ```
**注意:**
默认评估的是MOT-16 Train Set数据集, 如需换评估数据集可参照以下代码修改`configs/datasets/mot.yml`
```
EvalMOTDataset:
!MOTImageFolder
task: MOT17_train
dataset_dir: dataset/mot
data_root: MOT17/images/train
keep_ori_im: False # set True if save visualization images or video
```
### 3. 预测
使用单个GPU通过如下命令预测一个视频,并保存为视频
```bash
# 预测一个视频
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
```
**注意:**
请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`
## 引用 ## 引用
``` ```
......
...@@ -10,7 +10,8 @@ English | [简体中文](README_cn.md) ...@@ -10,7 +10,8 @@ English | [简体中文](README_cn.md)
## Introduction ## Introduction
[JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) is a fast and high-performance multiple-object tracker that learns the object detection task and appearance embedding task simutaneously in a shared neural network. - [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) learns the object detection task and appearance embedding task simutaneously in a shared neural network. And the detection results and the corresponding embeddings are also outputed at the same time. JDE original paper is based on an Anchor Base detector YOLOv3 , adding a new ReID branch to learn embeddings. The training process is constructed as a multi-task learning problem, taking into account both accuracy and speed.
<div align="center"> <div align="center">
<img src="../../../docs/images/mot16_jde.gif" width=500 /> <img src="../../../docs/images/mot16_jde.gif" width=500 />
</div> </div>
...@@ -21,11 +22,19 @@ English | [简体中文](README_cn.md) ...@@ -21,11 +22,19 @@ English | [简体中文](README_cn.md)
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config | | backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: | | :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.3 | 1351 | 6591 | 21625 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) | | DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.2 | 1328 | 6441 | 25187 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) | | DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.2 | 64.5 | 1308 | 7011 | 32252 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) | | DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
### JDE on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:** **Notes:**
JDE used 8 GPUs for training and mini-batch size as 4 on each GPU, and trained for 30 epoches. JDE used 8 GPUs for training and mini-batch size as 4 on each GPU, and trained for 30 epoches.
...@@ -51,6 +60,18 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53 ...@@ -51,6 +60,18 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=output/jde_darknet53_30e_1088x608/model_final.pdparams CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=output/jde_darknet53_30e_1088x608/model_final.pdparams
``` ```
### 3. Inference
Inference a vidoe on single GPU with following command:
```bash
# inference on video and save a video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
```
**Notes:**
Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first, on Linux(Ubuntu) platform you can directly install it by the following command:`apt-get update && apt-get install -y ffmpeg`.
## Citations ## Citations
``` ```
@article{wang2019towards, @article{wang2019towards,
......
...@@ -10,7 +10,8 @@ ...@@ -10,7 +10,8 @@
## 内容 ## 内容
[JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding)是一个快速高性能多目标跟踪器,它是在共享神经网络中同时学习目标检测任务和外观嵌入任务的。 [JDE](https://arxiv.org/abs/1909.12605)(Joint Detection and Embedding)是在一个单一的共享神经网络中同时学习目标检测任务和embedding任务,并同时输出检测结果和对应的外观embedding匹配的算法。JDE原论文是基于Anchor Base的YOLOv3检测器新增加一个ReID分支学习embedding,训练过程被构建为一个多任务联合学习问题,兼顾精度和速度。
<div align="center"> <div align="center">
<img src="../../../docs/images/mot16_jde.gif" width=500 /> <img src="../../../docs/images/mot16_jde.gif" width=500 />
</div> </div>
...@@ -19,11 +20,22 @@ ...@@ -19,11 +20,22 @@
### JDE在MOT-16 Training Set上结果 ### JDE在MOT-16 Training Set上结果
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 检测模型 | 配置文件 | | 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
### JDE在MOT-16 Test Set上结果
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: | | :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.3 | 1351 | 6591 | 21625 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) | | DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
| DarkNet53 | 864x480 | 70.1 | 65.2 | 1328 | 6441 | 25187 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) | | DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 576x320 | 63.2 | 64.5 | 1308 | 7011 | 32252 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) | | DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**注意:** **注意:**
JDE使用8个GPU进行训练,每个GPU上batch size为4,训练了30个epoch。 JDE使用8个GPU进行训练,每个GPU上batch size为4,训练了30个epoch。
...@@ -50,6 +62,18 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53 ...@@ -50,6 +62,18 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=output/jde_darknet53_30e_1088x608/model_final.pdparams CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=output/jde_darknet53_30e_1088x608/model_final.pdparams
``` ```
### 3. 预测
使用单个GPU通过如下命令预测一个视频,并保存为视频
```bash
# 预测一个视频
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
```
**注意:**
请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`
## 引用 ## 引用
``` ```
@article{wang2019towards, @article{wang2019towards,
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册