English | [简体中文](README_cn.md)

# MOT (Multi-Object Tracking)

## Table of Contents
- [Introduction](#Introduction)
- [Installation](#Installation)
- [Model Zoo](#Model_Zoo)
- [Feature Tracking Model](#Feature_Tracking_Model)
- [Dataset Preparation](#Dataset_Preparation)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)

## Introduction
The current mainstream multi-object tracking (MOT) algorithms are composed of two parts: detection and embedding. Detection aims to find the potential targets in each frame of the video. Embedding assigns and updates the detected targets to the corresponding tracks (the ReID task). According to how these two parts are implemented, MOT algorithms can be divided into the **SDE** series and the **JDE** series.

- **SDE** (Separate Detection and Embedding) algorithms completely separate detection and embedding. The most representative one is **DeepSORT**. This design lets the system pair with any kind of detector, and each part can be improved separately. However, because the two stages run in series, the speed is slow, and latency is a great challenge when building a real-time MOT system.

- **JDE** (Joint Detection and Embedding) algorithms learn detection and embedding simultaneously in a shared neural network, with the loss function set up as multi-task learning. The representative algorithms are **JDE** and **FairMOT**. This design can achieve high-accuracy real-time MOT performance.

PaddleDetection implements three MOT algorithms across these two series.

- [DeepSORT](https://arxiv.org/abs/1703.07402) (SORT with a deep association metric) extends the original [SORT](https://arxiv.org/abs/1602.00763) (Simple Online and Realtime Tracking) algorithm by adding a CNN model that extracts appearance features from the image region of each person found by a detector. It integrates appearance information based on a deep appearance descriptor ([deep cosine metric learning](https://arxiv.org/abs/1812.00442)), and assigns and updates the detected targets to the existing corresponding trajectories, like a ReID task; a minimal sketch of this association step follows the list below. The detection bbox results required by DeepSORT can be generated by any detection model, and the saved detection result file can then be loaded for tracking. Here we select the `PCB + Pyramid ResNet101` and `PPLCNet` models provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) as the ReID models.

- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) learns the object detection task and the appearance embedding task simultaneously in a shared neural network, outputting the detection results and the corresponding embeddings at the same time. The original JDE paper builds on the anchor-based detector YOLOv3, adding a new ReID branch to learn embeddings. Training is constructed as a multi-task learning problem, taking both accuracy and speed into account.

- [FairMOT](https://arxiv.org/abs/2004.01888) is based on the anchor-free detector CenterNet, which overcomes the anchor-feature misalignment problem of anchor-based detection frameworks. The fusion of deep and shallow features enables the detection and ReID tasks to each obtain the features they need, and low-dimensional ReID features are used. FairMOT is a simple baseline composed of two homogeneous branches that predict pixel-level target scores and ReID features; it achieves fairness between the two tasks and reaches a higher level of real-time MOT performance.
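
To make the ReID association step concrete, here is a minimal runnable sketch (an illustration only, not PaddleDetection code): detections are matched to existing tracks by the cosine distance between ReID embeddings, with random vectors standing in for a real detector and ReID model. Real DeepSORT additionally gates matches with a Kalman motion model and a matching cascade.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_feats, det_feats, max_dist=0.4):
    """Match detections to tracks by cosine distance between embeddings."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    cost = 1.0 - t @ d.T                      # cosine distance matrix
    rows, cols = linear_sum_assignment(cost)  # Hungarian matching
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]

rng = np.random.default_rng(0)
tracks = rng.normal(size=(3, 128))                        # 3 existing track embeddings
dets = tracks[[2, 0]] + 0.01 * rng.normal(size=(2, 128))  # 2 new detections
print(associate(tracks, dets))                            # [(0, 1), (2, 0)]
```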

<div align="center">
  <img src="../../docs/images/mot16_jde.gif" width=500 />
  <br>
  demo resource: MOT17 dataset
</div>


## Installation

Install all the related dependencies for MOT:
```bash
pip install lap sklearn motmetrics openpyxl cython_bbox
# or
pip install -r requirements.txt
```
**Notes:**
- To install `cython_bbox` on Windows: `pip install -e git+https://github.com/samson-wang/cython_bbox.git#egg=cython-bbox`. You can refer to this [tutorial](https://stackoverflow.com/questions/60349980/is-there-a-way-to-install-cython-bbox-for-windows).
- Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first; on Linux (Ubuntu) you can install it directly with `apt-get update && apt-get install -y ffmpeg`.


## Model Zoo

### DeepSORT Results on MOT-16 Training Set

| backbone  | input shape | MOTA | IDF1 |  IDS  |   FP  |   FN  |   FPS  | det result/model |ReID model| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: |
| ResNet-101 | 1088x608 |  72.2  |  60.5  | 998  |  8054  | 21644 |  - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 |  68.3  |  56.5  | 1722 |  17337 | 15890 |  - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet    | 1088x608 |  72.2  |  59.5  | 1087  |  8034  | 21481 |  - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/reid/deepsort_pplcnet.yml) |
| PPLCNet    | 1088x608 |  68.1  |  53.6  | 1979 |  17446 | 15766 |  - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pplcnet.yml) |
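
In these tables, MOTA summarizes the FP, FN and IDS columns in a single score. For reference, the standard CLEAR-MOT definition (not anything PaddleDetection-specific) is:

```
MOTA = 1 - (FN + FP + IDS) / GT    # GT: total number of ground-truth boxes
```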

### DeepSORT Results on MOT-16 Test Set

| backbone  | input shape | MOTA | IDF1 |  IDS  |   FP  |   FN  |   FPS  | det result/model |ReID model| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: |
| ResNet-101 | 1088x608 |  64.1  |  53.0  | 1024  |  12457  | 51919 |  - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 |  61.2  |  48.5  | 1799  |  25796  | 43232 |  - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams)  |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet    | 1088x608 |  64.0  |  51.3  | 1208  |  12697  | 51784 |  - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/reid/deepsort_pplcnet.yml) |
| PPLCNet    | 1088x608 |  61.1  |  48.8  | 2010 |  25401 | 43432 |  - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pplcnet.yml) |

**Notes:**
DeepSORT does not need training on the MOT dataset and is only used for evaluation. Two evaluation methods are supported.
- 1. Load the detection result file and the ReID model. Before DeepSORT evaluation, first generate detection results with a detection model, then organize them like this:
```
det_results_dir
   |——————MOT16-02.txt
   |——————MOT16-04.txt
   |——————MOT16-05.txt
   |——————MOT16-09.txt
   |——————MOT16-10.txt
   |——————MOT16-11.txt
   |——————MOT16-13.txt
```
For the MOT16 dataset, you can download the matched detection results `det_results_dir.zip` provided by PaddleDetection:
```
wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
If you use a stronger detection model, you can get better results. Each txt file holds the detection results of all frames extracted from one video, and each line describes a bounding box in the following format:
```
[frame_id],[x0],[y0],[w],[h],[score],[class_id]
```
- `frame_id` is the frame number of the image.
- `x0,y0` are the X and Y coordinates of the top-left corner of the object box.
- `w,h` are the width and height of the object box in pixels.
- `score` is the confidence score of the object box.
- `class_id` is the category of the object box; set it to `0` if there is only one category.
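
As a minimal sketch (assuming the comma-separated format above; `load_det_results` is a hypothetical helper, not a PaddleDetection API), the per-video txt file can be grouped by frame like this:

```python
from collections import defaultdict

def load_det_results(txt_path):
    """Group the detection lines of one video by frame_id."""
    dets_per_frame = defaultdict(list)
    with open(txt_path) as f:
        for line in f:
            frame_id, x0, y0, w, h, score, class_id = line.strip().split(',')
            dets_per_frame[int(float(frame_id))].append(
                (float(x0), float(y0), float(w), float(h),
                 float(score), int(float(class_id))))
    return dets_per_frame

# dets = load_det_results('det_results_dir/MOT16-02.txt')
# print(dets[1])  # all detections of frame 1
```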

- 2. Load the detection model and the ReID model at the same time. Here the JDE version of YOLOv3 is selected; see `configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml` for the detailed configuration, and see `configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml` for other general detectors.

### JDE Results on MOT-16 Training Set

| backbone           | input shape | MOTA | IDF1  |  IDS  |   FP  |  FN  |  FPS  | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53          | 1088x608 |  72.0  |  66.9  | 1397  |  7274  | 22209 |   -   |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](./jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53          | 864x480 |  69.1  |  64.7  | 1539  |  7544  | 25046 |   -   |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](./jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53          | 576x320 |  63.7  |  64.4  | 1310  |  6782  | 31964 |   -   |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](./jde/jde_darknet53_30e_576x320.yml) |

### JDE Results on MOT-16 Test Set

| backbone           | input shape | MOTA | IDF1  |  IDS  |   FP  |  FN  |  FPS  | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53(paper)   | 1088x608 |  64.4  |  55.8  | 1544  |    -    |   -   |   -   |   -  |   -   |
| DarkNet53          | 1088x608 |  64.6  |  58.5  | 1864  |  10550 | 52088 |   -   |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](./jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53(paper)   | 864x480 |   62.1  |  56.9  | 1608  |    -    |   -   |   -   |   -  |   -   |
| DarkNet53          | 864x480 |   63.2  |  57.7  | 1966  |  10070  | 55081 |   -   |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](./jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53          | 576x320 |   59.1  |  56.4  | 1911  |  10923  | 61789  |   -   |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](./jde/jde_darknet53_30e_576x320.yml) |

**Notes:**
 JDE was trained on 8 GPUs with a mini-batch size of 4 per GPU for 30 epochs.

### FairMOT Results on MOT-16 Training Set

| backbone       | input shape | MOTA | IDF1 |  IDS  |    FP   |   FN   |    FPS    | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper)  | 1088x608 |  83.3  |  81.9  |   544  |  3822  |  14095  |     -   |    -   |   -    |
| DLA-34         | 1088x608 |  83.2  |  83.1  |   499  |  3861  |  14223  |     -   | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](./fairmot/fairmot_dla34_30e_1088x608.yml) |
| DLA-34         | 864x480 |  80.8  |  81.1  |  561  |  3643  | 16967 |    -     |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_864x480.pdparams) | [config](./fairmot/fairmot_dla34_30e_864x480.yml) |
| DLA-34         | 576x320 |  74.0  |  76.1  |  640  |  4989  | 23034 |    -     |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_576x320.pdparams) | [config](./fairmot/fairmot_dla34_30e_576x320.yml) |


### FairMOT Results on MOT-16 Test Set

| backbone       | input shape | MOTA | IDF1 |  IDS  |    FP   |   FN   |    FPS    | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper)  | 1088x608 |  74.9  |  72.8  |  1074  |    -   |    -   |   25.9   |    -   |   -    |
| DLA-34         | 1088x608 |  75.0  |  74.7  |  919   |  7934  |  36747 |    -     | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](./fairmot/fairmot_dla34_30e_1088x608.yml) |
| DLA-34         | 864x480 |  73.0  |  72.6  |  977   |  7578  |  40601 |    -     |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_864x480.pdparams) | [config](./fairmot/fairmot_dla34_30e_864x480.yml) |
| DLA-34         | 576x320 |  69.9  |  70.2  |  1044   |  8869  |  44898 |    -     |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_576x320.pdparams) | [config](./fairmot/fairmot_dla34_30e_576x320.yml) |

**Notes:**
 FairMOT DLA-34 was trained on 2 GPUs with a mini-batch size of 6 per GPU for 30 epochs.

### FairMOT enhance model
### Results on MOT-16 Test Set
| backbone       | input shape |  MOTA  |  IDF1  |  IDS  |   FP  |   FN   |   FPS   |  download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34         | 1088x608 |  75.9  |  74.7  |  1021   |  11425  |  31475 |    -     |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_30e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_30e_1088x608.yml) |

### Results on MOT-17 Test Set
| backbone       | input shape |  MOTA  |  IDF1  |   IDS  |   FP   |   FN   |    FPS   |  download  | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34         | 1088x608 |  75.3  |  74.2  |  3270  |  29112  | 106749 |    -     |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_30e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_30e_1088x608.yml) |

**Notes:**
 FairMOT enhance DLA-34 was trained on 8 GPUs with a mini-batch size of 16 per GPU for 60 epochs. The CrowdHuman dataset is added to the training set.


### FairMOT light model
### Results on MOT-16 Test Set
| backbone       | input shape | MOTA | IDF1 |  IDS  |    FP   |   FN   |    FPS    | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| HRNetV2-W18   | 1088x608 |  71.7  |  66.6  |  1340  |  8642  | 41592 |    -     |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.pdparams) | [config](./fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml) |

### Results on MOT-17 Test Set
| backbone       | input shape | MOTA | IDF1 |  IDS  |    FP   |   FN   |    FPS    | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| HRNetV2-W18   | 1088x608 |  70.7  |  65.7  |  4281  |  22485  | 138468 |    -     |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.pdparams) | [config](./fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml) |
| HRNetV2-W18   | 864x480  |  70.3  |  65.8  |  4056  |  18927  | 144486 |    -     |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.pdparams) | [config](./fairmot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml) |
| HRNetV2-W18   | 576x320  |  65.3  |  64.8  |  4137  |  28860  | 163017 |    -     |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.pdparams) | [config](./fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml) |

**Notes:**
 FairMOT HRNetV2-W18 was trained on 8 GPUs with a mini-batch size of 4 per GPU for 30 epochs. Only the ImageNet pre-trained model is used, and the optimizer is Momentum. The CrowdHuman dataset is added to the training set.


## Feature Tracking Model

### [Head Tracking](./headtracking21/README.md)

### FairMOT Results on HT-21 Training Set
|    backbone      |  input shape |  MOTA  |  IDF1  |  IDS  |   FP  |   FN   |   FPS   |  download | config |
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
| DLA-34         | 1088x608 |  64.7 |  69.0  |   8533  |  148817  |  234970  |     -   | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |

### FairMOT Results on HT-21 Test Set
|    backbone      |  input shape |  MOTA  |  IDF1  |  IDS  |   FP  |   FN   |   FPS   |  download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
| DLA-34         | 1088x608 |  60.8  |  62.8  |  12781   |  118109  |  198896 |    -     | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |


### [Pedestrian Tracking](./pedestrian/README.md)
### FairMOT Results on each val-set of the Pedestrian category
|    Dataset      |  input shape |  MOTA  |  IDF1  |  FPS   |  download | config |
| :-------------| :------- | :----: | :----: | :----: | :-----: |:------: |
|  PathTrack    | 1088x608 |  44.9 |    59.3   |    -   |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [config](./pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml) |
|  VisDrone     | 1088x608 |  49.2 |   63.1 |    -   | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [config](./pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |


### [Vehicle Tracking](./vehicle/README.md)
### FairMOT Results on each val-set of the Vehicle category
|    Dataset      |  input shape |  MOTA  |  IDF1  |  FPS   |  download | config |
| :-------------| :------- | :----: | :----: | :----: | :-----: |:------: |
|  BDD100K      | 1088x608 |  43.5 |  50.0  |    -    | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bdd100k_vehicle.pdparams) | [config](./vehicle/fairmot_dla34_30e_1088x608_bdd100k_vehicle.yml) |
|  KITTI        | 1088x608 |  82.7 |    -   |    -   |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_kitti_vehicle.pdparams) | [config](./vehicle/fairmot_dla34_30e_1088x608_kitti_vehicle.yml) |
|  VisDrone     | 1088x608 |  52.1 |   63.3 |    -   | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_vehicle.pdparams) | [config](./vehicle/fairmot_dla34_30e_1088x608_visdrone_vehicle.yml) |

## Dataset Preparation

### MOT Dataset
PaddleDetection uses the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please refer to [PrepareMOTDataSet](../../docs/tutorials/PrepareMOTDataSet.md) to download and prepare all the training data, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The former six are mixed as the training dataset, and MOT16 is used as the evaluation dataset. In addition, you can use **MOT15 and MOT20** for fine-tuning. All pedestrians in these datasets have detection bbox labels, and some also have ID labels. If you want to use these datasets, please **follow their licenses**.

### Data Format
The relevant datasets share the following structure:
```
Caltech
   |——————images
   |        └——————00001.jpg
   |        |—————— ...
   |        └——————0000N.jpg
   └——————labels_with_ids
            └——————00001.txt
            |—————— ...
            └——————0000N.txt
MOT17
   |——————images
   |        └——————train
   |        └——————test
   └——————labels_with_ids
            └——————train
```
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text file. Given an image path, the annotation file path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
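
In code, the rule is a plain string replacement; a minimal sketch (`label_path` is a hypothetical helper name, not part of the code base):

```python
def label_path(image_path: str) -> str:
    # Replace 'images' with 'labels_with_ids' and '.jpg' with '.txt'.
    return image_path.replace('images', 'labels_with_ids').replace('.jpg', '.txt')

print(label_path('Caltech/images/00001.jpg'))
# -> Caltech/labels_with_ids/00001.txt
```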

In the annotation text file, each line describes a bounding box in the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `1` to `num_identities` (the total number of distinct object identities in the dataset), or `-1` if the box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height of the box. Note that they are normalized by the image width/height, so they are floating-point numbers between 0 and 1.
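
As a minimal sketch of this convention (a hypothetical helper for illustration, assuming the image size is known), one annotation line can be converted back to a pixel-space box like this:

```python
def denormalize(line: str, img_w: int, img_h: int):
    """Convert one normalized annotation line to a pixel box (x0, y0, w, h)."""
    cls, identity, xc, yc, w, h = line.split()
    w, h = float(w) * img_w, float(h) * img_h
    x0 = float(xc) * img_w - w / 2   # left edge from center x
    y0 = float(yc) * img_h - h / 2   # top edge from center y
    return int(cls), int(identity), x0, y0, w, h

print(denormalize('0 1 0.5 0.5 0.25 0.5', 1920, 1080))
# -> (0, 1, 720.0, 270.0, 480.0, 540.0)
```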

### Dataset Directory

First, run the command below to download `image_lists.zip` and unzip it in the `dataset/mot` directory:
```
wget https://dataset.bj.bcebos.com/mot/image_lists.zip
```
Then download and unzip each dataset, and the final directory is as follows:
```
dataset/mot
  |——————image_lists
            |——————caltech.10k.val  
            |——————caltech.all  
            |——————caltech.train  
            |——————caltech.val  
            |——————citypersons.train  
            |——————citypersons.val  
            |——————cuhksysu.train  
            |——————cuhksysu.val  
            |——————eth.train  
            |——————mot15.train  
            |——————mot16.train  
            |——————mot17.train  
            |——————mot20.train  
            |——————prw.train  
            |——————prw.val
  |——————Caltech
  |——————Cityscapes
  |——————CUHKSYSU
  |——————ETHZ
  |——————MOT15
  |——————MOT16
  |——————MOT17
  |——————MOT20
  |——————PRW
```

## Getting Started

### 1. Training

Train FairMOT on 2 GPUs with the following command:

```bash
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
```

### 2. Evaluation

Evaluate the tracking performance of FairMOT on the val dataset on a single GPU with the following commands:

```bash
# use weights released in PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams

# use saved checkpoint in training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams
```
**Notes:**
 The default evaluation dataset is the MOT-16 Train Set. To change the evaluation dataset, modify `data_root` under `EvalMOTDataset` in `configs/datasets/mot.yml`, as shown below:
```
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: MOT17/images/train
    keep_ori_im: False # set True if save visualization images or video
```

### 3. Inference

Run inference on a video on a single GPU with the following command:

```bash
# inference on video and save a video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --video_file={your video name}.mp4 --frame_rate=20 --save_videos
```

Run inference on an image folder on a single GPU with the following command:

```bash
# inference image folder and save a video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --image_dir={your infer images folder} --save_videos
```

**Notes:**
 Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first; on Linux (Ubuntu) you can install it directly with `apt-get update && apt-get install -y ffmpeg`. `--frame_rate` is the frame rate of the video, i.e. how many frames per second are extracted. You can set it yourself; the default value `-1` means the frame rate is read from the video by OpenCV.


### 4. Export model

```bash
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
```

### 5. Using exported model for Python inference

```bash
python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
**Notes:**
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result files, or `--save_images` to save the visualization images.

### 6. Using exported MOT and keypoint models for joint Python inference

```bash
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
```
**Notes:**
 Keypoint model export tutorial: `configs/keypoint/README.md`.

## Citations
```
@inproceedings{Wojke2017simple,
  title={Simple Online and Realtime Tracking with a Deep Association Metric},
  author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
  booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
  year={2017},
  pages={3645--3649},
  organization={IEEE},
  doi={10.1109/ICIP.2017.8296962}
}

@inproceedings{Wojke2018deep,
  title={Deep Cosine Metric Learning for Person Re-identification},
  author={Wojke, Nicolai and Bewley, Alex},
  booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2018},
  pages={748--756},
  organization={IEEE},
  doi={10.1109/WACV.2018.00087}
}

@article{wang2019towards,
  title={Towards Real-Time Multi-Object Tracking},
  author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
  journal={arXiv preprint arXiv:1909.12605},
  year={2019}
}

@article{zhang2020fair,
  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
  journal={arXiv preprint arXiv:2004.01888},
  year={2020}
}
```