Unverified commit e76e1a8a authored by George Ni, committed by GitHub

[MOT] fix mot doc (#3025)

* fix mot doc

* remove image_lists, fix all mot docs, add custom data

* fix doc, test=document_fix

* fix doc

* fix doc format, test=document_fix
Parent 6cfe3643
English | [简体中文](README_cn.md)
# MOT (Multi-Object Tracking)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Dataset Preparation](#Dataset_Preparation)
- [Installation](#Installation)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
PaddleDetection implements three multi-object tracking methods.
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1602.00763) (Simple Online and Realtime Tracking) algorithm to integrate appearance information based on a deep appearance descriptor: a CNN model extracts features from the pedestrian image patches cropped by the detector boxes. Here we use `JDE` as the detection model to generate boxes and select `PCBPyramid` as the ReID model. Loading boxes from saved detection result files is also supported.
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) is a fast and high-performance multiple-object tracker that learns the object detection task and appearance embedding task simultaneously in a shared neural network.
- [FairMOT](https://arxiv.org/abs/2004.01888) focuses on accomplishing the detection and re-identification in a single network to improve the inference speed, and presents a simple baseline which consists of two homogeneous branches to predict pixel-wise objectness scores and re-ID features. The achieved fairness between the two tasks allows FairMOT to obtain high levels of detection and tracking accuracy.
<div align="center">
<img src="../../docs/images/mot16_jde.gif" width=500 />
</div>
## Model Zoo
### JDE on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.4 | 1320 | 6613 | 21629 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.4 | 1341 | 6454 | 25208 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.1 | 64.6 | 1357 | 7083 | 32312 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE was trained on 8 GPUs with a mini-batch size of 4 per GPU for 30 epochs.
### DeepSORT on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-----: | :-----: | :-----: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:**
DeepSORT does not require training on MOT datasets and is used for evaluation only. Before evaluating DeepSORT, you should first obtain detection results from a detection model (here we use JDE), and then prepare them like this:
```
det_results_dir
|——————MOT16-02.txt
|——————MOT16-04.txt
|——————MOT16-05.txt
|——————MOT16-09.txt
|——————MOT16-10.txt
|——————MOT16-11.txt
|——————MOT16-13.txt
```
Each txt file holds the detection results of all the frames extracted from one video, and each line describes a bounding box with the following format:
```
[frame_id][identity][bb_left][bb_top][width][height][conf][x][y][z]
```
**Notes:**
- `frame_id` is the frame number of the image
- `identity` is the object ID, with default value `-1`
- `bb_left` is the X coordinate of the left boundary of the object box
- `bb_top` is the Y coordinate of the top boundary of the object box
- `width, height` are the pixel width and height of the box
- `conf` is the object score with default value `1` (the results have already been filtered by the detection score threshold)
- `x, y, z` are only used in 3D and default to `-1` in 2D; see the parsing sketch below
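For illustration, here is a minimal parsing sketch for such a result file, assuming the fields are comma-separated in the order documented above (the MOT Challenge convention); the helper name `load_det_results` is hypothetical:
```python
# A minimal sketch: load one detection result file, e.g. det_results_dir/MOT16-02.txt,
# and group the boxes by frame. Assumption: fields are comma-separated in the
# order [frame_id, identity, bb_left, bb_top, width, height, conf, x, y, z].
from collections import defaultdict

def load_det_results(txt_path):
    """Return {frame_id: [(bb_left, bb_top, width, height, conf), ...]}."""
    boxes_per_frame = defaultdict(list)
    with open(txt_path) as f:
        for line in f:
            fields = line.strip().split(',')
            if len(fields) < 7:
                continue  # skip empty or malformed lines
            frame_id = int(float(fields[0]))
            # fields[1] is the identity, `-1` by default for raw detections
            bb_left, bb_top, width, height, conf = map(float, fields[2:7])
            boxes_per_frame[frame_id].append((bb_left, bb_top, width, height, conf))
    return boxes_per_frame

dets = load_det_results('det_results_dir/MOT16-02.txt')
print('%d frames with detections' % len(dets))
```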
### FairMOT Results on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
| DLA-34 | 1088x608 | 83.7 | 83.3 | 435 | 3829 | 13764 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
### FairMOT Results on MOT-16 test set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
| DLA-34 | 1088x608 | 74.8 | 74.4 | 930 | 7038 | 37994 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
**Notes:**
FairMOT was trained on 8 GPUs with a mini-batch size of 6 per GPU for 30 epochs.
## Dataset Preparation
### MOT Dataset
PaddleDetection uses the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please refer to [PrepareMOTDataSet](../../docs/tutorials/PrepareMOTDataSet.md) to download and prepare all the training data including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. **MOT15 and MOT20** can also be downloaded from the official webpage of the MOT Challenge. If you want to use these datasets, please **follow their licenses**.
### Data Format
These datasets share the following structure:
```
Caltech
|——————images
| └——————00001.jpg
| |—————— ...
| └——————0000N.jpg
└——————labels_with_ids
└——————00001.txt
|—————— ...
└——————0000N.txt
MOT17
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train
```
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
In the annotation text, each line describes a bounding box and has the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of instances of objects in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1 (see the conversion sketch below).
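For illustration, a minimal sketch of producing such a label line from a pixel-space box, and of deriving the annotation path from an image path; the helper name and example values are hypothetical:
```python
# A minimal sketch: convert a pixel-space box (top-left x/y, width, height) into one
# labels_with_ids annotation line. The helper name and example values are hypothetical.
def bbox_to_label_line(identity, x, y, w, h, img_w, img_h):
    """Return a '[class] [identity] [x_center] [y_center] [width] [height]' line."""
    x_center = (x + w / 2) / img_w
    y_center = (y + h / 2) / img_h
    return '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}'.format(
        identity, x_center, y_center, w / img_w, h / img_h)

# Given an image path, the annotation text path is derived as described above.
img_path = 'Caltech/images/00001.jpg'
label_path = img_path.replace('images', 'labels_with_ids').replace('.jpg', '.txt')
print(label_path)  # Caltech/labels_with_ids/00001.txt
print(bbox_to_label_line(identity=5, x=100, y=50, w=40, h=80, img_w=1920, img_h=1080))
```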
### Dataset Directory
First, follow the command below to download `image_lists.zip` and unzip it in the `dataset/mot` directory:
```
wget https://dataset.bj.bcebos.com/mot/image_lists.zip
```
Then download and unzip each dataset, and the final directory is as follows:
```
dataset/mot
|——————image_lists
|——————caltech.10k.val
|——————caltech.all
|——————caltech.train
|——————caltech.val
|——————citypersons.train
|——————citypersons.val
|——————cuhksysu.train
|——————cuhksysu.val
|——————eth.train
|——————mot15.train
|——————mot16.train
|——————mot17.train
|——————mot20.train
|——————prw.train
|——————prw.val
|——————Caltech
|——————Cityscapes
|——————CUHKSYSU
|——————ETHZ
|——————MOT15
|——————MOT16
|——————MOT17
|——————MOT20
|——————PRW
```
## Installation
Install all the related dependencies for MOT:
```
pip install lap sklearn motmetrics openpyxl cython_bbox
or
pip install -r requirements.txt
```
**Notes:**
To install `cython_bbox` on Windows, please refer to this [tutorial](https://stackoverflow.com/questions/60349980/is-there-a-way-to-install-cython-bbox-for-windows).
## Getting Started
### 1. Training
Train FairMOT on 8 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
```
### 2. Evaluation
Evaluate the tracking performance of FairMOT on the val dataset on a single GPU with the following commands:
```bash
# use weights released in PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
# use saved checkpoint in training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final
```
## Citations
```
@article{wang2019towards,
title={Towards Real-Time Multi-Object Tracking},
author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
journal={arXiv preprint arXiv:1909.12605},
year={2019}
}
@inproceedings{Wojke2017simple,
title={Simple Online and Realtime Tracking with a Deep Association Metric},
author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
year={2017},
pages={3645--3649},
organization={IEEE},
doi={10.1109/ICIP.2017.8296962}
}
@inproceedings{Wojke2018deep,
title={Deep Cosine Metric Learning for Person Re-identification},
author={Wojke, Nicolai and Bewley, Alex},
booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
year={2018},
pages={748--756},
organization={IEEE},
doi={10.1109/WACV.2018.00087}
}
@article{zhang2020fair,
  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
  journal={arXiv preprint arXiv:2004.01888},
  year={2020}
}
```
简体中文 | [English](README.md)
# Multi-Object Tracking (MOT)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Dataset Preparation](#Dataset_Preparation)
- [Installation](#Installation)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
PaddleDetection implements three multi-object tracking methods.
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1602.00763) (Simple Online and Realtime Tracking) algorithm to integrate appearance information based on a deep appearance descriptor: a CNN model extracts features from the pedestrian image patches cropped by the detector boxes.
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) is a fast and high-performance multiple-object tracker that learns the object detection task and appearance embedding task simultaneously in a shared neural network.
- [FairMOT](https://arxiv.org/abs/2004.01888) focuses on accomplishing the detection and re-identification in a single network to improve the inference speed, and presents a simple baseline which consists of two homogeneous branches to predict pixel-wise objectness scores and re-ID features. The achieved fairness between the two tasks allows FairMOT to obtain high levels of detection and tracking accuracy.
<div align="center">
<img src="../../docs/images/mot16_jde.gif" width=500 />
</div>
## Model Zoo
### JDE on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.4 | 1320 | 6613 | 21629 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.4 | 1341 | 6454 | 25208 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.1 | 64.6 | 1357 | 7083 | 32312 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE was trained on 8 GPUs with a mini-batch size of 4 per GPU for 30 epochs.
### DeepSORT on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-----: | :-----: | :-----: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:**
DeepSORT does not require training on MOT datasets and is used for evaluation only. Before evaluating DeepSORT, you should first obtain detection results from a detection model (here we use JDE), and then prepare them like this:
```
det_results_dir
|——————MOT16-02.txt
|——————MOT16-04.txt
|——————MOT16-05.txt
|——————MOT16-09.txt
|——————MOT16-10.txt
|——————MOT16-11.txt
|——————MOT16-13.txt
```
Each txt file holds the detection results of all the frames extracted from one video, and each line describes a bounding box with the following format:
```
[frame_id][identity][bb_left][bb_top][width][height][conf][x][y][z]
```
**Notes:**
- `frame_id` is the frame number of the image
- `identity` is the object ID, with default value `-1`
- `bb_left` is the X coordinate of the left boundary of the object box
- `bb_top` is the Y coordinate of the top boundary of the object box
- `width, height` are the pixel width and height of the box
- `conf` is the object score with default value `1` (the results have already been filtered by the detection score threshold)
- `x, y, z` are only used in 3D and default to `-1` in 2D
### FairMOT Results on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
| DLA-34 | 1088x608 | 83.7 | 83.3 | 435 | 3829 | 13764 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
### FairMOT Results on MOT-16 test set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
| DLA-34 | 1088x608 | 74.8 | 74.4 | 930 | 7038 | 37994 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
**Notes:**
FairMOT was trained on 8 GPUs with a mini-batch size of 6 per GPU for 30 epochs.
## Dataset Preparation
### MOT Dataset
PaddleDetection uses the same datasets as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please refer to [PrepareMOTDataSet](../../docs/tutorials/PrepareMOTDataSet_cn.md) to download and prepare all the training data including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. **MOT15 and MOT20** can also be downloaded. If you want to use these datasets, please **follow their licenses**.
### Data Format
These datasets share the following structure:
```
Caltech
|——————images
| └——————00001.jpg
| |—————— ...
| └——————0000N.jpg
└——————labels_with_ids
└——————00001.txt
|—————— ...
└——————0000N.txt
MOT17
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train
```
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`. In the annotation text, each line describes a bounding box with the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of instances of objects in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
### Dataset Directory
First, follow the command below to download `image_lists.zip` and unzip it in the `dataset/mot` directory:
```
wget https://dataset.bj.bcebos.com/mot/image_lists.zip
```
Then download and unzip each dataset, and the final directory is as follows:
```
dataset/mot
|——————image_lists
|——————caltech.10k.val
|——————caltech.all
|——————caltech.train
|——————caltech.val
|——————citypersons.train
|——————citypersons.val
|——————cuhksysu.train
|——————cuhksysu.val
|——————eth.train
|——————mot15.train
|——————mot16.train
|——————mot17.train
|——————mot20.train
|——————prw.train
|——————prw.val
|——————Caltech
|——————Cityscapes
|——————CUHKSYSU
|——————ETHZ
|——————MOT15
|——————MOT16
|——————MOT17
|——————MOT20
|——————PRW
```
## Installation
Install all the related dependencies for MOT:
```
pip install lap sklearn motmetrics openpyxl cython_bbox
or
pip install -r requirements.txt
```
**Notes:**
To install `cython_bbox` on Windows, please refer to this [tutorial](https://stackoverflow.com/questions/60349980/is-there-a-way-to-install-cython-bbox-for-windows).
## Getting Started
### 1. Training
Train FairMOT on 8 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
```
### 2. Evaluation
Evaluate the tracking performance of FairMOT on the val dataset on a single GPU with the following commands:
```bash
# use weights released in PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
# use saved checkpoint in training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final
```
## Citations
```
@article{wang2019towards,
title={Towards Real-Time Multi-Object Tracking},
author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
journal={arXiv preprint arXiv:1909.12605},
year={2019}
}
@inproceedings{Wojke2017simple,
title={Simple Online and Realtime Tracking with a Deep Association Metric},
author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
year={2017},
pages={3645--3649},
organization={IEEE},
doi={10.1109/ICIP.2017.8296962}
}
@inproceedings{Wojke2018deep,
title={Deep Cosine Metric Learning for Person Re-identification},
author={Wojke, Nicolai and Bewley, Alex},
booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
year={2018},
pages={748--756},
organization={IEEE},
doi={10.1109/WACV.2018.00087}
}
@article{zhang2020fair,
  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
  journal={arXiv preprint arXiv:2004.01888},
  year={2020}
}
```
@@ -6,20 +6,21 @@ English | [简体中文](README_cn.md)
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1602.00763) (Simple Online and Realtime Tracking) algorithm to integrate appearance information based on a deep appearance descriptor: a CNN model extracts features from the pedestrian image patches cropped by the detector boxes. Here we use `JDE` as the detection model to generate boxes and select `PCBPyramid` as the ReID model. Loading boxes from saved detection result files is also supported.
## Model Zoo
### DeepSORT on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-------: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:**
DeepSORT does not require training on MOT datasets and is used for evaluation only. Before evaluating DeepSORT, you should first obtain detection results from a detection model (here we use JDE), and then prepare them like this:
```
det_results_dir
|——————MOT16-02.txt
@@ -30,6 +31,12 @@ det_results_dir
|——————MOT16-11.txt
|——————MOT16-13.txt
```
Each txt file holds the detection results of all the frames extracted from one video, and each line describes a bounding box with the following format:
```
[frame_id][identity][bb_left][bb_top][width][height][conf][x][y][z]
```
**Notes:**
`frame_id` is the frame number of the image; `identity` is the object ID, with default value `-1`; `bb_left` is the X coordinate of the left boundary of the object box; `bb_top` is the Y coordinate of the top boundary; `width, height` are the pixel width and height; `conf` is the object score with default value `1` (the results have already been filtered by the detection score threshold); `x, y, z` are only used in 3D and default to `-1` in 2D.
## Getting Started
......
简体中文 | [English](README.md)
# DeepSORT
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1602.00763) (Simple Online and Realtime Tracking) algorithm to integrate appearance information based on a deep appearance descriptor: a CNN model extracts features from the pedestrian image patches cropped by the detector boxes. We use `JDE` as the detection model to generate boxes and select `PCBPyramid` as the ReID model. Loading boxes from saved detection result files is also supported.
## Model Zoo
### DeepSORT on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-----: | :-----: | :-----: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:**
DeepSORT does not require training on MOT datasets and is used for evaluation only. Before evaluating DeepSORT, you should first obtain detection results from a detection model (here we use JDE), and then prepare them like this:
```
det_results_dir
|——————MOT16-02.txt
@@ -30,6 +31,11 @@ det_results_dir
|——————MOT16-11.txt
|——————MOT16-13.txt
```
Each txt file holds the detection results of all the frames extracted from one video, and each line describes a bounding box with the following format:
```
[frame_id][identity][bb_left][bb_top][width][height][conf][x][y][z]
```
**Notes:** `frame_id` is the frame number of the image; `identity` is the object ID, with default value `-1`; `bb_left` is the X coordinate of the left boundary of the object box; `bb_top` is the Y coordinate of the top boundary; `width, height` are the pixel width and height; `conf` is the object score with default value `1` (the results have already been filtered by the detection score threshold); `x, y, z` are only used in 3D and default to `-1` in 2D.
## Getting Started
......
English | [简体中文](README_cn.md)
# FairMOT (On the Fairness of Detection and Re-Identification in Multiple Object Tracking)
## Table of Contents
@@ -8,38 +10,37 @@
## Introduction
[FairMOT](https://arxiv.org/abs/2004.01888) focuses on accomplishing the detection and re-identification in a single network to improve the inference speed, and presents a simple baseline which consists of two homogeneous branches to predict pixel-wise objectness scores and re-ID features. The achieved fairness between the two tasks allows FairMOT to obtain high levels of detection and tracking accuracy.
## Model Zoo
### FairMOT Results on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
| DLA-34 | 1088x608 | 83.7 | 83.3 | 435 | 3829 | 13764 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
### FairMOT Results on MOT-16 test set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
| DLA-34 | 1088x608 | 74.8 | 74.4 | 930 | 7038 | 37994 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
**Notes:**
FairMOT was trained on 8 GPUs with a mini-batch size of 6 per GPU for 30 epochs.
## Getting Started
### 1. Training
Train FairMOT on 8 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
```
......
简体中文 | [English](README.md)
# FairMOT (On the Fairness of Detection and Re-Identification in Multiple Object Tracking)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[FairMOT](https://arxiv.org/abs/2004.01888) focuses on accomplishing the detection and re-identification in a single network to improve the inference speed, and presents a simple baseline which consists of two homogeneous branches to predict pixel-wise objectness scores and re-ID features. The achieved fairness between the two tasks allows FairMOT to obtain high levels of detection and tracking accuracy.
## Model Zoo
### FairMOT Results on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
| DLA-34 | 1088x608 | 83.7 | 83.3 | 435 | 3829 | 13764 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
### FairMOT Results on MOT-16 test set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
| DLA-34 | 1088x608 | 74.8 | 74.4 | 930 | 7038 | 37994 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
**Notes:**
FairMOT was trained on 8 GPUs with a mini-batch size of 6 per GPU for 30 epochs.
## Getting Started
### 1. Training
Train FairMOT on 8 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
```
### 2. Evaluation
Evaluate FairMOT on a single GPU with the following commands:
```bash
# use weights released in PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
# use saved checkpoint in training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final
```
## Citations
```
@article{zhang2020fair,
  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
  journal={arXiv preprint arXiv:2004.01888},
  year={2020}
}
```
epoch: 30
LearningRate:
  base_lr: 0.0004
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
......
English | [简体中文](README_cn.md)
# JDE (Joint Detection and Embedding)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) is a fast and high-performance multiple-object tracker that learns the object detection task and appearance embedding task simultaneously in a shared neural network.
<div align="center">
  <img src="../../../docs/images/mot16_jde.gif" width=500 />
</div>
@@ -35,7 +36,7 @@ English | [简体中文](README_cn.md)
Train JDE on 8 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./jde_darknet53_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml
```
### 2. Evaluation
@@ -59,7 +60,7 @@ Infer a video on a single GPU with the following commands.
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
```
**Notes:**
Please make sure that [ffmpeg](https://www.ffmpeg.org) is installed first.
## Citations
```
......
简体中文 | [English](README.md)
# JDE (Joint Detection and Embedding)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) is a fast and high-performance multiple-object tracker that learns the object detection task and appearance embedding task simultaneously in a shared neural network.
<div align="center">
  <img src="../../../docs/images/mot16_jde.gif" width=500 />
</div>
## Model Zoo
### JDE on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
@@ -26,7 +26,7 @@
| DarkNet53 | 576x320 | 63.1 | 64.6 | 1357 | 7083 | 32312 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE was trained on 8 GPUs with a mini-batch size of 4 per GPU for 30 epochs.
## Getting Started
@@ -35,7 +35,7 @@
Train on 8 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./jde_darknet53_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml
```
### 2. Evaluation
@@ -59,7 +59,7 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
```
**Notes:**
Please make sure that [ffmpeg](https://www.ffmpeg.org) is installed first.
## Citations
```
......
@@ -19,6 +19,7 @@ import numpy as np
MOT_data = 'MOT16'
# choose a data in ['MOT15', 'MOT16', 'MOT17', 'MOT20']
# or your custom data (prepare it following the 'docs/tutorials/PrepareMOTDataSet.md')
def mkdirs(d):
......
English | [简体中文](PrepareMOTDataSet_cn.md)
# Contents
## Multi-Object Tracking Dataset Preparation
- [MOT Dataset](#MOT_Dataset)
- [Data Format](#Data_Format)
- [Dataset Directory](#Dataset_Directory)
- [Download Links](#Download_Links)
- [Custom Dataset Preparation](#Custom_Dataset_Preparation)
- [Citations](#Citations)
### MOT Dataset
PaddleDetection uses the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please download and prepare all the training data including Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16. MOT15 and MOT20 can also be downloaded from the official webpage of the MOT Challenge. If you want to use these datasets, please **follow their licenses**.
### Data Format
These datasets share the following structure:
```
Caltech
|——————images
@@ -32,9 +24,14 @@ Caltech
└——————00001.txt
|—————— ...
└——————0000N.txt
MOT17
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train
```
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
In the annotation text, each line describes a bounding box and has the following format:
```
@@ -46,11 +43,18 @@ The field `[identity]` is an integer from `0` to `num_identities - 1`, or `-1` i
**Note** that the values of `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
### Dataset Directory
First, follow the command below to download `image_lists.zip` and unzip it in the `dataset/mot` directory:
```
wget https://dataset.bj.bcebos.com/mot/image_lists.zip
```
Then download and unzip each dataset, and the final directory is as follows:
```
dataset/mot
|——————image_lists
|——————caltech.10k.val
|——————caltech.all
|——————caltech.train
|——————caltech.val
|——————citypersons.train
@@ -58,8 +62,10 @@ dataset/mot
|——————cuhksysu.train
|——————cuhksysu.val
|——————eth.train
|——————mot15.train
|——————mot16.train
|——————mot17.train
|——————mot20.train
|——————prw.train
|——————prw.val
|——————Caltech
@@ -73,9 +79,83 @@ dataset/mot
|——————PRW
```
### Custom Dataset Preparation
In order to standardize training and evaluation, custom data needs to be converted into the same directory structure and format as the MOT-17 dataset:
```
custom_data
|——————images
| └——————test
| └——————train
| └——————seq1
| | └——————gt
| | | └——————gt.txt
| | └——————img1
| | | └——————000001.jpg
| | | |——————000002.jpg
| | | └—————— ...
| | └——————seqinfo.ini
| └——————seq2
| └——————...
└——————labels_with_ids
└——————train
└——————seq1
| └——————000001.txt
| |——————000002.txt
| └—————— ...
└——————seq2
└—————— ...
```
#### images
- `gt.txt` is the original annotation file of all images extracted from the video.
- `img1` is the folder of images extracted from the video by a certain frame rate.
- `seqinfo.ini` is a video information description file, and the following format is required:
```
[Sequence]
name=MOT16-02
imDir=img1
frameRate=30
seqLength=600
imWidth=1920
imHeight=1080
imExt=.jpg
```
Each line in `gt.txt` describes a bounding box, with the format as follows:
```
[frame_id][identity][bb_left][bb_top][width][height][x][y][z]
```
**Notes:**
- `frame_id` is the current frame id.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of instances of objects in the dataset), or `-1` if this box has no identity annotation.
- `bb_left` is the x coordinate of the left boundary of the target box.
- `bb_top` is the y coordinate of the top boundary of the target box.
- `width, height` are the pixel width and height, and `x, y, z` are only used in 3D.
#### labels_with_ids
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
In the annotation text, each line describes a bounding box and has the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of instances of objects in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
Generate the corresponding `labels_with_ids` with the following command:
```
cd dataset/mot
python gen_labels_MOT.py
```
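For reference, a simplified sketch of the conversion such a script performs, assuming comma-separated `gt.txt` fields in the order documented above and reading the image size from `seqinfo.ini`; the actual `gen_labels_MOT.py` may differ in details such as filtering:
```python
# Simplified sketch of the gt.txt -> labels_with_ids conversion for one sequence.
# Assumptions: gt.txt fields are comma-separated in the documented order; the real
# gen_labels_MOT.py may apply extra filtering (e.g. by class or visibility).
import configparser
import os

def convert_sequence(seq_dir, label_dir):
    ini = configparser.ConfigParser()
    ini.read(os.path.join(seq_dir, 'seqinfo.ini'))
    img_w = int(ini['Sequence']['imWidth'])
    img_h = int(ini['Sequence']['imHeight'])
    os.makedirs(label_dir, exist_ok=True)
    with open(os.path.join(seq_dir, 'gt', 'gt.txt')) as f:
        for line in f:
            frame_id, identity, x, y, w, h = [float(v) for v in line.split(',')[:6]]
            x_center = (x + w / 2) / img_w
            y_center = (y + h / 2) / img_h
            label = '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(
                int(identity), x_center, y_center, w / img_w, h / img_h)
            # one annotation file per frame, named like the image: 000001.txt, ...
            with open(os.path.join(label_dir, '{:06d}.txt'.format(int(frame_id))), 'a') as out:
                out.write(label)

convert_sequence('custom_data/images/train/seq1', 'custom_data/labels_with_ids/train/seq1')
```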
### Download Links
#### Caltech Pedestrian
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/1sYBXXvQaXZ8TuNwQxMcAgg)
[[1]](https://pan.baidu.com/s/1lVO7YBzagex1xlzqPksaPw)
@@ -91,7 +171,8 @@ please download all the images `.tar` files from [this page](http://www.vision.c
You may need [this tool](https://github.com/mitmul/caltech-pedestrian-dataset-converter) to convert the original data format to jpeg images.
Original dataset webpage: [CaltechPedestrians](http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/)
#### CityPersons
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/1g24doGOdkKqmbgbJf03vsw)
[[1]](https://pan.baidu.com/s/1mqDF9M5MdD3MGxSfe0ENsA)
@@ -104,9 +185,9 @@ Google Drive:
[[2]](https://drive.google.com/file/d/1q_OltirP68YFvRWgYkBHLEFSUayjkKYE/view?usp=sharing)
[[3]](https://drive.google.com/file/d/1VSL0SFoQxPXnIdBamOZJzHrHJ1N2gsTW/view?usp=sharing)
Original dataset webpage: [Citypersons pedestrian detection dataset](https://github.com/cvgroup-njust/CityPersons)
#### CUHK-SYSU
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/1YFrlyB1WjcQmFW3Vt_sEaQ)
@@ -115,16 +196,15 @@ Google Drive:
Original dataset webpage: [CUHK-SYSU Person Search Dataset](http://www.ee.cuhk.edu.hk/~xgwang/PS/dataset.html)
#### PRW
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/1iqOVKO57dL53OI1KOmWeGQ)
Google Drive:
[[0]](https://drive.google.com/file/d/116_mIdjgB-WJXGe8RYJDWxlFnc_4sqS8/view?usp=sharing)
#### ETHZ (overlapping videos with MOT-16 removed):
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/14EauGb2nLrcB3GRSlQ4K9Q)
@@ -133,7 +213,7 @@ Google Drive:
Original dataset webpage: [ETHZ pedestrian dataset](https://data.vision.ee.ethz.ch/cvl/aess/dataset/)
#### MOT-17
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/1lHa6UagcosRBz-_Y308GvQ)
@@ -142,7 +222,7 @@ Google Drive:
Original dataset webpage: [MOT-17](https://motchallenge.net/data/MOT17/)
#### MOT-16
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/10pUuB32Hro-h-KUZv8duiw)
@@ -151,8 +231,17 @@ Google Drive:
Original dataset webpage: [MOT-16](https://motchallenge.net/data/MOT16/)
#### MOT-15
Original dataset webpage: [MOT-15](https://motchallenge.net/data/MOT15/)
#### MOT-20
Original dataset webpage: [MOT-20](https://motchallenge.net/data/MOT20/)
### Citation
Caltech:
```
@inproceedings{ dollarCVPR09peds,
......