[Doc] update mot data doc (#5204)

* update mot data doc, test=document_fix
* update mot data doc link, test=document_fix

253712ad · Feng Ni · 3fc967b6
4 changed files
--- a/configs/mot/README.md
+++ b/configs/mot/README.md
@@ -74,7 +74,12 @@ pip install -r requirements.txt
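 ## Dataset Preparation
 ### MOT Dataset
 PaddleDetection reproduces [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT) using the same MIX dataset as they do, which includes **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are combined into the joint training dataset, and MOT16 is used as the evaluation dataset. If you want to use these datasets, please **follow their licenses**.
-To train vertical-domain models for more scenarios, vertical-domain datasets are also processed into the same format as the MIX dataset. Please refer to the [data preparation doc](../../docs/tutorials/PrepareMOTDataSet_cn.md) to prepare the dataset.
+
+**Notes:**
+- Multi-object tracking datasets are generally used for single-category multi-object tracking. DeepSORT, JDE and FairMOT are all single-category tracking models, and the MIX dataset and its sub-datasets are all single-category pedestrian tracking datasets; they can be regarded as pedestrian detection datasets with additional ID annotations.
+- To train vertical-domain models for more scenarios, such as vehicles, vertical-domain datasets also need to be processed into the same format as the MIX dataset. PaddleDetection also provides vertical-domain datasets and models for [vehicle tracking](vehicle/README_cn.md), [head tracking](headtracking21/README_cn.md) and more general [pedestrian tracking](pedestrian/README_cn.md). User-defined datasets can also be prepared by following the [data preparation doc](../../docs/tutorials/PrepareMOTDataSet_cn.md).
+- The multi-category tracking model is [MCFairMOT](mcfairmot/README_cn.md), and the multi-category dataset is an integrated version of the VisDrone dataset; see the [MCFairMOT](mcfairmot/README_cn.md) documentation.
+- The cross-camera tracking model uses the [AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow) cross-camera vehicle tracking dataset; for the dataset and model, see the [cross-camera tracking](mtmct/README_cn.md) documentation.

 ### Dataset Directory
 First, download image_lists.zip with the following command and unzip it into the `PaddleDetection/dataset/mot` directory: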
@@ -134,8 +139,8 @@ MOT17
 [class] [identity] [x_center] [y_center] [width] [height]
 ```
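 **Notes:**
- `class` is the class id, counted from 0; single-class and multi-class are both supported.
- `identity` is an integer from `1` to `num_identifies` (`num_identifies` is the total number of distinct object instances in the dataset), or `-1` if this box has no `identity` annotation.
+- `class` is the class id; single-class and multi-class are both supported. It is counted from `0`, so a single-class dataset uses `0`.
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances across all videos or image sequences in the dataset), or `-1` if this box has no `identity` annotation.
 - `[x_center] [y_center] [width] [height]` are the center coordinates, width and height. Their values are normalized by the image width/height, so they are floating-point numbers between 0 and 1.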
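
As a minimal sketch of the label format above, one line of a `labels_with_ids` file could be parsed in Python as shown here. The `LabelBox` and `parse_label_line` names and the sample line are hypothetical illustrations, not part of PaddleDetection:

```python
from dataclasses import dataclass


@dataclass
class LabelBox:
    cls: int          # class id, counted from 0
    identity: int     # 1..num_identities, or -1 when unannotated
    x_center: float   # normalized by image width
    y_center: float   # normalized by image height
    width: float      # normalized by image width
    height: float     # normalized by image height


def parse_label_line(line: str) -> LabelBox:
    """Parse one whitespace-separated line: class identity x_center y_center width height."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    # int(float(...)) also accepts values written as "0.0" or "12.000000".
    cls, identity = int(float(fields[0])), int(float(fields[1]))
    x, y, w, h = (float(v) for v in fields[2:])
    return LabelBox(cls, identity, x, y, w, h)


# Hypothetical sample line for a single-class (pedestrian) dataset.
box = parse_label_line("0 12 0.512 0.430 0.081 0.224")
print(box)
```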



--- a/configs/mot/README_en.md
+++ b/configs/mot/README_en.md
@@ -77,7 +77,11 @@ pip install -r requirements.txt
 ### MOT Dataset
 PaddleDetection implements [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT), and uses the same training data as they do, named 'MIX', which includes **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are used as the mixed dataset for training, and MOT16 is used as the evaluation dataset. If you want to use these datasets, please **follow their licenses**.

-In order to train the feature models of more scenes, more datasets are also processed into the same format as the MIX dataset. Please refer to MOT data preparation [doc](../../docs/tutorials/PrepareMOTDataSet.md) to prepare the dataset.
+**Notes:**
+- Multi-Object Tracking (MOT) datasets are generally used for single-category tracking. DeepSORT, JDE and FairMOT are single-category MOT models. The 'MIX' dataset and its sub-datasets are also single-category pedestrian tracking datasets; they can be regarded as pedestrian detection datasets with additional ID ground truth.
+- In order to train models for more specific scenes, such as vehicles, the corresponding datasets must also be processed into the same format as the MIX dataset. The PaddleDetection team also provides scene-specific datasets and models for [vehicle tracking](vehicle/readme.md), [head tracking](headtracking21/readme.md) and more general [pedestrian tracking](pedestrian/readme.md). User-defined datasets can also be prepared by referring to the data preparation [doc](../../docs/tutorials/PrepareMOTDataSet.md).
+- The multi-category MOT model is [MCFairMOT](mcfairmot/readme_cn.md), and the multi-category dataset is an integrated version of the VisDrone dataset. Please refer to the [MCFairMOT](mcfairmot/README.md) doc.
+- The Multi-Target Multi-Camera Tracking (MTMCT) model uses the [AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow) multi-camera vehicle tracking dataset. For the dataset and model, please refer to the [MTMCT](mtmct/README.md) doc.

 ### Dataset Directory
 First, download image_lists.zip using the following command, and unzip it into `PaddleDetection/dataset/mot`:
@@ -139,8 +143,8 @@ In the annotation text, each line is describing a bounding box and has the follo
 [class] [identity] [x_center] [y_center] [width] [height]
 ```
 **Notes:**
- `class` is the class id, start from 0, and support single class and multi-class.
- `identity` is an integer from `1` to `num_identities`(`num_identities` is the total number of instances of objects in the dataset), or `-1` if this box has no identity annotation.
+- `class` is the class id, counted from `0`. Both single-class and multi-class are supported; for a single-class dataset it is `0`.
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of object instances across all videos or image sequences in the dataset), or `-1` if this box has no identity annotation.
 - `[x_center] [y_center] [width] [height]` are the center coordinates, width and height. They are normalized by the width/height of the image, so they are floating-point numbers ranging from 0 to 1.
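
The constraints in these notes can be checked mechanically before training. The following is a rough sketch only; `check_label` and its `num_classes` parameter are hypothetical names introduced for illustration and are not PaddleDetection APIs:

```python
def check_label(cls, identity, x_center, y_center, width, height, num_classes=1):
    """Return a list of problems found in one label; an empty list means it looks valid."""
    problems = []
    # Class ids are counted from 0; a single-class dataset only uses 0.
    if not 0 <= cls < num_classes:
        problems.append(f"class {cls} outside [0, {num_classes})")
    # Identities start at 1; -1 marks a box without an identity annotation.
    if identity != -1 and identity < 1:
        problems.append(f"identity {identity} must be -1 or >= 1")
    # Box values are normalized by the image size, so they must lie in [0, 1].
    for name, value in (("x_center", x_center), ("y_center", y_center),
                        ("width", width), ("height", height)):
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is not normalized to [0, 1]")
    return problems


print(check_label(0, -1, 0.5, 0.5, 0.1, 0.2))   # valid single-class label
print(check_label(1, 0, 1.2, 0.5, 0.1, 0.2))    # three separate problems
```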



--- a/docs/tutorials/PrepareMOTDataSet.md
+++ b/docs/tutorials/PrepareMOTDataSet.md
@@ -3,53 +3,38 @@ English | [简体中文](PrepareMOTDataSet_cn.md)
 # Contents
 ## Multi-Object Tracking Dataset Preparation
 - [MOT Dataset](#MOT_Dataset)
- [Data Format](#Data_Format)
 - [Dataset Directory](#Dataset_Directory)
- [Download Links](#Download_Links)
+- [Data Format](#Data_Format)
 - [Custom Dataset Preparation](#Custom_Dataset_Preparation)
 - [Citations](#Citations)

 ### MOT Dataset
-PaddleDetection uses the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please download and prepare all the training data including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. **MOT15 and MOT20** can also be downloaded from the official webpage of MOT challenge. If you want to use these datasets, please **follow their licenses**.
-
-### Data Format
-These several relevant datasets have the following structure:
-```
-Caltech
-   |——————images
-   |        └——————00001.jpg
-   |        |—————— ...
-   |        └——————0000N.jpg
-   └——————labels_with_ids
-            └——————00001.txt
-            |—————— ...
-            └——————0000N.txt
-MOT17
-   |——————images
-   |        └——————train
-   |        └——————test
-   └——————labels_with_ids
-            └——————train
-```
-Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
+PaddleDetection implements [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT), and uses the same training data as they do, named 'MIX', which includes **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are used as the mixed dataset for training, and MOT16 is used as the evaluation dataset. If you want to use these datasets, please **follow their licenses**.

-In the annotation text, each line is describing a bounding box and has the following format:
-```
-[class] [identity] [x_center] [y_center] [width] [height]
-```
 **Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `1` to `num_identities`(`num_identities` is the total number of instances of objects in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height, note that they are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
-
+- Multi-Object Tracking (MOT) datasets are generally used for single-category tracking. DeepSORT, JDE and FairMOT are single-category MOT models. The 'MIX' dataset and its sub-datasets are also single-category pedestrian tracking datasets; they can be regarded as pedestrian detection datasets with additional ID ground truth.
+- In order to train models for more specific scenes, such as vehicles, the corresponding datasets must also be processed into the same format as the MIX dataset. The PaddleDetection team also provides scene-specific datasets and models for [vehicle tracking](../../configs/mot/vehicle/readme.md), [head tracking](../../configs/mot/headtracking21/readme.md) and more general [pedestrian tracking](../../configs/mot/pedestrian/readme.md). User-defined datasets can also be prepared by referring to this data preparation doc.
+- The multi-category MOT model is [MCFairMOT](../../configs/mot/mcfairmot/readme_cn.md), and the multi-category dataset is an integrated version of the VisDrone dataset. Please refer to the [MCFairMOT](../../configs/mot/mcfairmot/README.md) doc.
+- The Multi-Target Multi-Camera Tracking (MTMCT) model uses the [AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow) multi-camera vehicle tracking dataset. For the dataset and model, please refer to the [MTMCT](../../configs/mot/mtmct/README.md) doc.

 ### Dataset Directory
-
-First, follow the command below to download the `image_list.zip` and unzip it in the `dataset/mot` directory:
+First, download image_lists.zip using the following command, and unzip it into `PaddleDetection/dataset/mot`:
 ```
 wget https://dataset.bj.bcebos.com/mot/image_lists.zip
 ```
-Then download and unzip each dataset, and the final directory is as follows:
+
+Then, download the sub-datasets of the MIX dataset using the following commands, and unzip them into `PaddleDetection/dataset/mot`:
+```
+wget https://dataset.bj.bcebos.com/mot/MOT17.zip
+wget https://dataset.bj.bcebos.com/mot/Caltech.zip
+wget https://dataset.bj.bcebos.com/mot/CUHKSYSU.zip
+wget https://dataset.bj.bcebos.com/mot/PRW.zip
+wget https://dataset.bj.bcebos.com/mot/Cityscapes.zip
+wget https://dataset.bj.bcebos.com/mot/ETHZ.zip
+wget https://dataset.bj.bcebos.com/mot/MOT16.zip
+```
+
+The final directory is:
 ```
 dataset/mot
  |——————image_lists
@@ -62,23 +47,41 @@ dataset/mot
            |——————cuhksysu.train  
            |——————cuhksysu.val  
            |——————eth.train  
-            |——————mot15.train  
            |——————mot16.train  
            |——————mot17.train  
-            |——————mot20.train  
            |——————prw.train  
            |——————prw.val
  |——————Caltech
  |——————Cityscapes
  |——————CUHKSYSU
  |——————ETHZ
-  |——————MOT15
  |——————MOT16
  |——————MOT17
-  |——————MOT20
  |——————PRW
 ```

+### Data Format
+These datasets share the following structure:
+```
+MOT17
+   |——————images
+   |        └——————train
+   |        └——————test
+   └——————labels_with_ids
+            └——————train
+```
+Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text file. Given an image path, the annotation path can be generated by replacing the string `images` with `labels_with_ids` and `.jpg` with `.txt`.
+
+In the annotation file, each line describes a bounding box in the following format:
+```
+[class] [identity] [x_center] [y_center] [width] [height]
+```
+**Notes:**
+- `class` is the class id, counted from `0`. Both single-class and multi-class are supported; for a single-class dataset it is `0`.
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
+- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height. They are normalized by the width/height of the image, so they are floating-point numbers ranging from 0 to 1.
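
The image-to-annotation path rule described in this section amounts to two string replacements. A minimal sketch follows; `label_path_for` and the example path are hypothetical and only illustrate the rule:

```python
def label_path_for(image_path: str) -> str:
    """Derive the annotation path with the two replacements described in the doc."""
    # Note: str.replace substitutes every occurrence, so a path that contains
    # "images" more than once would need more careful handling.
    return image_path.replace("images", "labels_with_ids").replace(".jpg", ".txt")


example = "MOT17/images/train/MOT17-02-SDP/img1/000001.jpg"  # hypothetical path
print(label_path_for(example))
# → MOT17/labels_with_ids/train/MOT17-02-SDP/img1/000001.txt
```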
+
+
 ### Custom Dataset Preparation

 In order to standardize training and evaluation, custom data needs to be converted into the same directory structure and format as the MOT-16 dataset:
@@ -128,7 +131,7 @@ Each line in `gt.txt`  describes a bounding box, with the format as follows:
 ```
 **Notes:**
 - `frame_id` is the current frame id.
- `identity` is an integer from `1` to `num_identities`(`num_identities` is the total number of instances of objects in the dataset), or `-1` if this box has no identity annotation.
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of object instances in **this video or image sequence**), or `-1` if this box has no identity annotation.
 - `bb_left` is the x coordinate of the left boundary of the target box
 - `bb_top` is the y coordinate of the upper boundary of the target box
 - `width, height` are the width and height in pixels
@@ -145,8 +148,8 @@ In the annotation text, each line is describing a bounding box and has the follo
 [class] [identity] [x_center] [y_center] [width] [height]
 ```
 **Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `1` to `num_identities`(`num_identities` is the total number of instances of objects in the dataset), or `-1` if this box has no identity annotation.
+- `class` is the class id, counted from `0`. Both single-class and multi-class are supported; for a single-class dataset it is `0`.
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of object instances across all videos or image sequences in the dataset), or `-1` if this box has no identity annotation.
 - `[x_center] [y_center] [width] [height]` are the center coordinates, width and height. They are normalized by the width/height of the image, so they are floating-point numbers ranging from 0 to 1.
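
Identities in `gt.txt` are numbered per video or image sequence, while identities in `labels_with_ids` are numbered across the whole dataset. One way to bridge the two is a running offset per sequence. The sketch below assumes identities inside each sequence are numbered densely from `1`; `globalize_identities` is a hypothetical helper, not a PaddleDetection function:

```python
def globalize_identities(sequences):
    """Map per-sequence identities to dataset-wide identities.

    sequences: dict of sequence name -> list of identities (1-based, -1 = unannotated).
    Returns a dict with the same keys and shifted identities.
    """
    offset = 0
    result = {}
    for name, ids in sequences.items():
        # Keep -1 untouched; shift real identities by the count seen so far.
        result[name] = [i if i == -1 else i + offset for i in ids]
        # Assumption: ids are dense, so the largest id equals the instance count.
        offset += max((i for i in ids if i != -1), default=0)
    return result


seqs = {"seq_a": [1, 2, 2, -1], "seq_b": [1, 3, 2]}  # hypothetical sequences
merged = globalize_identities(seqs)
print(merged)
```

With this scheme, identity `1` in the second sequence no longer collides with identity `1` in the first.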

 Generate the corresponding `labels_with_ids` with the following command:
@@ -156,94 +159,6 @@ python gen_labels_MOT.py
 ```
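
The kind of conversion such a script performs can be sketched as follows. This is not the actual `gen_labels_MOT.py`: the comma-separated column order (`frame_id, identity, bb_left, bb_top, width, height, ...`), the image size, and the `gt_to_label` helper with its `id_offset` and `cls` parameters are all assumptions made for illustration:

```python
def gt_to_label(gt_line, img_width, img_height, id_offset=0, cls=0):
    """Convert one pixel-space gt.txt line into a normalized labels_with_ids line."""
    fields = gt_line.strip().split(",")
    # Assumed column order: frame_id, identity, bb_left, bb_top, width, height, ...
    identity = int(float(fields[1]))
    bb_left, bb_top, width, height = (float(v) for v in fields[2:6])
    if identity != -1:
        identity += id_offset  # shift per-sequence ids to dataset-wide ids
    # Top-left corner plus half the size gives the box center; divide by the
    # image size to normalize everything into [0, 1].
    x_center = (bb_left + width / 2) / img_width
    y_center = (bb_top + height / 2) / img_height
    return (f"{cls} {identity} {x_center:.6f} {y_center:.6f} "
            f"{width / img_width:.6f} {height / img_height:.6f}")


# Hypothetical gt.txt line and a 1920x1080 image.
line = gt_to_label("1,3,100,200,50,100,1,1,1", 1920, 1080)
print(line)
```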


-### Download Links
-
-#### Caltech Pedestrian
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/1sYBXXvQaXZ8TuNwQxMcAgg)
-[[1]](https://pan.baidu.com/s/1lVO7YBzagex1xlzqPksaPw)
-[[2]](https://pan.baidu.com/s/1PZXxxy_lrswaqTVg0GuHWg)
-[[3]](https://pan.baidu.com/s/1M93NCo_E6naeYPpykmaNgA)
-[[4]](https://pan.baidu.com/s/1ZXCdPNXfwbxQ4xCbVu5Dtw)
-[[5]](https://pan.baidu.com/s/1kcZkh1tcEiBEJqnDtYuejg)
-[[6]](https://pan.baidu.com/s/1sDjhtgdFrzR60KKxSjNb2A)
-[[7]](https://pan.baidu.com/s/18Zvp_d33qj1pmutFDUbJyw)
-
-Google Drive: [[annotations]](https://drive.google.com/file/d/1h8vxl_6tgi9QVYoer9XcY9YwNB32TE5k/view?usp=sharing) ,
-please download all the images `.tar` files from [this page](http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/datasets/USA/) and unzip the images under `Caltech/images`
-
-You may need [this tool](https://github.com/mitmul/caltech-pedestrian-dataset-converter) to convert the original data format to jpeg images.
-Original dataset webpage: [CaltechPedestrians](http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/)
-
-#### CityPersons
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/1g24doGOdkKqmbgbJf03vsw)
-[[1]](https://pan.baidu.com/s/1mqDF9M5MdD3MGxSfe0ENsA)
-[[2]](https://pan.baidu.com/s/1Qrbh9lQUaEORCIlfI25wdA)
-[[3]](https://pan.baidu.com/s/1lw7shaffBgARDuk8mkkHhw)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/1DgLHqEkQUOj63mCrS_0UGFEM9BG8sIZs/view?usp=sharing)
-[[1]](https://drive.google.com/file/d/1BH9Xz59UImIGUdYwUR-cnP1g7Ton_LcZ/view?usp=sharing)
-[[2]](https://drive.google.com/file/d/1q_OltirP68YFvRWgYkBHLEFSUayjkKYE/view?usp=sharing)
-[[3]](https://drive.google.com/file/d/1VSL0SFoQxPXnIdBamOZJzHrHJ1N2gsTW/view?usp=sharing)
-
-Original dataset webpage: [Citypersons pedestrian detection dataset](https://github.com/cvgroup-njust/CityPersons)
-
-#### CUHK-SYSU
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/1YFrlyB1WjcQmFW3Vt_sEaQ)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/1D7VL43kIV9uJrdSCYl53j89RE2K-IoQA/view?usp=sharing)
-
-Original dataset webpage: [CUHK-SYSU Person Search Dataset](http://www.ee.cuhk.edu.hk/~xgwang/PS/dataset.html)
-
-#### PRW
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/1iqOVKO57dL53OI1KOmWeGQ)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/116_mIdjgB-WJXGe8RYJDWxlFnc_4sqS8/view?usp=sharing)
-
-
-#### ETHZ (overlapping videos with MOT-16 removed):
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/14EauGb2nLrcB3GRSlQ4K9Q)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/19QyGOCqn8K_rc9TXJ8UwLSxCx17e0GoY/view?usp=sharing)
-
-Original dataset webpage: [ETHZ pedestrian datset](https://data.vision.ee.ethz.ch/cvl/aess/dataset/)
-
-#### MOT-17
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/1lHa6UagcosRBz-_Y308GvQ)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/1ET-6w12yHNo8DKevOVgK1dBlYs739e_3/view?usp=sharing)
-
-Original dataset webpage: [MOT-17](https://motchallenge.net/data/MOT17/)
-
-#### MOT-16
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/10pUuB32Hro-h-KUZv8duiw)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/1254q3ruzBzgn4LUejDVsCtT05SIEieQg/view?usp=sharing)
-
-Original dataset webpage: [MOT-16](https://motchallenge.net/data/MOT16/)
-
-#### MOT-15
-Original dataset webpage: [MOT-15](https://motchallenge.net/data/MOT15/)
-
-#### MOT-20
-Original dataset webpage: [MOT-20](https://motchallenge.net/data/MOT20/)
-
-
-
-
-
 ### Citation
 Caltech:
 ```

--- a/docs/tutorials/PrepareMOTDataSet_cn.md
+++ b/docs/tutorials/PrepareMOTDataSet_cn.md
@@ -3,50 +3,38 @@
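 # Contents
 ## Multi-Object Tracking Dataset Preparation
 - [MOT Dataset](#MOT数据集)
- [Data Format](#数据格式)
 - [Dataset Directory](#数据集目录)
- [Download Links](#下载链接)
+- [Data Format](#数据格式)
 - [Custom Dataset Preparation](#用户数据准备)
 - [Citations](#引用)

 ### MOT Dataset
-PaddleDetection uses the same datasets as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please first download and prepare all the datasets, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The **MOT15 and MOT20** datasets can also be downloaded. If you want to use these datasets, please **follow their licenses**.
+PaddleDetection reproduces [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT) using the same MIX dataset as they do, which includes **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are combined into the joint training dataset, and MOT16 is used as the evaluation dataset. If you want to use these datasets, please **follow their licenses**.

-### Data Format
-These datasets all follow the structure below: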
-```
-Caltech
-   |——————images
-   |        └——————00001.jpg
-   |        |—————— ...
-   |        └——————0000N.jpg
-   └——————labels_with_ids
-            └——————00001.txt
-            |—————— ...
-            └——————0000N.txt
-MOT17
-   |——————images
-   |        └——————train
-   |        └——————test
-   └——————labels_with_ids
-            └——————train
-```
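-Annotations of all datasets are provided in a unified format. Every image in each dataset has a corresponding annotation text file. Given an image path, the annotation path can be generated by replacing the string `images` with `labels_with_ids` and `.jpg` with `.txt`. In the annotation file, each line describes a bounding box in the following format:
-```
-[class] [identity] [x_center] [y_center] [width] [height]
-```
-**Notes:**
- `class` is `0`; only single-class multi-object tracking is supported for now.
- `identity` is an integer from `1` to `num_identifies` (`num_identifies` is the total number of distinct object instances in the dataset), or `-1` if this box has no `identity` annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height. Their values are normalized by the image width/height, so they are floating-point numbers between 0 and 1.
+**Notes:**
+- Multi-object tracking datasets are generally used for single-category multi-object tracking. DeepSORT, JDE and FairMOT are all single-category tracking models, and the MIX dataset and its sub-datasets are all single-category pedestrian tracking datasets; they can be regarded as pedestrian detection datasets with additional ID annotations.
+- To train vertical-domain models for more scenarios, such as vehicles, vertical-domain datasets also need to be processed into the same format as the MIX dataset. PaddleDetection also provides vertical-domain datasets and models for [vehicle tracking](../../configs/mot/vehicle/README_cn.md), [head tracking](../../configs/mot/headtracking21/README_cn.md) and more general [pedestrian tracking](../../configs/mot/pedestrian/README_cn.md). User-defined datasets can also be prepared by following this document.
+- The multi-category tracking model is [MCFairMOT](../../configs/mot/mcfairmot/README_cn.md), and the multi-category dataset is an integrated version of the VisDrone dataset; see the [MCFairMOT](../../configs/mot/mcfairmot/README_cn.md) documentation.
+- The cross-camera tracking model uses the [AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow) cross-camera vehicle tracking dataset; for the dataset and model, see the [cross-camera tracking](../../configs/mot/mtmct/README_cn.md) documentation.

 ### Dataset Directory
-
-First, download image_lists.zip with the following command and unzip it into the `dataset/mot` directory:
+First, download image_lists.zip with the following command and unzip it into the `PaddleDetection/dataset/mot` directory: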
 ```
 wget https://dataset.bj.bcebos.com/mot/image_lists.zip
 ```
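-Then download and unzip each dataset in turn; the final directory is:
+
+Then use the following commands to quickly download the sub-datasets of the MIX dataset, and unzip them into the `PaddleDetection/dataset/mot` directory: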
+```
+wget https://dataset.bj.bcebos.com/mot/MOT17.zip
+wget https://dataset.bj.bcebos.com/mot/Caltech.zip
+wget https://dataset.bj.bcebos.com/mot/CUHKSYSU.zip
+wget https://dataset.bj.bcebos.com/mot/PRW.zip
+wget https://dataset.bj.bcebos.com/mot/Cityscapes.zip
+wget https://dataset.bj.bcebos.com/mot/ETHZ.zip
+wget https://dataset.bj.bcebos.com/mot/MOT16.zip
+```
+
+The final directory is:
 ```
 dataset/mot
  |——————image_lists
@@ -59,106 +47,38 @@ dataset/mot
            |——————cuhksysu.train  
            |——————cuhksysu.val  
            |——————eth.train  
-            |——————mot15.train  
            |——————mot16.train  
            |——————mot17.train  
-            |——————mot20.train  
            |——————prw.train  
            |——————prw.val
  |——————Caltech
  |——————Cityscapes
  |——————CUHKSYSU
  |——————ETHZ
-  |——————MOT15
  |——————MOT16
  |——————MOT17
-  |——————MOT20
  |——————PRW
 ```

-### Download Links
-
-#### Caltech Pedestrian
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/1sYBXXvQaXZ8TuNwQxMcAgg)
-[[1]](https://pan.baidu.com/s/1lVO7YBzagex1xlzqPksaPw)
-[[2]](https://pan.baidu.com/s/1PZXxxy_lrswaqTVg0GuHWg)
-[[3]](https://pan.baidu.com/s/1M93NCo_E6naeYPpykmaNgA)
-[[4]](https://pan.baidu.com/s/1ZXCdPNXfwbxQ4xCbVu5Dtw)
-[[5]](https://pan.baidu.com/s/1kcZkh1tcEiBEJqnDtYuejg)
-[[6]](https://pan.baidu.com/s/1sDjhtgdFrzR60KKxSjNb2A)
-[[7]](https://pan.baidu.com/s/18Zvp_d33qj1pmutFDUbJyw)
-
-Google Drive: [[annotations]](https://drive.google.com/file/d/1h8vxl_6tgi9QVYoer9XcY9YwNB32TE5k/view?usp=sharing),
-Please download all image files ending in `.tar` from [this page](http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/datasets/USA/) and unzip them into the `Caltech/images` directory.
-
-You need to use this [tool](https://github.com/mitmul/caltech-pedestrian-dataset-converter) to convert the original data format to jpeg images.
-Original dataset webpage: [CaltechPedestrians](http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/)
-
-#### CityPersons
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/1g24doGOdkKqmbgbJf03vsw)
-[[1]](https://pan.baidu.com/s/1mqDF9M5MdD3MGxSfe0ENsA)
-[[2]](https://pan.baidu.com/s/1Qrbh9lQUaEORCIlfI25wdA)
-[[3]](https://pan.baidu.com/s/1lw7shaffBgARDuk8mkkHhw)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/1DgLHqEkQUOj63mCrS_0UGFEM9BG8sIZs/view?usp=sharing)
-[[1]](https://drive.google.com/file/d/1BH9Xz59UImIGUdYwUR-cnP1g7Ton_LcZ/view?usp=sharing)
-[[2]](https://drive.google.com/file/d/1q_OltirP68YFvRWgYkBHLEFSUayjkKYE/view?usp=sharing)
-[[3]](https://drive.google.com/file/d/1VSL0SFoQxPXnIdBamOZJzHrHJ1N2gsTW/view?usp=sharing)
-
-Original dataset webpage: [Citypersons pedestrian detection dataset](https://github.com/cvgroup-njust/CityPersons)
-
-#### CUHK-SYSU
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/1YFrlyB1WjcQmFW3Vt_sEaQ)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/1D7VL43kIV9uJrdSCYl53j89RE2K-IoQA/view?usp=sharing)
-
-Original dataset webpage: [CUHK-SYSU Person Search Dataset](http://www.ee.cuhk.edu.hk/~xgwang/PS/dataset.html)
-
-#### PRW
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/1iqOVKO57dL53OI1KOmWeGQ)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/116_mIdjgB-WJXGe8RYJDWxlFnc_4sqS8/view?usp=sharing)
-
-
-#### ETHZ (overlapping videos with MOT-16 removed):
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/14EauGb2nLrcB3GRSlQ4K9Q)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/19QyGOCqn8K_rc9TXJ8UwLSxCx17e0GoY/view?usp=sharing)
-
-Original dataset webpage: [ETHZ pedestrian dataset](https://data.vision.ee.ethz.ch/cvl/aess/dataset/)
-
-#### MOT-17
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/1lHa6UagcosRBz-_Y308GvQ)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/1ET-6w12yHNo8DKevOVgK1dBlYs739e_3/view?usp=sharing)
-
-Original dataset webpage: [MOT-17](https://motchallenge.net/data/MOT17/)
-
-#### MOT-16
-Baidu NetDisk:
-[[0]](https://pan.baidu.com/s/10pUuB32Hro-h-KUZv8duiw)
-
-Google Drive:
-[[0]](https://drive.google.com/file/d/1254q3ruzBzgn4LUejDVsCtT05SIEieQg/view?usp=sharing)
-
-Original dataset webpage: [MOT-16](https://motchallenge.net/data/MOT16/)
-
-#### MOT-15
-Original dataset webpage: [MOT-15](https://motchallenge.net/data/MOT15/)
+### Data Format
+These datasets all follow the structure below:
+```
+MOT17
+   |——————images
+   |        └——————train
+   |        └——————test
+   └——————labels_with_ids
+            └——————train
+```
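+Annotations of all datasets are provided in a unified format. Every image in each dataset has a corresponding annotation text file. Given an image path, the annotation path can be generated by replacing the string `images` with `labels_with_ids` and `.jpg` with `.txt`. In the annotation file, each line describes a bounding box in the following format:
+```
+[class] [identity] [x_center] [y_center] [width] [height]
+```
+**Notes:**
+- `class` is the class id; single-class and multi-class are both supported. It is counted from `0`, so a single-class dataset uses `0`.
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances across all videos or image sequences in the dataset), or `-1` if this box has no `identity` annotation.
+- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height. Their values are normalized by the image width/height, so they are floating-point numbers between 0 and 1.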

-#### MOT-20
-Original dataset webpage: [MOT-20](https://motchallenge.net/data/MOT20/)


 ### Custom Dataset Preparation
@@ -210,7 +130,7 @@ imExt=.jpg
 ```
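 **Notes:**
 - `frame_id` is the index of the current image frame
- `identity` is an integer from `1` to `num_identifies` (`num_identifies` is the total number of distinct object instances in the current video), or `-1` if this box has no `identity` annotation
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances in **the current video or image sequence**), or `-1` if this box has no `identity` annotation.
 - `bb_left` is the x coordinate of the left boundary of the target box
 - `bb_top` is the y coordinate of the upper boundary of the target box
 - `width, height` are the actual width and height in pixels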
@@ -225,8 +145,8 @@ imExt=.jpg
 [class] [identity] [x_center] [y_center] [width] [height]
 ```
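 **Notes:**
- `class` is `0`; only single-class multi-object tracking is supported for now.
- `identity` is an integer from `1` to `num_identifies` (`num_identifies` is the total number of distinct object instances in the dataset), or `-1` if this box has no `identity` annotation.
+- `class` is the class id; single-class and multi-class are both supported. It is counted from `0`, so a single-class dataset uses `0`.
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances across all videos or image sequences in the dataset), or `-1` if this box has no `identity` annotation.
 - `[x_center] [y_center] [width] [height]` are the center coordinates, width and height. They are normalized by the image width/height, so they are floating-point numbers between 0 and 1.

 The corresponding `labels_with_ids` can be generated with the following script: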