Unverified commit e76e1a8a authored by George Ni, committed by GitHub

[MOT] fix mot doc (#3025)

* fix mot doc

* remove image_lists, fix all mot docs, add custom data

* fix doc, test=document_fix

* fix doc

* fix doc format, test=document_fix
Parent 6cfe3643
English | [简体中文](README_cn.md)
# MOT (Multi-Object Tracking)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Dataset Preparation](#Dataset_Preparation)
- [Installation](#Installation)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
PaddleDetection implements three multi-object tracking methods.
- [DeepSORT](https://arxiv.org/abs/1703.07402) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1602.00763) (Simple Online and Realtime Tracking) algorithm to integrate appearance information based on a deep appearance descriptor. It adds a CNN model to extract features from the person image patches cropped by a detector. Here we use `JDE` as the detection model to generate boxes, and select `PCBPyramid` as the ReID model. We also support loading the boxes from saved detection result files.
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) is a fast and high-performance multiple-object tracker that learns the object detection task and appearance embedding task simultaneously in a shared neural network.
- [FairMOT](https://arxiv.org/abs/2004.01888) focuses on accomplishing detection and re-identification in a single network to improve inference speed. It presents a simple baseline that consists of two homogeneous branches predicting pixel-wise objectness scores and re-ID features. The achieved fairness between the two tasks allows FairMOT to obtain high levels of detection and tracking accuracy.
<div align="center">
<img src="../../docs/images/mot16_jde.gif" width=500 />
</div>
## Model Zoo
### JDE on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.4 | 1320 | 6613 | 21629 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.4 | 1341 | 6454 | 25208 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.1 | 64.6 | 1357 | 7083 | 32312 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE was trained on 8 GPUs with a mini-batch size of 4 per GPU for 30 epochs.
### DeepSORT on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-----: | :-----: | :-----: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:**
DeepSORT does not need training on the MOT dataset; it is only used for evaluation. Before evaluating with DeepSORT, you should first obtain detection results with a detection model (here we use JDE), and then prepare the result files like this:
```
det_results_dir
|——————MOT16-02.txt
|——————MOT16-04.txt
|——————MOT16-05.txt
|——————MOT16-09.txt
|——————MOT16-10.txt
|——————MOT16-11.txt
|——————MOT16-13.txt
```
Each txt file contains the detection results of all frames extracted from one video, and each line describes a bounding box with the following format:
```
[frame_id] [identity] [bb_left] [bb_top] [width] [height] [conf] [x] [y] [z]
```
**Notes:**
- `frame_id` is the frame number of the image
- `identity` is the object id, with default value `-1`
- `bb_left` is the x coordinate of the left boundary of the object box
- `bb_top` is the y coordinate of the upper boundary of the object box
- `width, height` are the pixel width and height
- `conf` is the object score, with default value `1` (the results have already been filtered by the detection score threshold)
- `x, y, z` are only used in 3D and default to `-1` in 2D (see the parsing sketch below).
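To make the format concrete, here is a minimal Python sketch that parses such a result file into per-frame boxes (our own illustration, not a PaddleDetection API; it assumes the fields are comma- or whitespace-separated as described above):
```python
from collections import defaultdict

def load_det_results(txt_path):
    """Parse a MOT-style detection result file into {frame_id: [(l, t, w, h, conf), ...]}."""
    frames = defaultdict(list)
    with open(txt_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Assumption: fields may be separated by commas or whitespace.
            fields = line.replace(',', ' ').split()
            frame_id = int(float(fields[0]))   # frame number of the image
            # fields[1] is the identity (default -1, unused here)
            bb_left, bb_top, width, height, conf = map(float, fields[2:7])
            # fields[7:10] are the 3D x, y, z placeholders (-1 in 2D)
            frames[frame_id].append((bb_left, bb_top, width, height, conf))
    return frames

# e.g. boxes = load_det_results('det_results_dir/MOT16-02.txt')
```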
### FairMOT Results on MOT-16 train set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
| DLA-34 | 1088x608 | 83.7 | 83.3 | 435 | 3829 | 13764 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
### FairMOT Results on MOT-16 test set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
| DLA-34 | 1088x608 | 74.8 | 74.4 | 930 | 7038 | 37994 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
**Notes:**
FairMOT was trained on 8 GPUs with a mini-batch size of 6 per GPU for 30 epochs.
## Dataset Preparation
### MOT Dataset
PaddleDetection uses the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please refer to [PrepareMOTDataSet](../../docs/tutorials/PrepareMOTDataSet.md) to download and prepare all the training data, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. **MOT15 and MOT20** can also be downloaded from the official webpage of the MOT challenge. If you want to use these datasets, please **follow their licenses**.
### Data Format
These relevant datasets share the following structure:
```
Caltech
|——————images
| └——————00001.jpg
| |—————— ...
| └——————0000N.jpg
└——————labels_with_ids
└——————00001.txt
|—————— ...
└——————0000N.txt
MOT17
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train
```
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
In the annotation text, each line describes a bounding box with the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1 (see the conversion sketch below).
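For example, the path rule and the normalization can be written down in a few lines of Python (a sketch of the convention above; the helper names are ours, not part of PaddleDetection):
```python
def image_path_to_label_path(img_path):
    # 'Caltech/images/00001.jpg' -> 'Caltech/labels_with_ids/00001.txt'
    return img_path.replace('images', 'labels_with_ids').replace('.jpg', '.txt')

def make_label_line(identity, x1, y1, w, h, img_w, img_h, class_id=0):
    # Convert a pixel-space box (top-left corner x1, y1 and size w, h)
    # into one '[class] [identity] [x_center] [y_center] [width] [height]' line.
    x_center = (x1 + w / 2) / img_w
    y_center = (y1 + h / 2) / img_h
    return '{:d} {:d} {:.6f} {:.6f} {:.6f} {:.6f}'.format(
        class_id, identity, x_center, y_center, w / img_w, h / img_h)

print(image_path_to_label_path('MOT17/images/train/seq1/img1/000001.jpg'))
print(make_label_line(7, 100, 200, 50, 120, img_w=1920, img_h=1080))
# -> 0 7 0.065104 0.240741 0.026042 0.111111
```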
### Dataset Directory
First, run the command below to download `image_lists.zip` and unzip it in the `dataset/mot` directory:
```
wget https://dataset.bj.bcebos.com/mot/image_lists.zip
```
Then download and unzip each dataset; the final directory looks like this:
```
dataset/mot
|——————image_lists
|——————caltech.10k.val
|——————caltech.all
|——————caltech.train
|——————caltech.val
|——————citypersons.train
|——————citypersons.val
|——————cuhksysu.train
|——————cuhksysu.val
|——————eth.train
|——————mot15.train
|——————mot16.train
|——————mot17.train
|——————mot20.train
|——————prw.train
|——————prw.val
|——————Caltech
|——————Cityscapes
|——————CUHKSYSU
|——————ETHZ
|——————MOT15
|——————MOT16
|——————MOT17
|——————MOT20
|——————PRW
```
## Installation
Install all the related dependencies for MOT:
```
pip install lap sklearn motmetrics openpyxl cython_bbox
or
pip install -r requirements.txt
```
**Notes:**
To install `cython_bbox` on Windows, please refer to this [tutorial](https://stackoverflow.com/questions/60349980/is-there-a-way-to-install-cython-bbox-for-windows)
## Getting Started
### 1. Training
Train FairMOT on 8 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
```
### 2. Evaluation
Evaluate the tracking performance of FairMOT on the val dataset on a single GPU with the following commands:
```bash
# use weights released in PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
# use the checkpoint saved during training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final
```
## Citations
```
@article{wang2019towards,
title={Towards Real-Time Multi-Object Tracking},
author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
journal={arXiv preprint arXiv:1909.12605},
year={2019}
}
@inproceedings{Wojke2017simple,
title={Simple Online and Realtime Tracking with a Deep Association Metric},
author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
year={2017},
pages={3645--3649},
organization={IEEE},
doi={10.1109/ICIP.2017.8296962}
}
@inproceedings{Wojke2018deep,
title={Deep Cosine Metric Learning for Person Re-identification},
author={Wojke, Nicolai and Bewley, Alex},
booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
year={2018},
pages={748--756},
organization={IEEE},
doi={10.1109/WACV.2018.00087}
}
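@article{zhang2020fairmot,
  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
  journal={arXiv preprint arXiv:2004.01888},
  year={2020}
}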
```
Simplified Chinese | [English](README.md)
# MOT (Multi-Object Tracking)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Dataset Preparation](#Dataset_Preparation)
- [Installation](#Installation)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
PaddleDetection implements three multi-object tracking methods.
- [DeepSORT](https://arxiv.org/abs/1703.07402) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1602.00763) (Simple Online and Realtime Tracking) algorithm with a CNN model that extracts features from the person image patches cropped by a detector, integrating appearance information on top of a deep appearance descriptor.
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) is a fast and high-performance multi-object tracker that learns the object detection task and the appearance embedding task simultaneously in a shared neural network.
- [FairMOT](https://arxiv.org/abs/2004.01888) focuses on performing detection and ReID in a single network to improve inference speed, and presents a simple baseline composed of two homogeneous branches that predict pixel-wise objectness scores and ReID features. The achieved fairness between the two tasks yields high levels of detection and tracking accuracy.
<div align="center">
<img src="../../docs/images/mot16_jde.gif" width=500 />
</div>
## Model Zoo
### JDE Results on the MOT-16 train set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.4 | 1320 | 6613 | 21629 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.4 | 1341 | 6454 | 25208 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.1 | 64.6 | 1357 | 7083 | 32312 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE was trained on 8 GPUs with a mini-batch size of 4 per GPU for 30 epochs.
### DeepSORT Results on the MOT-16 train set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-----: | :-----: | :-----: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:**
DeepSORT does not need training on the MOT dataset; it is only used for evaluation. Before evaluating with the DeepSORT model, you should first obtain detection results with a detection model (here we use JDE), and then prepare the result files like this:
```
det_results_dir
|——————MOT16-02.txt
|——————MOT16-04.txt
|——————MOT16-05.txt
|——————MOT16-09.txt
|——————MOT16-10.txt
|——————MOT16-11.txt
|——————MOT16-13.txt
```
Each txt file contains the detection results of all frames extracted from one video, and each line describes a bounding box with the following format:
```
[frame_id] [identity] [bb_left] [bb_top] [width] [height] [conf] [x] [y] [z]
```
**Notes:**
- `frame_id` is the frame number of the image
- `identity` is the object id, with default value `-1`
- `bb_left` is the x coordinate of the left boundary of the object box
- `bb_top` is the y coordinate of the upper boundary of the object box
- `width, height` are the pixel width and height
- `conf` is the object score, set to `1` (the results have already been filtered by the detection score threshold)
- `x, y, z` are only used in 3D; in 2D they default to `-1`
### FairMOT Results on the MOT-16 train set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
| DLA-34 | 1088x608 | 83.7 | 83.3 | 435 | 3829 | 13764 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
### FairMOT Results on the MOT-16 test set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
| DLA-34 | 1088x608 | 74.8 | 74.4 | 930 | 7038 | 37994 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
**Notes:**
FairMOT was trained on 8 GPUs with a mini-batch size of 6 per GPU for 30 epochs.
## Dataset Preparation
### MOT Dataset
PaddleDetection uses the same datasets as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please refer to the [data preparation doc](../../docs/tutorials/PrepareMOTDataSet_cn.md) to download and prepare all the training data, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. **MOT15 and MOT20** can also be downloaded from the official webpage of the MOT challenge. If you want to use these datasets, please **follow their licenses**.
### Data Format
These relevant datasets share the following structure:
```
Caltech
|——————images
| └——————00001.jpg
| |—————— ...
| └——————0000N.jpg
└——————labels_with_ids
└——————00001.txt
|—————— ...
└——————0000N.txt
MOT17
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train
```
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`. In the annotation text, each line describes a bounding box with the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`; only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if the box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
### Dataset Directory
First, download `image_lists.zip` with the following command and unzip it into the `dataset/mot` directory:
```
wget https://dataset.bj.bcebos.com/mot/image_lists.zip
```
Then download and unzip each dataset; the final directory looks like this:
```
dataset/mot
|——————image_lists
|——————caltech.10k.val
|——————caltech.all
|——————caltech.train
|——————caltech.val
|——————citypersons.train
|——————citypersons.val
|——————cuhksysu.train
|——————cuhksysu.val
|——————eth.train
|——————mot15.train
|——————mot16.train
|——————mot17.train
|——————mot20.train
|——————prw.train
|——————prw.val
|——————Caltech
|——————Cityscapes
|——————CUHKSYSU
|——————ETHZ
|——————MOT15
|——————MOT16
|——————MOT17
|——————MOT20
|——————PRW
```
## Installation
Install all MOT-related dependencies:
```
pip install lap sklearn motmetrics openpyxl cython_bbox
or
pip install -r requirements.txt
```
**Notes:**
To install `cython_bbox` on Windows, please refer to this [tutorial](https://stackoverflow.com/questions/60349980/is-there-a-way-to-install-cython-bbox-for-windows)
## Getting Started
### 1. Training
Train FairMOT on 8 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
```
### 2. Evaluation
Evaluate FairMOT on a single GPU with the following commands:
```bash
# use weights released in PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
# use the checkpoint saved during training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final
```
## Citations
```
@article{wang2019towards,
title={Towards Real-Time Multi-Object Tracking},
author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
journal={arXiv preprint arXiv:1909.12605},
year={2019}
}
@inproceedings{Wojke2017simple,
title={Simple Online and Realtime Tracking with a Deep Association Metric},
author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
year={2017},
pages={3645--3649},
organization={IEEE},
doi={10.1109/ICIP.2017.8296962}
}
@inproceedings{Wojke2018deep,
title={Deep Cosine Metric Learning for Person Re-identification},
author={Wojke, Nicolai and Bewley, Alex},
booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
year={2018},
pages={748--756},
organization={IEEE},
doi={10.1109/WACV.2018.00087}
}
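@article{zhang2020fairmot,
  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
  journal={arXiv preprint arXiv:2004.01888},
  year={2020}
}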
```
......@@ -6,20 +6,21 @@ English | [简体中文](README_cn.md)
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) is basically the same as SORT, but adds a CNN model to extract features from the person image patches cropped by a detector. We use JDE as the detection model to generate boxes, and select `PCBPyramid` as the ReID model. We also support loading the boxes from saved detection result files.
[DeepSORT](https://arxiv.org/abs/1703.07402) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1602.00763) (Simple Online and Realtime Tracking) algorithm to integrate appearance information based on a deep appearance descriptor. It adds a CNN model to extract features from the person image patches cropped by a detector. Here we use `JDE` as the detection model to generate boxes, and select `PCBPyramid` as the ReID model. We also support loading the boxes from saved detection result files.
## Model Zoo
### DeepSORT on MOT-16 training set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.2 | 60.3 | 998 | 8055 | 21631 | 3.28 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-------: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:**
DeepSORT does not need training; it is only used for evaluation. Before evaluating with DeepSORT, you should first obtain detection results with a detection model (here we use JDE), and then prepare them like this:
DeepSORT does not need training on the MOT dataset; it is only used for evaluation. Before evaluating with DeepSORT, you should first obtain detection results with a detection model (here we use JDE), and then prepare them like this:
```
det_results_dir
|——————MOT16-02.txt
......@@ -30,6 +31,12 @@ det_results_dir
|——————MOT16-11.txt
|——————MOT16-13.txt
```
Each txt file contains the detection results of all frames extracted from one video, and each line describes a bounding box with the following format:
```
[frame_id] [identity] [bb_left] [bb_top] [width] [height] [conf] [x] [y] [z]
```
**Notes:**
`frame_id` is the frame number of the image, `identity` is the object id with default value `-1`, `bb_left` is the x coordinate of the left boundary of the object box, `bb_top` is the y coordinate of the upper boundary of the object box, `width, height` are the pixel width and height, `conf` is the object score with default value `1` (the results have already been filtered by the detection score threshold), and `x, y, z` are only used in 3D and default to `-1` in 2D.
## Getting Started
......
Simplified Chinese | [English](README.md)
# DeepSORT (Simple Online and Realtime Tracking with a Deep Association Metric)
# DeepSORT
## Contents
- [Introduction](#Introduction)
- [Model Zoo and Baselines](#Model_Zoo_and_Baselines)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) is basically similar to SORT, but adds a CNN model to extract features from the person image patches cropped by a detector. We use JDE as the detection model to generate boxes and select `PCBPyramid` as the ReID model. We also support loading saved detection result files for prediction and tracking.
[DeepSORT](https://arxiv.org/abs/1703.07402) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1602.00763) (Simple Online and Realtime Tracking) algorithm to integrate appearance information based on a deep appearance descriptor, adding a CNN model to extract features from the person image patches cropped by a detector. We use `JDE` as the detection model to generate boxes and select `PCBPyramid` as the ReID model. We also support loading saved detection result files for prediction and tracking.
## Model Zoo and Baselines
## Model Zoo
### DeepSORT on MOT-16 training set
### DeepSORT Results on the MOT-16 train set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.2 | 60.3 | 998 | 8055 | 21631 | 3.28 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-----: | :-----: | :-----: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:**
DeepSORT does not need training here; it is only used for evaluation. Before evaluating with the DeepSORT model, you should first obtain detection results with a detection model (here we use JDE), and then prepare the result files like this:
**Notes:**
DeepSORT does not need training on the MOT dataset here; it is only used for evaluation. Before evaluating with the DeepSORT model, you should first obtain detection results with a detection model (here we use JDE), and then prepare the result files like this:
```
det_results_dir
|——————MOT16-02.txt
......@@ -30,6 +31,11 @@ det_results_dir
|——————MOT16-11.txt
|——————MOT16-13.txt
```
Each txt file contains the detection results of all frames extracted from one video, and each line describes a bounding box with the following format:
```
[frame_id] [identity] [bb_left] [bb_top] [width] [height] [conf] [x] [y] [z]
```
**Notes:** `frame_id` is the frame number of the image, `identity` is the object id with default value `-1`, `bb_left` is the x coordinate of the left boundary of the object box, `bb_top` is the y coordinate of the upper boundary of the object box, `width, height` are the pixel width and height, `conf` is the object score set to `1` (the results have already been filtered by the detection score threshold), and `x, y, z` are only used in 3D and default to `-1` in 2D.
## Getting Started
......
English | [简体中文](README_cn.md)
# FairMOT (On the Fairness of Detection and Re-Identification in Multiple Object Tracking)
## Table of Contents
......@@ -8,38 +10,37 @@
## Introduction
FairMOT focuses on accomplishing the detection and re-identification in a single network to improve the inference speed, presents a simple baseline which consists of two homogeneous branches to predict pixel-wise objectness scores and re-ID features. The achieved fairness between the two tasks allows FairMOT to obtain high levels of detection and tracking accuracy.
[FairMOT](https://arxiv.org/abs/2004.01888) focuses on accomplishing the detection and re-identification in a single network to improve the inference speed, presents a simple baseline which consists of two homogeneous branches to predict pixel-wise objectness scores and re-ID features. The achieved fairness between the two tasks allows FairMOT to obtain high levels of detection and tracking accuracy.
## Model Zoo
### Results on MOT-16 train set
### FairMOT Results on MOT-16 train set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | download | config |
| :-----------------| :------- | :----: | :----: | :---: | :----: | :---: |:---: | :---: |
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | ---- | ---- |
| DLA-34 | 1088x608 | 83.4 | 82.7 | 517 | 4077 | 13761 | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
| DLA-34 | 1088x608 | 83.7 | 83.3 | 435 | 3829 | 13764 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
### Results on MOT-16 test set
### FairMOT Results on MOT-16 test set
| backbone | input shape | MOTA | IDF1 | IDS | MT | ML | download | config |
| :-----------------| :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: |
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | 44.7% | 15.9% | ---- | ---- |
| DLA-34 | 1088x608 | 74.7 | 72.8 | 1044 | 41.9% | 19.1% |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
| DLA-34 | 1088x608 | 74.8 | 74.4 | 930 | 7038 | 37994 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
**Notes:**
FairMOT used 2 GPUs for training with a mini-batch size of 6 per GPU, trained for 30 epochs.
FairMOT used 8 GPUs for training with a mini-batch size of 6 per GPU, trained for 30 epochs.
## Getting Started
### 1. Training
Train FairMOT on 2 GPUs with the following command
Train FairMOT on 8 GPUs with the following command
```bash
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml &>fairmot_dla34_30e_1088x608.log 2>&1 &
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
```
......
Simplified Chinese | [English](README.md)
# FairMOT (On the Fairness of Detection and Re-Identification in Multiple Object Tracking)
## Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[FairMOT](https://arxiv.org/abs/2004.01888) focuses on performing detection and ReID in a single network to improve inference speed, and presents a simple baseline composed of two homogeneous branches that predict pixel-wise objectness scores and ReID features. The achieved fairness between the two tasks yields high levels of detection and tracking accuracy.
## Model Zoo
### FairMOT Results on the MOT-16 train set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
| DLA-34 | 1088x608 | 83.7 | 83.3 | 435 | 3829 | 13764 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
### FairMOT Results on the MOT-16 test set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
| DLA-34 | 1088x608 | 74.8 | 74.4 | 930 | 7038 | 37994 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |
**Notes:**
FairMOT was trained on 8 GPUs with a batch size of 6 per GPU for 30 epochs.
## Getting Started
### 1. Training
Train FairMOT on 8 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
```
### 2. Evaluation
Evaluate FairMOT on a single GPU with the following commands:
```bash
# use weights released in PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
# use the checkpoint saved during training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final
```
## Citations
```
@article{wang2019towards,
title={Towards Real-Time Multi-Object Tracking},
author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
journal={arXiv preprint arXiv:1909.12605},
year={2019}
}
```
epoch: 30
LearningRate:
base_lr: 0.0001
base_lr: 0.0004
schedulers:
- !PiecewiseDecay
gamma: 0.1
......
English | [简体中文](README_cn.md)
# JDE (Towards-Realtime-MOT)
# JDE (Joint Detection and Embedding)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[Joint Detection and Embedding](https://arxiv.org/abs/1909.12605)(JDE) is a fast and high-performance multiple-object tracker that learns the object detection task and appearance embedding task simultaneously in a shared neural network.
[JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) is a fast and high-performance multiple-object tracker that learns the object detection task and appearance embedding task simultaneously in a shared neural network.
<div align="center">
<img src="../../../docs/images/mot16_jde.gif" width=500 />
</div>
......@@ -35,7 +36,7 @@ English | [简体中文](README_cn.md)
Train JDE on 8 GPUs with the following command
```bash
python -m paddle.distributed.launch --log_dir=./jde_darknet53_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml &>jde_darknet53_30e_1088x608.log 2>&1 &
python -m paddle.distributed.launch --log_dir=./jde_darknet53_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml
```
### 2. Evaluation
......@@ -59,7 +60,7 @@ Infer a video on a single GPU with the following commands.
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
```
**Notes:**
Please make sure that `ffmpeg` is installed first.
Please make sure that [ffmpeg](https://www.ffmpeg.org) is installed first.
## Citations
```
......
Simplified Chinese | [English](README.md)
# JDE (Towards-Realtime-MOT)
# JDE (Joint Detection and Embedding)
## Contents
- [Introduction](#Introduction)
- [Model Zoo and Baselines](#Model_Zoo_and_Baselines)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[Joint Detection and Embedding](https://arxiv.org/abs/1909.12605) (JDE) is a fast and high-performance multi-object tracker that learns the object detection task and the appearance embedding task simultaneously in a shared neural network.
[JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) is a fast and high-performance multi-object tracker that learns the object detection task and the appearance embedding task simultaneously in a shared neural network.
<div align="center">
<img src="../../../docs/images/mot16_jde.gif" width=500 />
</div>
## Model Zoo and Baselines
## Model Zoo
### JDE on MOT-16 training set
### JDE Results on the MOT-16 train set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
......@@ -26,7 +26,7 @@
| DarkNet53 | 576x320 | 63.1 | 64.6 | 1357 | 7083 | 32312 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE was trained on 8 GPUs with a batch size of 4 per GPU for 30 epoches
JDE was trained on 8 GPUs with a batch size of 4 per GPU for 30 epochs.
## Getting Started
......@@ -35,7 +35,7 @@
Train on 8 GPUs with the following command
```bash
python -m paddle.distributed.launch --log_dir=./jde_darknet53_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml &>jde_darknet53_30e_1088x608.log 2>&1 &
python -m paddle.distributed.launch --log_dir=./jde_darknet53_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml
```
### 2. Evaluation
......@@ -59,7 +59,7 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
```
**Notes:**
Please make sure that `ffmpeg` is installed first.
Please make sure that [ffmpeg](https://www.ffmpeg.org) is installed first.
## Citations
```
......
......@@ -19,6 +19,7 @@ import numpy as np
MOT_data = 'MOT16'
# choose a data in ['MOT15', 'MOT16', 'MOT17', 'MOT20']
# or your custom data (prepare it following the 'docs/tutorials/PrepareMOTDataSet.md')
def mkdirs(d):
......
# MOT Dataset
* **MIXMOT**
We use the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT) in this part and we call it "MIXMOT". Please refer to their [DATA ZOO](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md) to download and prepare all the training data including Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16.
English | [简体中文](PrepareMOTDataSet_cn.md)
* **2DMOT15 and MOT20**
[2DMOT15](https://motchallenge.net/data/2D_MOT_2015/) and [MOT20](https://motchallenge.net/data/MOT20/) can be downloaded from the official webpage of MOT challenge. After downloading, you should prepare the data in the following structure:
```
MOT15
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train
MOT20
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train
```
Annotations of these several relevant datasets are provided in a unified format. If you want to use these datasets, please **follow their licenses**,
and if you use any of these datasets in your research, please cite the original work (you can find the BibTeX at the bottom).
## Data Format
All the datasets have the following structure:
# Contents
## Multi-Object Tracking Dataset Preparation
- [MOT Dataset](#MOT_Dataset)
- [Data Format](#Data_Format)
- [Dataset Directory](#Dataset_Directory)
- [Download Links](#Download_Links)
- [Custom Dataset Preparation](#Custom_Dataset_Preparation)
- [Citations](#Citations)
### MOT Dataset
PaddleDetection uses the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please download and prepare all the training data including Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16. MOT15 and MOT20 can also be downloaded from the official webpage of MOT challenge. If you want to use these datasets, please **follow their licenses**.
### Data Format
These several relevant datasets have the following structure:
```
Caltech
|——————images
......@@ -32,9 +24,14 @@ Caltech
└——————00001.txt
|—————— ...
└——————0000N.txt
MOT17
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train
```
Every image has a corresponding annotation text. Given an image path,
the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
In the annotation text, each line is describing a bounding box and has the following format:
```
......@@ -46,11 +43,18 @@ The field `[identity]` is an integer from `0` to `num_identities - 1`, or `-1` i
**Note** that the values of `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
## Final Dataset root
### Dataset Directory
First, run the command below to download `image_lists.zip` and unzip it in the `dataset/mot` directory:
```
wget https://dataset.bj.bcebos.com/mot/image_lists.zip
```
Then download and unzip each dataset, and the final directory is as follows:
```
dataset/mot
|——————image_lists
|——————caltech.10k.val
|——————caltech.all
|——————caltech.train
|——————caltech.val
|——————citypersons.train
......@@ -58,8 +62,10 @@ dataset/mot
|——————cuhksysu.train
|——————cuhksysu.val
|——————eth.train
|——————mot15.train
|——————mot16.train
|——————mot17.train
|——————mot20.train
|——————prw.train
|——————prw.val
|——————Caltech
......@@ -73,9 +79,83 @@ dataset/mot
|——————PRW
```
## Download
### Custom Dataset Preparation
In order to standardize training and evaluation, custom data needs to be converted into the same directory structure and format as the MOT-17 dataset:
```
custom_data
|——————images
| └——————test
| └——————train
| └——————seq1
| | └——————gt
| | | └——————gt.txt
| | └——————img1
| | | └——————000001.jpg
| | | |——————000002.jpg
| | | └—————— ...
| | └——————seqinfo.ini
| └——————seq2
| └——————...
└——————labels_with_ids
└——————train
└——————seq1
| └——————000001.txt
| |——————000002.txt
| └—————— ...
└——————seq2
└—————— ...
```
#### images
- `gt.txt` is the original annotation file of all images extracted from the video.
- `img1` is the folder of images extracted from the video at a certain frame rate.
- `seqinfo.ini` is a video information description file, and the following format is required (a parsing sketch follows the example):
```
[Sequence]
name=MOT16-02
imDir=img1
frameRate=30
seqLength=600
imWidth=1920
imHeight=1080
imExt=.jpg
```
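Since `seqinfo.ini` is a standard INI file, it can be read with Python's built-in `configparser`; the sketch below assumes the key names shown above:
```python
import configparser

def read_seqinfo(path):
    # Parse the [Sequence] section of a MOT-style seqinfo.ini file.
    parser = configparser.ConfigParser()
    parser.read(path)
    seq = parser['Sequence']
    return {
        'name': seq['name'],
        'img_dir': seq['imDir'],
        'frame_rate': seq.getint('frameRate'),
        'seq_length': seq.getint('seqLength'),
        'width': seq.getint('imWidth'),
        'height': seq.getint('imHeight'),
        'ext': seq['imExt'],
    }

# info = read_seqinfo('custom_data/images/train/seq1/seqinfo.ini')
```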
Each line in `gt.txt` describes a bounding box, with the format as follows:
```
[frame_id] [identity] [bb_left] [bb_top] [width] [height] [x] [y] [z]
```
**Notes:**
- `frame_id` is the current frame id.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
- `bb_left` is the x coordinate of the left boundary of the target box.
- `bb_top` is the y coordinate of the upper boundary of the target box.
- `width, height` are the pixel width and height, and `x, y, z` are only used in 3D.
#### labels_with_ids
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
### Caltech Pedestrian
In the annotation text, each line is describing a bounding box and has the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
Generate the corresponding `labels_with_ids` with the following command:
```
cd dataset/mot
python gen_labels_MOT.py
```
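For reference, the conversion that such a script performs looks roughly like the sketch below. This is a simplified illustration, not the actual `gen_labels_MOT.py`: the frame-to-file naming and the `id_offset` handling are assumptions, and `img_w`/`img_h` would come from the sequence's `seqinfo.ini`.
```python
import os

def gt_to_labels_with_ids(seq_dir, out_dir, img_w, img_h, id_offset=0):
    # Convert one sequence's gt/gt.txt into per-frame labels_with_ids files.
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(seq_dir, 'gt', 'gt.txt')) as f:
        for line in f:
            vals = line.replace(',', ' ').split()
            if not vals:
                continue
            frame_id, identity = int(float(vals[0])), int(float(vals[1]))
            x1, y1, w, h = map(float, vals[2:6])
            # Normalize the pixel box to the labels_with_ids convention:
            # [class] [identity] [x_center] [y_center] [width] [height]
            label = '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(
                identity + id_offset,  # keeps ids unique across sequences
                (x1 + w / 2) / img_w, (y1 + h / 2) / img_h,
                w / img_w, h / img_h)
            # Assumption: frames are named 000001.jpg, 000002.jpg, ...
            out_txt = os.path.join(out_dir, '{:06d}.txt'.format(frame_id))
            with open(out_txt, 'a') as out:
                out.write(label)

# gt_to_labels_with_ids('custom_data/images/train/seq1',
#                       'custom_data/labels_with_ids/train/seq1', 1920, 1080)
```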
### Download Links
#### Caltech Pedestrian
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/1sYBXXvQaXZ8TuNwQxMcAgg)
[[1]](https://pan.baidu.com/s/1lVO7YBzagex1xlzqPksaPw)
......@@ -91,7 +171,8 @@ please download all the images `.tar` files from [this page](http://www.vision.c
You may need [this tool](https://github.com/mitmul/caltech-pedestrian-dataset-converter) to convert the original data format to jpeg images.
Original dataset webpage: [CaltechPedestrians](http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/)
### CityPersons
#### CityPersons
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/1g24doGOdkKqmbgbJf03vsw)
[[1]](https://pan.baidu.com/s/1mqDF9M5MdD3MGxSfe0ENsA)
......@@ -104,9 +185,9 @@ Google Drive:
[[2]](https://drive.google.com/file/d/1q_OltirP68YFvRWgYkBHLEFSUayjkKYE/view?usp=sharing)
[[3]](https://drive.google.com/file/d/1VSL0SFoQxPXnIdBamOZJzHrHJ1N2gsTW/view?usp=sharing)
Original dataset webpage: [Citypersons pedestrian detection dataset](https://bitbucket.org/shanshanzhang/citypersons)
Original dataset webpage: [Citypersons pedestrian detection dataset](https://github.com/cvgroup-njust/CityPersons)
### CUHK-SYSU
#### CUHK-SYSU
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/1YFrlyB1WjcQmFW3Vt_sEaQ)
......@@ -115,16 +196,15 @@ Google Drive:
Original dataset webpage: [CUHK-SYSU Person Search Dataset](http://www.ee.cuhk.edu.hk/~xgwang/PS/dataset.html)
### PRW
#### PRW
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/1iqOVKO57dL53OI1KOmWeGQ)
Google Drive:
[[0]](https://drive.google.com/file/d/116_mIdjgB-WJXGe8RYJDWxlFnc_4sqS8/view?usp=sharing)
Original dataset webpage: [Person Search in the Wild dataset](http://www.liangzheng.com.cn/Project/project_prw.html)
### ETHZ (overlapping videos with MOT-16 removed):
#### ETHZ (overlapping videos with MOT-16 removed):
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/14EauGb2nLrcB3GRSlQ4K9Q)
......@@ -133,7 +213,7 @@ Google Drive:
Original dataset webpage: [ETHZ pedestrian dataset](https://data.vision.ee.ethz.ch/cvl/aess/dataset/)
### MOT-17
#### MOT-17
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/1lHa6UagcosRBz-_Y308GvQ)
......@@ -142,7 +222,7 @@ Google Drive:
Original dataset webpage: [MOT-17](https://motchallenge.net/data/MOT17/)
### MOT-16 (for evaluation )
#### MOT-16
Baidu NetDisk:
[[0]](https://pan.baidu.com/s/10pUuB32Hro-h-KUZv8duiw)
......@@ -151,8 +231,17 @@ Google Drive:
Original dataset webpage: [MOT-16](https://motchallenge.net/data/MOT16/)
#### MOT-15
Original dataset webpage: [MOT-15](https://motchallenge.net/data/MOT15/)
#### MOT-20
Original dataset webpage: [MOT-20](https://motchallenge.net/data/MOT20/)
# Citation
### Citation
Caltech:
```
@inproceedings{ dollarCVPR09peds,
......