Unverified commit 39d54f23 authored by Feng Ni, committed by GitHub

[MOT] fit some detectors for deepsort (#4329)

* fit some detectors for deepsort

* fix configs and readme

* add tracked bbox score cls_id, fix deploy bugs

* add deepsort cfg score threshold

* fix deploy deepsorttracker, fix cfgs

* merge develop, fix format

* add pplcnet

* add pplcnet cfgs

* fix cfgs

* fix cfgs

* clean pplcnet code

* fix readme
Parent 69c12a2a
......@@ -20,7 +20,7 @@ The current mainstream multi-objective tracking (MOT) algorithm is mainly compos
PaddleDetection implements three MOT algorithms of these two series.
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model that extracts appearance features from the person image patches cropped by a detector. It integrates appearance information based on a deep appearance descriptor, and assigns and updates the detected targets to the existing corresponding trajectories, like a ReID task. The detection bboxes required by DeepSORT can be generated by any detection model, and the saved detection result file can then be loaded for tracking. Here we select the `PCB + Pyramid ResNet101` model provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) as the ReID model.
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model that extracts appearance features from the person image patches cropped by a detector. It integrates appearance information based on a deep appearance descriptor, and assigns and updates the detected targets to the existing corresponding trajectories, like a ReID task. The detection bboxes required by DeepSORT can be generated by any detection model, and the saved detection result file can then be loaded for tracking. Here we select the `PCB + Pyramid ResNet101` and `PPLCNet` models provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) as the ReID models (see the sketch after this list).
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) learns the object detection task and the appearance embedding task simultaneously in a shared neural network, and outputs the detection results and the corresponding embeddings at the same time. The original JDE paper is based on the anchor-based detector YOLOv3, adding a new ReID branch to learn embeddings. The training process is constructed as a multi-task learning problem, taking both accuracy and speed into account.
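Schematically, one DeepSORT update step looks like the minimal sketch below; `detector`, `reid_model`, and `tracker` are hypothetical stand-ins for illustration, not PaddleDetection APIs:
```python
import numpy as np

def track_frame(frame, detector, reid_model, tracker):
    # 1. Any detection model produces [x0, y0, w, h] boxes with scores.
    boxes, scores = detector(frame)
    # 2. A CNN ReID model embeds each cropped person patch into a feature vector.
    crops = [frame[int(y0):int(y0 + h), int(x0):int(x0 + w)]
             for x0, y0, w, h in boxes]
    embs = np.stack([reid_model(c) for c in crops]) if crops else np.zeros((0, 512))
    # 3. The tracker assigns detections to existing trajectories by combining
    #    appearance similarity (cosine distance) with Kalman-filter motion gating.
    return tracker.update(boxes, scores, embs)
```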
......@@ -50,19 +50,22 @@ pip install -r requirements.txt
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model |ReID model| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 72.2 | 59.5 | 1087 | 8034 | 21481 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 68.1 | 53.6 | 1979 | 17446 | 15766 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pplcnet.yml) |
### DeepSORT Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model |ReID model| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - |[det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 64.0 | 51.3 | 1208 | 12697 | 51784 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 61.1 | 48.8 | 2010 | 25401 | 43432 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pplcnet.yml) |
**Notes:**
DeepSORT does not need training on MOT datasets and is only used for evaluation. Two evaluation methods are currently supported.
- 1.Load the result file and the ReID model. Before DeepSORT evaluation, you should first get detection results from a detection model, and then prepare them like this:
```
det_results_dir
......@@ -80,34 +83,34 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
If you use a stronger detection model, you can get better results. Each txt file contains the detection results of all frames extracted from one video, and each line describes a bounding box in the following format (a minimal loading sketch follows these notes):
```
[frame_id],[bb_left],[bb_top],[width],[height],[conf]
[frame_id],[x0],[y0],[w],[h],[score],[class_id]
```
- `frame_id` is the frame number of the image
- `bb_left` is the X coordinate of the left edge of the object box
- `bb_top` is the Y coordinate of the top edge of the object box
- `width,height` are the pixel width and height of the object box
- `conf` is the object score, with default value `1` (the results have already been filtered by the detection score threshold)
- `frame_id` is the frame number of the image.
- `x0,y0` are the X and Y coordinates of the top-left corner of the object box.
- `w,h` are the pixel width and height of the object box.
- `score` is the confidence score of the object box.
- `class_id` is the category of the object box; set it to `0` if there is only one category.
- 2.Load the detection model and the ReID model at the same time. Here, the JDE version of YOLOv3 is selected. For more configuration details, see `configs/mot/deepsort/_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml`.
- 2.Load the detection model and the ReID model at the same time. Here, the JDE version of YOLOv3 is selected. For more configuration details, see `configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml`, and see `configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml` for other general detectors.
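As a minimal sketch (not a PaddleDetection API, the helper name is hypothetical), one such result txt can be grouped by frame like this:
```python
import numpy as np

def load_det_results(txt_path):
    """Group the boxes of one video by frame_id, assuming the 7-column
    format [frame_id],[x0],[y0],[w],[h],[score],[class_id] per line."""
    per_frame = {}
    with open(txt_path) as f:
        for line in f:
            frame_id, x0, y0, w, h, score, class_id = map(float, line.strip().split(','))
            per_frame.setdefault(int(frame_id), []).append(
                [x0, y0, w, h, score, int(class_id)])
    return {fid: np.array(boxes) for fid, boxes in per_frame.items()}
```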
### JDE Results on MOT-16 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](./jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](./jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](./jde/jde_darknet53_30e_576x320.yml) |
### JDE Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](./jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](./jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](./jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE was trained on 8 GPUs with a mini-batch size of 4 per GPU, for 30 epochs.
......@@ -178,19 +181,21 @@ If you use a stronger detection model, you can get better results. Each txt is t
### FairMOT Results on HT-21 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
| DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
| DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
### FairMOT Results on HT-21 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
| DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
| DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
### [Pedestrian Tracking](./pedestrian/README.md)
### FairMOT Results on each val-set of Pedestrian category
| Dataset | input shape | MOTA | IDF1 | FPS | download | config |
| :-------------| :------- | :----: | :----: | :----: | :-----: |:------: |
| PathTrack | 1088x608 | 44.9 | 59.3 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml) |
| VisDrone | 1088x608 | 49.2 | 63.1 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
| PathTrack | 1088x608 | 44.9 | 59.3 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [config](./pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml) |
| VisDrone | 1088x608 | 49.2 | 63.1 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [config](./pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
### [Vehicle Tracking](./vehicle/README.md)
### FairMOT Results on each val-set of Vehicle category
......
......@@ -20,7 +20,7 @@
- JDE (Joint Detection and Embedding) algorithms learn Detection and Embedding simultaneously, entirely within a single shared neural network, and set up the loss function with a multi-task learning approach. Representative algorithms are **JDE** and **FairMOT**. This design balances accuracy and speed, and enables high-accuracy real-time multi-object tracking.
PaddleDetection implements three multi-object tracking algorithms from these two series.
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model that extracts appearance features from the person image patches cropped by a detector. It integrates appearance information on top of a deep appearance descriptor and assigns and updates detected targets to the existing corresponding trajectories, i.e. performs a ReID task. The detection boxes required by DeepSORT can be generated by any detector; the saved detection results and video frames can then be read in for tracking. The ReID model here is the `PCB+Pyramid ResNet101` model provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas).
- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model that extracts appearance features from the person image patches cropped by a detector. It integrates appearance information on top of a deep appearance descriptor and assigns and updates detected targets to the existing corresponding trajectories, i.e. performs a ReID task. The detection boxes required by DeepSORT can be generated by any detector; the saved detection results and video frames can then be read in for tracking. The ReID models here are the `PCB+Pyramid ResNet101` and `PPLCNet` models provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas).
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) learns the object detection task and the embedding task simultaneously in a single shared neural network, and outputs the detection results together with the matching appearance embeddings. The original JDE paper adds a ReID branch to the anchor-based YOLOv3 detector to learn embeddings; training is constructed as a multi-task joint learning problem, balancing accuracy and speed.
......@@ -49,21 +49,23 @@ pip install -r requirements.txt
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model | ReID model | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----:| :-----: | :-----: |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 72.2 | 59.5 | 1087 | 8034 | 21481 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 68.1 | 53.6 | 1979 | 17446 | 15766 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pplcnet.yml) |
### DeepSORT Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model | ReID model | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----: | :-----: |:-----: |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 64.0 | 51.3 | 1208 | 12697 | 51784 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 61.1 | 48.8 | 2010 | 25401 | 43432 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort/deepsort_jde_yolov3_pplcnet.yml) |
**Notes:**
DeepSORT does not need training on MOT datasets and is only used for evaluation. Two evaluation methods are currently supported.
- Method 1: load the detection result file and the ReID model. Before evaluating with the DeepSORT model, you should first obtain detection results from a detection model, and then prepare the result files like this:
- **Method 1**: Load the detection result file and the ReID model. Before evaluating with the DeepSORT model, you should first obtain detection results from a detection model, and then prepare the result files like this:
```
det_results_dir
|——————MOT16-02.txt
......@@ -80,24 +82,24 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
If you use a stronger detection model, you can get better results. Each txt file contains the detection results of all frames extracted from one video, and each line describes a bounding box in the following format (a writing sketch follows these notes):
```
[frame_id],[bb_left],[bb_top],[width],[height],[conf]
[frame_id],[x0],[y0],[w],[h],[score],[class_id]
```
- `frame_id` is the frame number of the image
- `bb_left` is the x coordinate of the left edge of the object box
- `bb_top` is the y coordinate of the top edge of the object box
- `width,height` are the actual pixel width and height
- `conf` is the object score, set to `1` (the results have already been filtered by the detection score threshold)
- `x0,y0` are the x and y coordinates of the top-left corner of the object box
- `w,h` are the pixel width and height of the object box
- `score` is the score of the object box
- `class_id` is the category of the object box, `0` if there is only one category
- Method 2: load the detection model and the ReID model at the same time. Here the JDE version of YOLOv3 is used; see `configs/mot/deepsort/_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml` for the detailed configuration.
- **Method 2**: Load the detection model and the ReID model at the same time. Here the JDE version of YOLOv3 is used; see `configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml` for the detailed configuration. To load other general detection models, refer to `configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml` and modify accordingly.
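Conversely, here is a minimal sketch (a hypothetical helper, not a PaddleDetection API) of writing such a txt from per-frame detector outputs, pre-filtered by a score threshold as the notes require:
```python
def save_det_results(txt_path, frame_boxes, score_thresh=0.2):
    """frame_boxes: {frame_id: [(x0, y0, w, h, score, class_id), ...]}"""
    with open(txt_path, 'w') as f:
        for frame_id in sorted(frame_boxes):
            for x0, y0, w, h, score, class_id in frame_boxes[frame_id]:
                if score < score_thresh:  # keep only boxes above the threshold
                    continue
                f.write(f"{frame_id},{x0:.2f},{y0:.2f},{w:.2f},{h:.2f},"
                        f"{score:.4f},{int(class_id)}\n")
```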
### JDE Results on MOT-16 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](./jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](./jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](./jde/jde_darknet53_30e_576x320.yml) |
### JDE Results on MOT-16 Test Set
......@@ -105,10 +107,10 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](./jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](./jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](./jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE was trained on 8 GPUs with a batch size of 4 per GPU, for 30 epochs.
......@@ -177,19 +179,21 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
### FairMOT Results on HT-21 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
| DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
| DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
### FairMOT Results on HT-21 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
| DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
| DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml) |
### [Pedestrian Tracking](./pedestrian/README.md)
### FairMOT Results on each val-set of Pedestrian category
| Dataset | input shape | MOTA | IDF1 | FPS | download | config |
| :-------------| :------- | :----: | :----: | :----: | :-----: |:------: |
| PathTrack | 1088x608 | 44.9 | 59.3 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml) |
| VisDrone | 1088x608 | 49.2 | 63.1 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
| PathTrack | 1088x608 | 44.9 | 59.3 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [config](./pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml) |
| VisDrone | 1088x608 | 49.2 | 63.1 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [config](./pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
### [Vehicle Tracking](./vehicle/README.md)
### FairMOT Results on each val-set of Vehicle category
......
English | [简体中文](README_cn.md)
# DeepSORT (Deep Cosine Metric Learning for Person Re-identification)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model that extracts appearance features from the person image patches cropped by a detector. It integrates appearance information based on a deep appearance descriptor, and assigns and updates the detected targets to the existing corresponding trajectories, like a ReID task. The detection bboxes required by DeepSORT can be generated by any detection model, and the saved detection result file can then be loaded for tracking. Here we select the `PCB + Pyramid ResNet101` model provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) as the ReID model.
## Model Zoo
### DeepSORT Results on MOT-16 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model |ReID model| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
### DeepSORT Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model |ReID model| config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - |[det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
**Notes:**
DeepSORT does not need training on MOT datasets and is only used for evaluation. Two evaluation methods are currently supported.
- 1.Load the result file and the ReID model. Before DeepSORT evaluation, you should first get detection results from a detection model, and then prepare them like this:
```
det_results_dir
|——————MOT16-02.txt
|——————MOT16-04.txt
|——————MOT16-05.txt
|——————MOT16-09.txt
|——————MOT16-10.txt
|——————MOT16-11.txt
|——————MOT16-13.txt
```
For the MOT16 dataset, you can download the matched detection result file det_results_dir.zip provided by PaddleDetection:
```
wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
If you use a stronger detection model, you can get better results. Each txt file contains the detection results of all frames extracted from one video, and each line describes a bounding box in the following format:
```
[frame_id],[bb_left],[bb_top],[width],[height],[conf]
```
- `frame_id` is the frame number of the image
- `bb_left` is the X coordinate of the left edge of the object box
- `bb_top` is the Y coordinate of the top edge of the object box
- `width,height` are the pixel width and height of the object box
- `conf` is the object score, with default value `1` (the results have already been filtered by the detection score threshold)
- 2. Load the detection model and the ReID model at the same time. Here, the JDE version of YOLOv3 is selected. For more configuration details, see `configs/mot/deepsort/_base_/deepsort_jde_yolov3_darknet53_pcb_pyramid_r101.yml`. To load other general detection models, you can refer to `configs/mot/deepsort/_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml`.
## Getting Started
### 1. Evaluation
```bash
# Load the result file and ReID model to get the tracking result
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml --det_results_dir {your detection results}
# Load JDE YOLOv3 detector and ReID model to get the tracking results
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid_r101.yml
# or load the general YOLOv3 detector and ReID model to get the tracking results
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_yolov3_pcb_pyramid_r101.yml --scaled=True
```
**Notes:**
The JDE YOLOv3 pedestrian detector is trained with the same MOT dataset as JDE and FairMOT. The biggest difference between this model and a general YOLOv3 model is that it uses JDEBBoxPostProcess post-processing, so the output coordinates are not scaled back to the original image.
The general YOLOv3 pedestrian detector is not trained on MOT datasets, so its performance is lower, but its output coordinates are scaled back to the original image.
`--scaled` indicates whether the coordinates output by the detector have already been scaled back to the original image: False for JDE YOLOv3, True for general detectors (an illustrative rescaling sketch follows).
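For intuition, the mapping that the pipeline applies when `--scaled=False` can be sketched as below; this simplification ignores letterbox padding, and the function is an assumption for illustration, not a PaddleDetection internal:
```python
def rescale_boxes(boxes, input_hw, origin_hw):
    """Map [x0, y0, w, h] boxes from the resized input space, e.g. (608, 1088),
    back to the original image space (height, width)."""
    scale_y = origin_hw[0] / input_hw[0]
    scale_x = origin_hw[1] / input_hw[1]
    return [[x0 * scale_x, y0 * scale_y, w * scale_x, h * scale_y]
            for x0, y0, w, h in boxes]
```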
### 2. Inference
Run inference on a video with a single GPU using the following command:
```bash
# load JDE YOLOv3 pedestrian detector and ReID model to get tracking results
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid_r101.yml --video_file={your video name}.mp4 --save_videos
# or load general YOLOv3 pedestrian detector and ReID model to get tracking results
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_yolov3_pcb_pyramid_r101.yml --video_file={your video name}.mp4 --scaled=True --save_videos
```
**Notes:**
Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first; on the Linux (Ubuntu) platform you can install it directly with: `apt-get update && apt-get install -y ffmpeg`.
`--scaled` indicates whether the coordinates output by the detector have already been scaled back to the original image: False for JDE YOLOv3, True for general detectors.
### 3. Export model
```bash
# 1.export detection model
# export JDE YOLOv3 pedestrian detector
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/jde_yolov3_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams
# or export general YOLOv3 pedestrian detector
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/pedestrian/pedestrian_yolov3_darknet.yml -o weights=https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams
# 2. export ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
```
### 4. Using exported models for Python inference
```bash
# using exported JDE YOLOv3 pedestrian detector
python deploy/python/mot_sde_infer.py --model_dir=output_inference/jde_yolov3_darknet53_30e_1088x608/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --save_mot_txts
# or using exported general YOLOv3 pedestrian detector
python deploy/python/mot_sde_infer.py --model_dir=output_inference/pedestrian_yolov3_darknet/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts
```
**Notes:**
The tracking model predicts videos and does not support single-image prediction. The video with visualized tracking results is saved by default. You can add `--save_mot_txts` (save one txt per video) or `--save_mot_txt_per_img` (save one txt per image) to save txt result files, or `--save_images` to save the visualized images (a loading sketch for the saved txt files follows these notes).
`--scaled` indicates whether the coordinates output by the detector have already been scaled back to the original image: False for JDE YOLOv3, True for general detectors.
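A small sketch for consuming the saved txt files; it assumes the common MOTChallenge-style layout `frame_id,track_id,x0,y0,w,h,score,...` per line, which should be verified against your actual output:
```python
from collections import defaultdict

def load_mot_txt(txt_path):
    """Group tracking results by track_id."""
    tracks = defaultdict(list)
    with open(txt_path) as f:
        for line in f:
            vals = line.strip().split(',')
            frame_id, track_id = int(float(vals[0])), int(float(vals[1]))
            x0, y0, w, h, score = map(float, vals[2:7])
            tracks[track_id].append((frame_id, x0, y0, w, h, score))
    return dict(tracks)
```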
## Citations
```
@inproceedings{Wojke2017simple,
title={Simple Online and Realtime Tracking with a Deep Association Metric},
author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
year={2017},
pages={3645--3649},
organization={IEEE},
doi={10.1109/ICIP.2017.8296962}
}
@inproceedings{Wojke2018deep,
title={Deep Cosine Metric Learning for Person Re-identification},
author={Wojke, Nicolai and Bewley, Alex},
booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
year={2018},
pages={748--756},
organization={IEEE},
doi={10.1109/WACV.2018.00087}
}
```
README_cn.md
\ No newline at end of file
......@@ -6,10 +6,11 @@
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Adapting Other Detectors](#Adapting_Other_Detectors)
- [Citations](#Citations)
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model that extracts appearance features from the person image patches cropped by a detector. It integrates appearance information on top of a deep appearance descriptor and assigns and updates detected targets to the existing corresponding trajectories, i.e. performs a ReID task. The detection boxes required by DeepSORT can be generated by any detector; the saved detection results and video frames can then be read in for tracking. The ReID model here is the `PCB+Pyramid ResNet101` model provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas).
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm by adding a CNN model that extracts appearance features from the person image patches cropped by a detector. It integrates appearance information on top of a deep appearance descriptor and assigns and updates detected targets to the existing corresponding trajectories, i.e. performs a ReID task. The detection boxes required by DeepSORT can be generated by any detector; the saved detection results and video frames can then be read in for tracking. The ReID models here are the `PCB+Pyramid ResNet101` and `PPLCNet` models provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas).
## Model Zoo
......@@ -17,21 +18,33 @@
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model | ReID model | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----:| :-----: | :-----: |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 72.2 | 59.5 | 1087 | 8034 | 21481 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 68.1 | 53.6 | 1979 | 17446 | 15766 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort_jde_yolov3_pplcnet.yml) |
### DeepSORT Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model | ReID model | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----: | :-----: |:-----: |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./reid/deepsort_pcb_pyramid_r101.yml) |
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort_jde_yolov3_pcb_pyramid.yml) |
| PPLCNet | 1088x608 | 64.0 | 51.3 | 1208 | 12697 | 51784 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./reid/deepsort_pplcnet.yml) |
| PPLCNet | 1088x608 | 61.1 | 48.8 | 2010 | 25401 | 43432 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort_jde_yolov3_pplcnet.yml) |
**Notes:**
DeepSORT does not need training on MOT datasets and is only used for evaluation. Two evaluation methods are currently supported.
### DeepSORT Results on MOT-17 half Val Set
- Method 1: load the detection result file and the ReID model. Before evaluating with the DeepSORT model, you should first obtain detection results from a detection model, and then prepare the result files like this:
| Detector training dataset | Detector | ReID | det mAP | MOTA | IDF1 | FPS | config |
| :-------- | :----- | :----: |:------: | :----: |:-----: |:----:|:----: |
| MIX | JDE YOLOv3 | PCB Pyramid | - | 66.9 | 62.7 | - |[config](./deepsort_jde_yolov3_pcb_pyramid.yml) |
| MIX | JDE YOLOv3 | PPLCNet | - | 66.3 | 62.1 | - |[config](./deepsort_jde_yolov3_pplcnet.yml) |
| pedestrian (not released) | YOLOv3 | PPLCNet | 45.4 | 45.8 | 54.3 | - |[config](./deepsort_yolov3_pplcnet.yml) |
| MOT-17 half train | PPYOLOv2 | PPLCNet | 46.8 | 48.7 | 54.5 | - |[config](./deepsort_ppyolov2_pplcnet.yml) |
**Notes:**
DeepSORT does not need training on MOT datasets and is only used for evaluation. Two evaluation methods are currently supported.
- **Method 1**: Load the detection result file and the ReID model. Before evaluating with the DeepSORT model, you should first obtain detection results from a detection model, and then prepare the result files like this:
```
det_results_dir
|——————MOT16-02.txt
......@@ -48,45 +61,50 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
```
If you use a stronger detection model, you can get better results. Each txt file contains the detection results of all frames extracted from one video, and each line describes a bounding box in the following format:
```
[frame_id],[bb_left],[bb_top],[width],[height],[conf]
[frame_id],[x0],[y0],[w],[h],[score],[class_id]
```
- `frame_id` is the frame number of the image
- `bb_left` is the x coordinate of the left edge of the object box
- `bb_top` is the y coordinate of the top edge of the object box
- `width,height` are the actual pixel width and height
- `conf` is the object score, set to `1` (the results have already been filtered by the detection score threshold)
- `x0,y0` are the x and y coordinates of the top-left corner of the object box
- `w,h` are the pixel width and height of the object box
- `score` is the score of the object box
- `class_id` is the category of the object box, `0` if there is only one category
- Method 2: load the detection model and the ReID model at the same time. Here the JDE version of YOLOv3 is used; see `configs/mot/deepsort/_base_/deepsort_jde_yolov3_darknet53_pcb_pyramid_r101.yml` for the detailed configuration. To load other general detection models, refer to `configs/mot/deepsort/_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml` and modify accordingly.
- **Method 2**: Load the detection model and the ReID model at the same time. Here the JDE version of YOLOv3 is used; see `configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml` for the detailed configuration. To load other general detection models, refer to `configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml` and modify accordingly.
## Getting Started
### 1. Evaluation
**Method 1**: Load the detection result file and the ReID model to get tracking results
```bash
# Load the detection result file and the ReID model to get tracking results
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml --det_results_dir {your detection results}
# Load the JDE YOLOv3 pedestrian detection model and the ReID model to get tracking results
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid_r101.yml
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml --det_results_dir {your detection results}
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml --det_results_dir {your detection results}
```
# or load the general YOLOv3 pedestrian detection model and the ReID model to get tracking results
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_yolov3_pcb_pyramid_r101.yml --scaled=True
**Method 2**: Load the pedestrian detection model and the ReID model to get tracking results
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pplcnet.yml
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml --scaled=True
```
**Notes:**
The JDE YOLOv3 pedestrian detection model is trained on the same MOT dataset as JDE and FairMOT. The biggest difference between this model and a general YOLOv3 model is that it uses JDEBBoxPostProcess post-processing, and its output coordinates are not scaled back to the original image.
The general YOLOv3 pedestrian detection model is not trained on MOT datasets, so its accuracy is lower, but its output coordinates are scaled back to the original image.
`--scaled` indicates whether the coordinates output by the model have already been scaled back to the original image: False if the detection model is JDE YOLOv3, True for general detection models.
- The JDE YOLOv3 pedestrian detection model is trained on the same MOT dataset as JDE and FairMOT, so its MOTA is higher, while other general detection models such as PPYOLOv2 are trained only on the MOT17 half dataset.
- The biggest difference between the JDE YOLOv3 model and general detection models such as YOLOv3 and PPYOLOv2 is that it uses JDEBBoxPostProcess post-processing and its output coordinates are not scaled back to the original image, while general detection models output coordinates scaled back to the original image.
- `--scaled` indicates whether the coordinates output by the model have already been scaled back to the original image: False if the detection model is JDE YOLOv3, True for general detection models; the default is False.
### 2. Inference
Use a single GPU to predict a video and save the result as a video file with the following command:
```bash
# Load the JDE YOLOv3 pedestrian detection model and the ReID model, and save the result video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid_r101.yml --video_file={your video name}.mp4 --save_videos
# Load the JDE YOLOv3 pedestrian detection model and the PCB Pyramid ReID model, and save the result video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml --video_file={your video name}.mp4 --save_videos
# or load the general YOLOv3 pedestrian detection model and the ReID model, and save the result video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_yolov3_pcb_pyramid_r101.yml --video_file={your video name}.mp4 --scaled=True --save_videos
# or load the PPYOLOv2 pedestrian detection model and the PPLCNet ReID model, and save the result video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml --video_file={your video name}.mp4 --scaled=True --save_videos
```
**Notes:**
......@@ -96,32 +114,77 @@ CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsor
### 3. Export the inference model
Step 1: Export the detection model
```bash
# 1. Export the detection model first
# Export the JDE YOLOv3 pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/jde_yolov3_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams
# or export the general YOLOv3 pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/pedestrian/pedestrian_yolov3_darknet.yml -o weights=https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
# or export the PPYOLOv2 pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams
```
# 2. Then export the ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
Step 2: Export the ReID model
```bash
# Export the PCB Pyramid ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams
# or export the PPLCNet ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
```
### 4. Using exported models for Python inference
```bash
# Use the exported JDE YOLOv3 pedestrian detection model
python deploy/python/mot_sde_infer.py --model_dir=output_inference/jde_yolov3_darknet53_30e_1088x608/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --save_mot_txts
# Use the exported JDE YOLOv3 pedestrian detection model and the PCB Pyramid ReID model
python deploy/python/mot_sde_infer.py --model_dir=output_inference/jde_yolov3_darknet53_30e_1088x608_mix/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --save_mot_txts
# or use the exported general YOLOv3 pedestrian detection model
python deploy/python/mot_sde_infer.py --model_dir=output_inference/pedestrian_yolov3_darknet/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts
# or use the exported PPYOLOv2 pedestrian detection model and the PPLCNet ReID model
python deploy/python/mot_sde_infer.py --model_dir=output_inference/ppyolov2_r50vd_dcn_365e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts
```
**Notes:**
The tracking model predicts videos and does not support single-image prediction. The video with visualized tracking results is saved by default; you can add `--save_mot_txts` (save one txt per video) or `--save_mot_txt_per_img` (save one txt per image) to save txt result files, or `--save_images` to save the visualized images.
`--scaled` indicates whether the coordinates output by the model have already been scaled back to the original image: False if the detection model is JDE YOLOv3, True for general detection models.
## Adapting Other Detectors
### 1. Configuration file layout
- `detector/xxx.yml` is a pure detection model configuration file, e.g. `detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml`, and supports the whole detection workflow (train/eval/infer/export/deploy). DeepSORT tracking eval/infer does not depend on this pure-detection yml, but it is needed to export the detection model separately: DeepSORT tracking models are exported as a separate detector part and reid part, and users can define and assemble detector+reid into a complete DeepSORT tracking system by themselves.
- For the detector configuration files under `detector/`, users need to convert their own dataset into COCO format. Since ground-truth ID labels are not involved, any detection model can be configured here, as long as its output contains the class, coordinates, and score of the result boxes.
- The `reid/deepsort_yyy.yml` files are the configuration files of the ReID model and the tracker, e.g. `reid/deepsort_pplcnet.yml`. The ReID models here, `deepsort_pcb_pyramid_r101.yml` and `deepsort_pplcnet.yml`, are provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) and were trained on the Market1501 (751 identities) person ReID dataset; the training details are pending release by PaddleClas.
- `deepsort_xxx_yyy.yml` is a complete DeepSORT tracking configuration, e.g. `deepsort_ppyolov2_pplcnet.yml`, where the detection part `xxx` comes from `detector/` and the reid and tracker part `yyy` comes from `reid/`.
- DeepSORT tracking eval/infer works in two ways: method 1 uses only `reid/deepsort_yyy.yml` and loads the detection result file together with the `yyy` ReID model; method 2 uses `deepsort_xxx_yyy.yml` and loads the `xxx` detection model together with the `yyy` ReID model. DeepSORT tracking deploy, however, must use `deepsort_xxx_yyy.yml`.
- Detector eval/infer/deploy only uses `detector/xxx.yml`. The ReID model is generally not used alone; if it has to be used alone, the detection result file must be loaded in advance and only `reid/deepsort_yyy.yml` is used.
### 2. Adaptation steps
1. Convert your dataset into COCO format and train it as a general detection model. Referring to the model configuration files in the `detector/` folder, create `detector/xxx.yml`; Faster R-CNN, YOLOv3, PPYOLOv2, JDE YOLOv3, PicoDet and other models are already supported.
2. Create `deepsort_xxx_yyy.yml`, where the `DeepSORT.detector` configuration is the one from `detector/xxx.yml`, and `det_weights` of `EvalMOTDataset` can be set as needed. `yyy` comes from `reid/deepsort_yyy.yml`, e.g. `reid/deepsort_pplcnet.yml`.
### 3. Usage steps
#### 1. Load the detection model and the ReID model for evaluation:
```
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_xxx_yyy.yml --scaled=True
```
#### 2. Load the detection model and the ReID model for inference:
```
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_xxx_yyy.yml --video_file={your video name}.mp4 --scaled=True --save_videos
```
#### 3. Export the detection model and the ReID model:
```bash
# Export the detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/xxx.yml
# Export the ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_yyy.yml
```
#### 4. Deploy with the exported detection model and ReID model:
```
python deploy/python/mot_sde_infer.py --model_dir=output_inference/xxx/ --reid_model_dir=output_inference/deepsort_yyy/ --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts
```
**Notes:**
`--scaled` indicates whether the coordinates output by the model have already been scaled back to the original image: False if the detection model is JDE YOLOv3, True for general detection models.
## Citations
```
@inproceedings{Wojke2017simple,
......
_BASE_: [
'detector/faster_rcnn_r50_fpn_2x_1333x800_mot17half.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT17/images/half
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/faster_rcnn_r50_fpn_2x_1333x800_mot17half.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: FasterRCNN
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192] # width and height of the ReID input crop
min_box_area: 0 # minimum box area to keep (0 = disabled)
vertical_ratio: -1 # aspect-ratio filter for boxes (-1 = disabled)
budget: 100 # max number of appearance features stored per track
max_age: 70 # frames a lost track is kept alive before deletion
n_init: 3 # consecutive detections needed to confirm a track
metric_type: cosine # appearance distance metric
matching_threshold: 0.2 # max appearance distance for a valid match
max_iou_distance: 0.9 # IoU gating threshold for the IoU matching stage
motion: KalmanFilter # motion model used to predict trajectories
# detector configuration
# see 'configs/mot/deepsort/detector/faster_rcnn_r50_fpn_2x_1333x800_mot17half.yml'
FasterRCNN:
backbone: ResNet
neck: FPN
rpn_head: RPNHead
bbox_head: BBoxHead
bbox_post_process: BBoxPostProcess
# Tracking requires higher quality boxes, so nms.score_threshold will be higher
BBoxPostProcess:
decode: RCNNBox
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.2 # 0.05 in original detector
nms_threshold: 0.5
_BASE_: [
'detector/jde_yolov3_darknet53_30e_1088x608_mix.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: YOLOv3 # JDE version
detector: YOLOv3 # JDE version YOLOv3
reid: PCBPyramid
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml'
PCBPyramid:
num_conv_out_channels: 128
num_classes: 751
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0
vertical_ratio: -1
budget: 100
max_age: 70
n_init: 3
......@@ -20,30 +46,16 @@ DeepSORTTracker:
motion: KalmanFilter
# JDE version YOLOv3 detector for MOT dataset.
# The most obvious difference is JDEBBoxPostProcess and the bboxes coordinates
# output are not scaled to the original image.
# detector configuration: JDE version YOLOv3
# see 'configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml'
# The most obvious difference from general YOLOv3 is the JDEBBoxPostProcess and the bboxes coordinates output are not scaled to the original image.
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: JDEBBoxPostProcess
DarkNet:
depth: 53
return_idx: [2, 3, 4]
freeze_norm: True
YOLOv3FPN:
freeze_norm: True
YOLOv3Head:
anchors: [[128,384], [180,540], [256,640], [512,640],
[32,96], [45,135], [64,192], [90,271],
[8,24], [11,34], [16,48], [23,68]]
anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
loss: JDEDetectionLoss
# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
JDEBBoxPostProcess:
decode:
name: JDEBox
......
_BASE_: [
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_jde_yolov3_darknet53_pcb_pyramid_r101.yml',
'_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
DeepSORT:
detector: YOLOv3
reid: PCBPyramid
tracker: DeepSORTTracker
# JDE version of the YOLOv3 detector for MOT datasets.
# The most obvious difference is JDEBBoxPostProcess: the output bbox
# coordinates are not scaled back to the original image.
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: JDEBBoxPostProcess
_BASE_: [
'detector/jde_yolov3_darknet53_30e_1088x608_mix.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: YOLOv3 # JDE version YOLOv3
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0
vertical_ratio: -1
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# detector configuration: JDE version YOLOv3
# see 'configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml'
# The most obvious difference from general YOLOv3 is JDEBBoxPostProcess: the output bbox coordinates are not scaled back to the original image.
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: JDEBBoxPostProcess
# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
JDEBBoxPostProcess:
decode:
name: JDEBox
conf_thresh: 0.3
downsample_ratio: 32
nms:
name: MultiClassNMS
keep_top_k: 500
score_threshold: 0.01
nms_threshold: 0.5
nms_top_k: 2000
normalized: true
return_idx: false
_BASE_: [
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml',
'_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: None
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
DeepSORT:
detector: None
reid: PCBPyramid
tracker: DeepSORTTracker
_BASE_: [
'detector/picodet_l_esnet_300e_896x896_mot17half.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT17/images/half
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_esnet_300e_896x896_mot17half.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: PicoDet
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0
vertical_ratio: -1
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# detector configuration
# see 'configs/mot/deepsort/detector/picodet_l_esnet_300e_896x896_mot17half.yml'
PicoDet:
backbone: ESNet
neck: CSPPAN
head: PicoHead
PicoHead:
conv_feat:
name: PicoFeat
feat_in: 128
feat_out: 128
num_convs: 4
num_fpn_stride: 4
norm_type: bn
share_cls_reg: False
fpn_stride: [8, 16, 32, 64]
feat_in_chan: 128
prior_prob: 0.01
reg_max: 7
cell_offset: 0.5
loss_class:
name: VarifocalLoss
use_sigmoid: True
iou_weighted: True
loss_weight: 1.0
loss_dfl:
name: DistributionFocalLoss
loss_weight: 0.25
loss_bbox:
name: GIoULoss
loss_weight: 2.0
assigner:
name: SimOTAAssigner
candidate_topk: 10
iou_weight: 6
nms:
name: MultiClassNMS
nms_top_k: 1000
keep_top_k: 100
score_threshold: 0.25 # 0.025 in original detector
nms_threshold: 0.6
_BASE_: [
'detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT17/images/half
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: YOLOv3 # PPYOLOv2 version
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0
vertical_ratio: -1
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# detector configuration: PPYOLOv2 version
# see 'configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml'
YOLOv3:
backbone: ResNet
neck: PPYOLOPAN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
ResNet:
depth: 50
variant: d
return_idx: [1, 2, 3]
dcn_v2_stages: [3]
freeze_at: -1
freeze_norm: false
norm_decay: 0.
# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
BBoxPostProcess:
decode:
name: YOLOBox
conf_thresh: 0.25 # 0.01 in original detector
downsample_ratio: 32
clip_bbox: true
scale_x_y: 1.05
nms:
name: MatrixNMS
keep_top_k: 100
score_threshold: 0.25 # 0.01 in original detector
post_threshold: 0.25 # 0.01 in original detector
nms_top_k: -1
background_label: -1
_BASE_: [
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml',
'_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
DeepSORT:
detector: YOLOv3
reid: PCBPyramid
tracker: DeepSORTTracker
# General version of YOLOv3,
# using BBoxPostProcess; the output bboxes are scaled to the original image.
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
_BASE_: [
'detector/yolov3_darknet53_270e_608x608_pedestrian.yml',
'../../datasets/mot.yml',
'../../runtime.yml',
'_base_/deepsort_reader_1088x608.yml',
]
metric: MOT
num_classes: 1
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT17/images/half
keep_ori_im: True # set as True in DeepSORT
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_270e_608x608_pedestrian.pdparams
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
# DeepSORT configuration
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: YOLOv3 # General YOLOv3 version
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
# reid and tracker configuration
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0
vertical_ratio: -1
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# detector configuration: General YOLOv3 version
# see 'configs/mot/deepsort/detector/yolov3_darknet53_270e_608x608_pedestrian.yml'
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
BBoxPostProcess:
decode:
name: YOLOBox
conf_thresh: 0.1 # 0.005 in original detector
downsample_ratio: 32
clip_bbox: true
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.01
nms_threshold: 0.45
nms_top_k: 1000
English | [简体中文](README_cn.md)
# Detector for DeepSORT
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series. The configs of several common detectors are provided here as a reference. Note that differences in training dataset, backbone, input size, training epochs and NMS threshold will lead to differences in model accuracy and performance; please adapt them to your needs.
## Model Zoo
### Results on the MOT17-half val set
| Backbone | Model | input size | lr schedule | FPS | Box AP | download | config |
| :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :----------: | :-----: |
| ResNet50-vd | PPYOLOv2 | 640x640 | 365e | ---- | 46.8 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams) | [config](./ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml) |
| ResNet50-FPN | Faster R-CNN | 1333x800 | 1x | ---- | 44.2 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/faster_rcnn_r50_fpn_2x_1333x800_mot17half.pdparams) | [config](./faster_rcnn_r50_fpn_2x_1333x800_mot17half.yml) |
| DarkNet-53 | YOLOv3 | 608x608 | 270e | ---- | 45.4 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_270e_608x608_pedestrian.pdparams) | [config](./yolov3_darknet53_270e_608x608_pedestrian.yml) |
| ESNet | PicoDet | 896x896 | 300e | ---- | 40.9 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_esnet_300e_896x896_mot17half.pdparams) | [config](./picodet_l_esnet_300e_896x896_mot17half.yml) |
**Notes:**
- All models above except YOLOv3 are trained with the **MOT17-half train** set.
- The **MOT17-half train** set is composed of the images and labels of the first half of the frames of each video in the MOT17 train set (7 sequences in total). The **MOT17-half val** set, composed of the second half of the frames of each video, is used for evaluation. Both can be downloaded from this [link](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip); download and unzip it into the `dataset/mot/MOT17/images/` folder.
- YOLOv3 is trained with the same pedestrian dataset as `configs/pedestrian/pedestrian_yolov3_darknet.yml`, which is not public yet.
- For pedestrian tracking, please use a pedestrian detector combined with a pedestrian ReID model. For vehicle tracking, please use a vehicle detector combined with a vehicle ReID model.
- High-quality detected boxes are required for DeepSORT tracking, so the post-processing settings such as the NMS threshold of these models differ from those in pure detection tasks.
## Quick Start
Start training and evaluation with the following commands:
```bash
job_name=ppyolov2_r50vd_dcn_365e_640x640_mot17half
config=configs/mot/deepsort/detector/${job_name}.yml
log_dir=log_dir/${job_name}
# 1. training
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config}
# 2. evaluation
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/${job_name}.pdparams
```
简体中文 | [English](README.md)
# Detector for DeepSORT
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series. The configs of several common detectors are provided here as a reference. Differences in training dataset, input size, number of training epochs, NMS threshold and other settings will all lead to differences in model accuracy and performance; please adapt them to your needs.
## Model Zoo
### Detection results on the MOT17-half val set
| Backbone | Model | Input size | Lr schedule | Inference time (fps) | Box AP | Download | Config |
| :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :------: | :-----: |
| ResNet50-vd | PPYOLOv2 | 640x640 | 365e | ---- | 46.8 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams) | [config](./ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml) |
| ResNet50-FPN | Faster R-CNN | 1333x800 | 1x | ---- | 44.2 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/faster_rcnn_r50_fpn_2x_1333x800_mot17half.pdparams) | [config](./faster_rcnn_r50_fpn_2x_1333x800_mot17half.yml) |
| DarkNet-53 | YOLOv3 | 608x608 | 270e | ---- | 45.4 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_270e_608x608_pedestrian.pdparams) | [config](./yolov3_darknet53_270e_608x608_pedestrian.yml) |
| ESNet | PicoDet | 896x896 | 300e | ---- | 40.9 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_esnet_300e_896x896_mot17half.pdparams) | [config](./picodet_l_esnet_300e_896x896_mot17half.yml) |
**Notes:**
- All models above except YOLOv3 are trained with the **MOT17-half train** set.
- The **MOT17-half train** set is composed of the images and labels of the first half of the frames of each of the 7 train sequences in MOT17; the **MOT17-half val** set, composed of the second half of the frames of each video, is used for evaluation. The dataset can be downloaded from this [link](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip); unzip it into the `dataset/mot/MOT17/images/` folder.
- YOLOv3 is trained with the same pedestrian dataset as `configs/pedestrian/pedestrian_yolov3_darknet.yml`, which is not public yet.
- For pedestrian tracking, please use a pedestrian detector combined with a pedestrian ReID model; for vehicle tracking, please use a vehicle detector combined with a vehicle ReID model.
- DeepSORT tracking requires high-quality detected boxes, so post-processing settings such as the NMS threshold of these models differ from those used in pure detection tasks.
## Quick Start
Start training and evaluation with the following commands:
```bash
job_name=ppyolov2_r50vd_dcn_365e_640x640_mot17half
config=configs/mot/deepsort/detector/${job_name}.yml
log_dir=log_dir/${job_name}
# 1. training
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config}
# 2. evaluation
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/${job_name}.pdparams
```
_BASE_: [
'../../../faster_rcnn/faster_rcnn_r50_fpn_2x_coco.yml',
]
weights: output/faster_rcnn_r50_fpn_2x_1333x800_mot17half/model_final
num_classes: 1
TrainDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/train_half.json
image_dir: train
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/val_half.json
image_dir: train
# detector configuration
architecture: FasterRCNN
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams
FasterRCNN:
backbone: ResNet
neck: FPN
rpn_head: RPNHead
bbox_head: BBoxHead
bbox_post_process: BBoxPostProcess
ResNet:
depth: 50
norm_type: bn
freeze_at: 0
return_idx: [0,1,2,3]
num_stages: 4
FPN:
out_channel: 256
RPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
anchor_sizes: [[32], [64], [128], [256], [512]]
strides: [4, 8, 16, 32, 64]
rpn_target_assign:
batch_size_per_im: 256
fg_fraction: 0.5
negative_overlap: 0.3
positive_overlap: 0.7
use_random: True
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 1000
topk_after_collect: True
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
BBoxHead:
head: TwoFCHead
roi_extractor:
resolution: 7
sampling_ratio: 0
aligned: True
bbox_assigner: BBoxAssigner
BBoxAssigner:
batch_size_per_im: 512
bg_thresh: 0.5
fg_thresh: 0.5
fg_fraction: 0.25
use_random: True
TwoFCHead:
out_channel: 1024
BBoxPostProcess:
decode: RCNNBox
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.05
nms_threshold: 0.5
_BASE_: [
'../../../datasets/mot.yml',
'../../../runtime.yml',
'../../jde/_base_/optimizer_30e.yml',
'../../jde/_base_/jde_reader_1088x608.yml',
]
weights: output/jde_yolov3_darknet53_30e_1088x608_mix/model_final
metric: MOTDet
num_classes: 1
EvalReader:
inputs_def:
num_max_boxes: 50
......@@ -31,13 +31,15 @@ TestReader:
EvalDataset:
!MOTDataSet
dataset_dir: dataset/mot
image_lists: ['mot17.half']
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
TestDataset:
!ImageFolder
anno_path: None
# detector configuration
architecture: YOLOv3
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
......
_BASE_: [
'../../../picodet/picodet_l_640_coco.yml',
]
weights: output/picodet_l_esnet_300e_896x896_mot17half/model_final
num_classes: 1
TrainDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/train_half.json
image_dir: train
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/val_half.json
image_dir: train
worker_num: 6
TrainReader:
sample_transforms:
- Decode: {}
- RandomCrop: {}
- RandomFlip: {prob: 0.5}
- RandomDistort: {}
batch_transforms:
- BatchRandomResize: {target_size: [832, 864, 896, 928, 960], random_size: True, random_interp: True, keep_ratio: False}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_size: 32
shuffle: true
drop_last: true
collate_batch: false
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [896, 896], keep_ratio: False}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 8
shuffle: false
# detector configuration
architecture: PicoDet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ESNet_x1_25_pretrained.pdparams
find_unused_parameters: True
use_ema: true
cycle_epoch: 40
snapshot_epoch: 10
epoch: 250
PicoDet:
backbone: ESNet
neck: CSPPAN
head: PicoHead
ESNet:
scale: 1.25
feature_maps: [4, 11, 14]
act: hard_swish
channel_ratio: [0.875, 0.5, 1.0, 0.625, 0.5, 0.75, 0.625, 0.625, 0.5, 0.625, 1.0, 0.625, 0.75]
CSPPAN:
out_channels: 128
use_depthwise: True
num_csp_blocks: 1
num_features: 4
PicoHead:
conv_feat:
name: PicoFeat
feat_in: 128
feat_out: 128
num_convs: 4
num_fpn_stride: 4
norm_type: bn
share_cls_reg: False
fpn_stride: [8, 16, 32, 64]
feat_in_chan: 128
prior_prob: 0.01
reg_max: 7
cell_offset: 0.5
loss_class:
name: VarifocalLoss
use_sigmoid: True
iou_weighted: True
loss_weight: 1.0
loss_dfl:
name: DistributionFocalLoss
loss_weight: 0.25
loss_bbox:
name: GIoULoss
loss_weight: 2.0
assigner:
name: SimOTAAssigner
candidate_topk: 10
iou_weight: 6
nms:
name: MultiClassNMS
nms_top_k: 1000
keep_top_k: 100
score_threshold: 0.025
nms_threshold: 0.6
_BASE_: [
'../../../ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml',
]
weights: output/ppyolov2_r50vd_dcn_365e_640x640_mot17half/model_final
num_classes: 1
TrainDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/train_half.json
image_dir: train
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/val_half.json
image_dir: train
# detector configuration
architecture: YOLOv3
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
YOLOv3:
backbone: ResNet
neck: PPYOLOPAN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
ResNet:
depth: 50
variant: d
return_idx: [1, 2, 3]
dcn_v2_stages: [3]
freeze_at: -1
freeze_norm: false
norm_decay: 0.
PPYOLOPAN:
drop_block: true
block_size: 3
keep_prob: 0.9
spp: true
YOLOv3Head:
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
loss: YOLOv3Loss
iou_aware: true
iou_aware_factor: 0.5
YOLOv3Loss:
ignore_thresh: 0.7
downsample: [32, 16, 8]
label_smooth: false
scale_x_y: 1.05
iou_loss: IouLoss
iou_aware_loss: IouAwareLoss
IouLoss:
loss_weight: 2.5
loss_square: true
IouAwareLoss:
loss_weight: 1.0
BBoxPostProcess:
decode:
name: YOLOBox
conf_thresh: 0.01
downsample_ratio: 32
clip_bbox: true
scale_x_y: 1.05
nms:
name: MatrixNMS
keep_top_k: 100
score_threshold: 0.01
post_threshold: 0.01
nms_top_k: -1
background_label: -1
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: YOLOv3 # General version
reid: PCBPyramid
tracker: DeepSORTTracker
PCBPyramid:
num_conv_out_channels: 128
num_classes: 751
DeepSORTTracker:
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# General version of YOLOv3,
# using BBoxPostProcess; the output bboxes are scaled to the original image.
# This config is the same as '../../../pedestrian/pedestrian_yolov3_darknet.yml'.
_BASE_: [
'../../../yolov3/yolov3_darknet53_270e_coco.yml',
]
weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/pedestrian_yolov3_darknet.pdparams
num_classes: 1
# The pedestrian training dataset used here is not public yet.
# Only the trained YOLOv3 model is provided; you can evaluate it on the MOT17-half val set.
TrainDataset:
!COCODataSet
dataset_dir: dataset/pedestrian
anno_path: annotations/instances_train2017.json
image_dir: train2017
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17/images
anno_path: annotations/val_half.json
image_dir: train
# detector configuration
architecture: YOLOv3
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
norm_type: sync_bn
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
DarkNet:
depth: 53
return_idx: [2, 3, 4]
......@@ -44,6 +46,11 @@ YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
loss: YOLOv3Loss
YOLOv3Loss:
ignore_thresh: 0.7
downsample: [32, 16, 8]
label_smooth: false
BBoxPostProcess:
decode:
name: YOLOBox
......
English | [简体中文](README_cn.md)
# ReID of DeepSORT
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series. The configs of several common ReID models are provided here as a reference for DeepSORT.
## Model Zoo
### Results on Market1501 pedestrian ReID dataset
| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | download | config |
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
| ResNet-101 | PCB Pyramid Embedding | 289M | --- | 86.31 | 94.95 | 98.28 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams) | [config](./deepsort_pcb_pyramid_r101.yml) |
| PPLCNet-2.5x | PPLCNet Embedding | 36M | --- | 71.59 | 87.38 | 95.49 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams) | [config](./deepsort_pplcnet.yml) |
### Results on VERI-Wild vehicle ReID dataset
| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | download | config |
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
| PPLCNet-2.5x | PPLCNet Embedding | 93M | --- | 82.44 | 93.54 | 98.53 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.pdparams) | [config](./deepsort_pplcnet_vehicle.yml) |
**Notes:**
- ReID models are provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas); the specific training process and code will be released by PaddleClas.
- For pedestrian tracking, please use the **Market1501** pedestrian ReID model in combination with a pedestrian detector.
- For vehicle tracking, please use the **VERI-Wild** vehicle ReID model in combination with a vehicle detector.
简体中文 | [English](README.md)
# ReID models for DeepSORT
## Introduction
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series. The configs of several common ReID models are provided here as a reference for DeepSORT.
## Model Zoo
### Results on the Market1501 pedestrian ReID dataset
| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | Download | Config |
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
| ResNet-101 | PCB Pyramid Embedding | 289M | --- | 86.31 | 94.95 | 98.28 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams) | [config](./deepsort_pcb_pyramid_r101.yml) |
| PPLCNet-2.5x | PPLCNet Embedding | 36M | --- | 71.59 | 87.38 | 95.49 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams) | [config](./deepsort_pplcnet.yml) |
### Results on the VERI-Wild vehicle ReID dataset
| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | Download | Config |
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
| PPLCNet-2.5x | PPLCNet Embedding | 93M | --- | 82.44 | 93.54 | 98.53 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.pdparams) | [config](./deepsort_pplcnet_vehicle.yml) |
**Notes:**
- ReID models are provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas); the specific training process and code will be released by PaddleClas.
- For pedestrian tracking, please use a ReID model trained on the **Market1501** pedestrian ReID dataset together with a pedestrian detector.
- For vehicle tracking, please use a ReID model trained on the **VERI-Wild** vehicle ReID dataset together with a vehicle detector.
# This config is a ReID-only configuration of DeepSORT; it has two uses.
# One is loading saved detection results and the ReID model to get tracking results;
# the other is exporting the ReID model for deployment inference.
_BASE_: [
'../../../datasets/mot.yml',
'../../../runtime.yml',
'../_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: None
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams
# A ReID only configuration of DeepSORT, detector should be None.
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: None
reid: PCBPyramid
tracker: DeepSORTTracker
PCBPyramid:
num_conv_out_channels: 128
num_classes: 751 # default 751 classes in Market-1501 dataset.
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0 # 0 means no need to filter out too small boxes
vertical_ratio: -1 # -1 means no need to filter out bboxes, usually set to 1.6 for pedestrians
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# This config is a ReID-only configuration of DeepSORT; it has two uses.
# One is loading saved detection results and the ReID model to get tracking results;
# the other is exporting the ReID model for deployment inference.
_BASE_: [
'../../../datasets/mot.yml',
'../../../runtime.yml',
'../_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: MOT16/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: None
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pplcnet.pdparams
# A ReID only configuration of DeepSORT, detector should be None.
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: None
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0 # 0 means no need to filter out too small boxes
vertical_ratio: -1 # -1 means no need to filter out bboxes, usually set to 1.6 for pedestrians
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
# This config is a ReID-only configuration of DeepSORT; it has two uses.
# One is loading saved detection results and the ReID model to get tracking results;
# the other is exporting the ReID model for deployment inference.
_BASE_: [
'../../../datasets/mot.yml',
'../../../runtime.yml',
'../_base_/deepsort_reader_1088x608.yml',
]
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: kitti_vehicle/images/train
keep_ori_im: True # set as True in DeepSORT
det_weights: None
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pplcnet_vehicle.pdparams
# A ReID only configuration of DeepSORT, detector should be None.
architecture: DeepSORT
pretrain_weights: None
DeepSORT:
detector: None
reid: PPLCNetEmbedding
tracker: DeepSORTTracker
PPLCNetEmbedding:
input_ch: 1280
output_ch: 512
DeepSORTTracker:
input_size: [64, 192]
min_box_area: 0 # 0 means no need to filter out too small boxes
vertical_ratio: -1 # -1 means no need to filter out bboxes, usually set to 1.6 for pedestrians
budget: 100
max_age: 70
n_init: 3
metric_type: cosine
matching_threshold: 0.2
max_iou_distance: 0.9
motion: KalmanFilter
......@@ -23,7 +23,6 @@ from preprocess import preprocess
from tracker import DeepSORTTracker
from ppdet.modeling.mot import visualization as mot_vis
from ppdet.modeling.mot.utils import Timer as MOTTimer
from ppdet.modeling.mot.utils import Detection
from paddle.inference import Config
from paddle.inference import create_predictor
......@@ -71,7 +70,11 @@ def clip_box(xyxy, input_shape, im_shape, scale_factor):
img0_shape = [int(im_shape[0] / ratio), int(im_shape[1] / ratio)]
xyxy[:, 0::2] = np.clip(xyxy[:, 0::2], a_min=0, a_max=img0_shape[1])
xyxy[:, 1::2] = np.clip(xyxy[:, 1::2], a_min=0, a_max=img0_shape[0])
w = xyxy[:, 2:3] - xyxy[:, 0:1]
h = xyxy[:, 3:4] - xyxy[:, 1:2]
mask = np.logical_and(h > 0, w > 0)
keep_idx = np.nonzero(mask)
return xyxy[keep_idx[0]], keep_idx
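# A minimal sketch of the clip_box contract above (hypothetical inputs): boxes
# whose width or height collapses to zero after clipping are dropped, so any
# per-box arrays must be gathered with the returned keep_idx to stay aligned:
#   kept_xyxy, keep_idx = clip_box(xyxy, input_shape, im_shape, scale_factor)
#   kept_scores = scores[keep_idx[0]]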
def preprocess_reid(imgs,
......@@ -137,19 +140,33 @@ class SDE_Detector(Detector):
def postprocess(self, boxes, input_shape, im_shape, scale_factor, threshold,
scaled):
over_thres_idx = np.nonzero(boxes[:, 1:2] >= threshold)[0]
if len(over_thres_idx) == 0:
pred_dets = np.zeros((1, 6), dtype=np.float32)
pred_xyxys = np.zeros((1, 4), dtype=np.float32)
return pred_dets, pred_xyxys
if not scaled:
# postprocess output of jde yolov3 detector
# `scaled` means whether the coords output by the detector have been
# scaled back to the original image: set True for a general detector,
# set False for JDE YOLOv3.
pred_bboxes = scale_coords(boxes[:, 2:], input_shape, im_shape,
scale_factor)
else:
# postprocess output of general detector
pred_bboxes = boxes[:, 2:]
pred_xyxys, keep_idx = clip_box(pred_bboxes, input_shape, im_shape,
scale_factor)
pred_scores = boxes[:, 1:2][keep_idx[0]]
pred_cls_ids = boxes[:, 0:1][keep_idx[0]]
pred_tlwhs = np.concatenate(
(pred_xyxys[:, 0:2], pred_xyxys[:, 2:4] - pred_xyxys[:, 0:2] + 1),
axis=1)
pred_dets = np.concatenate(
(pred_tlwhs, pred_scores, pred_cls_ids), axis=1)
return pred_dets[over_thres_idx], pred_xyxys[over_thres_idx]
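# Note: each row of pred_dets assembled above is [x0, y0, w, h, score, cls_id],
# i.e. pred_dets[:, :4] are the tlwh boxes consumed by DeepSORTTracker.update
# and pred_dets[:, 4:5] / pred_dets[:, 5:] are the scores and class ids.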
def predict(self, image, scaled, threshold=0.5, warmup=0, repeats=1):
'''
......@@ -159,13 +176,12 @@ class SDE_Detector(Detector):
scaled (bool): whether the coords output by the detector have been
scaled back to the original image; default False for JDE YOLOv3,
set True for general detectors.
Returns:
pred_dets (np.ndarray): detection results, shape [N, 6], each row is
[x0, y0, w, h, score, cls_id].
pred_xyxys (np.ndarray): kept boxes in [x0, y0, x1, y1] format.
'''
self.det_times.preprocess_time_s.start()
inputs = self.preprocess(image)
self.det_times.preprocess_time_s.end()
input_names = self.predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = self.predictor.get_input_handle(input_names[i])
......@@ -186,14 +202,20 @@ class SDE_Detector(Detector):
self.det_times.inference_time_s.end(repeats=repeats)
self.det_times.postprocess_time_s.start()
if len(boxes) == 0:
pred_dets = np.zeros((1, 6), dtype=np.float32)
pred_xyxys = np.zeros((1, 4), dtype=np.float32)
else:
input_shape = inputs['image'].shape[2:]
im_shape = inputs['im_shape']
scale_factor = inputs['scale_factor']
pred_dets, pred_xyxys = self.postprocess(
boxes, input_shape, im_shape, scale_factor, threshold, scaled)
self.det_times.postprocess_time_s.end()
self.det_times.img_num += 1
return pred_dets, pred_xyxys
class SDE_ReID(object):
......@@ -227,34 +249,57 @@ class SDE_ReID(object):
self.cpu_mem, self.gpu_mem, self.gpu_util = 0, 0, 0
self.batch_size = batch_size
assert pred_config.tracker, "Tracking model should have tracker"
pt = pred_config.tracker
max_age = pt['max_age'] if 'max_age' in pt else 30
max_iou_distance = pt[
'max_iou_distance'] if 'max_iou_distance' in pt else 0.7
self.tracker = DeepSORTTracker(
max_age=max_age, max_iou_distance=max_iou_distance)
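# Only max_age and max_iou_distance are read from the exported tracker
# config above; the other DeepSORTTracker arguments (budget, n_init,
# matching_threshold, ...) fall back to the constructor defaults.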
def get_crops(self, xyxy, ori_img):
w, h = self.tracker.input_size
self.det_times.preprocess_time_s.start()
crops = []
xyxy = xyxy.astype(np.int64)
ori_img = ori_img.transpose(1, 0, 2) # [h,w,3]->[w,h,3]
for i, bbox in enumerate(xyxy):
crop = ori_img[bbox[0]:bbox[2], bbox[1]:bbox[3], :]
crops.append(crop)
crops = preprocess_reid(crops, w, h)
self.det_times.preprocess_time_s.end()
return crops
def preprocess(self, crops):
# to keep fast speed, only use topk crops
crops = crops[:self.batch_size]
inputs = {}
inputs['crops'] = np.array(crops).astype('float32')
return inputs
def postprocess(self, pred_dets, pred_embs):
tracker = self.tracker
tracker.predict()
online_targets = tracker.update(pred_dets, pred_embs)
online_tlwhs, online_scores, online_ids = [], [], []
for t in online_targets:
if not t.is_confirmed() or t.time_since_update > 1:
continue
tlwh = t.to_tlwh()
tscore = t.score
tid = t.track_id
if tlwh[2] * tlwh[3] <= tracker.min_box_area: continue
if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
3] > tracker.vertical_ratio:
continue
online_tlwhs.append(tlwh)
online_scores.append(tscore)
online_ids.append(tid)
return online_tlwhs, online_scores, online_ids
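# Tracks are reported above in tlwh format and filtered by the same
# min_box_area and vertical_ratio thresholds that appear in the
# DeepSORTTracker config fields.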
def predict(self, crops, pred_dets, warmup=0, repeats=1):
self.det_times.preprocess_time_s.start()
inputs = self.preprocess(crops)
self.det_times.preprocess_time_s.end()
......@@ -268,49 +313,31 @@ class SDE_ReID(object):
self.predictor.run()
output_names = self.predictor.get_output_names()
feature_tensor = self.predictor.get_output_handle(output_names[0])
pred_embs = feature_tensor.copy_to_cpu()
self.det_times.inference_time_s.start()
for i in range(repeats):
self.predictor.run()
output_names = self.predictor.get_output_names()
feature_tensor = self.predictor.get_output_handle(output_names[0])
pred_embs = feature_tensor.copy_to_cpu()
self.det_times.inference_time_s.end(repeats=repeats)
self.det_times.postprocess_time_s.start()
online_tlwhs, online_scores, online_ids = self.postprocess(pred_dets,
pred_embs)
self.det_times.postprocess_time_s.end()
self.det_times.img_num += 1
return online_tlwhs, online_scores, online_ids
def predict_image(detector, reid_model, image_list):
results = []
image_list.sort()
for i, img_file in enumerate(image_list):
frame = cv2.imread(img_file)
if FLAGS.run_benchmark:
pred_dets, pred_xyxys = detector.predict(
[frame], FLAGS.scaled, FLAGS.threshold, warmup=10, repeats=10)
cm, gm, gu = get_current_memory_mb()
detector.cpu_mem += cm
......@@ -318,34 +345,33 @@ def predict_image(detector, reid_model, image_list):
detector.gpu_util += gu
print('Test iter {}, file name:{}'.format(i, img_file))
else:
pred_dets, pred_xyxys = detector.predict([frame], FLAGS.scaled,
FLAGS.threshold)
if len(pred_dets) == 1 and sum(pred_dets) == 0:
print('Frame {} has no object, try to modify score threshold.'.
format(i))
online_im = frame
else:
# reid process
crops = reid_model.get_crops(pred_xyxys, frame)
if FLAGS.run_benchmark:
online_tlwhs, online_scores, online_ids = reid_model.predict(
crops, pred_dets, warmup=10, repeats=10)
else:
online_tlwhs, online_scores, online_ids = reid_model.predict(
crops, pred_dets)
online_im = mot_vis.plot_tracking(
frame, online_tlwhs, online_ids, online_scores, frame_id=i)
if FLAGS.save_images:
if not os.path.exists(FLAGS.output_dir):
os.makedirs(FLAGS.output_dir)
img_name = os.path.split(img_file)[-1]
out_path = os.path.join(FLAGS.output_dir, img_name)
cv2.imwrite(out_path, online_im)
print("save result to: " + out_path)
def predict_video(detector, reid_model, camera_id):
......@@ -376,29 +402,32 @@ def predict_video(detector, reid_model, camera_id):
if not ret:
break
timer.tic()
pred_dets, pred_xyxys = detector.predict([frame], FLAGS.scaled,
FLAGS.threshold)
if len(pred_dets) == 1 and sum(pred_dets) == 0:
print('Frame {} has no object, try to modify score threshold.'.
format(frame_id))
timer.toc()
im = frame
else:
# reid process
crops = reid_model.get_crops(pred_xyxys, frame)
online_tlwhs, online_scores, online_ids = reid_model.predict(
crops, pred_dets)
results.append(
(frame_id + 1, online_tlwhs, online_scores, online_ids))
timer.toc()
fps = 1. / timer.average_time
im = mot_vis.plot_tracking(
frame,
online_tlwhs,
online_ids,
online_scores,
frame_id=frame_id,
fps=fps)
if FLAGS.save_images:
save_dir = os.path.join(FLAGS.output_dir, video_name.split('.')[-2])
if not os.path.exists(save_dir):
......@@ -417,19 +446,22 @@ def predict_video(detector, reid_model, camera_id):
# In the first few frames, the model may have detection results but no
# tracking results; use the detection results instead and set id to -1.
if results[-1][2] == []:
tlwhs = [tlwh for tlwh in pred_dets[:, :4]]
scores = [score[0] for score in pred_dets[:, 4:5]]
ids = [-1] * len(tlwhs)
result = (frame_id + 1, tlwhs, scores, ids)
else:
result = results[-1]
write_mot_results(result_filename, [result])
frame_id += 1
print('detect frame: %d' % (frame_id))
if camera_id != -1:
cv2.imshow('Tracking Detection', im)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
if FLAGS.save_mot_txts:
result_filename = os.path.join(FLAGS.output_dir,
video_name.split('.')[-2] + '.txt')
......
......@@ -20,6 +20,7 @@ from ppdet.modeling.mot.motion import KalmanFilter
from ppdet.modeling.mot.matching.deepsort_matching import NearestNeighborDistanceMetric
from ppdet.modeling.mot.matching.deepsort_matching import iou_cost, min_cost_matching, matching_cascade, gate_cost_matrix
from ppdet.modeling.mot.tracker.base_sde_tracker import Track
from ppdet.modeling.mot.utils import Detection
__all__ = ['DeepSORTTracker']
......@@ -29,7 +30,12 @@ class DeepSORTTracker(object):
DeepSORT tracker
Args:
input_size (list): size of the cropped boxes fed to the ReID model,
[w, h] format, [64, 192] by default.
min_box_area (int): min box area used to filter out low quality boxes.
vertical_ratio (float): w/h ratio of the bbox used to filter out bad
results, usually set to 1.6 for pedestrian tracking. If set <= 0,
no bboxes are filtered.
budget (int): If not None, fix samples per class to at most this number.
Removes the oldest samples when the budget is reached.
max_age (int): maximum number of consecutive misses before a track is deleted
......@@ -46,15 +52,19 @@ class DeepSORTTracker(object):
"""
def __init__(self,
input_size=[64, 192],
min_box_area=0,
vertical_ratio=-1,
budget=100,
max_age=70,
n_init=3,
metric_type='cosine',
matching_threshold=0.2,
max_iou_distance=0.9,
motion='KalmanFilter'):
self.input_size = input_size
self.min_box_area = min_box_area
self.vertical_ratio = vertical_ratio
self.max_age = max_age
self.n_init = n_init
self.metric = NearestNeighborDistanceMetric(metric_type,
......@@ -73,13 +83,23 @@ class DeepSORTTracker(object):
for track in self.tracks:
track.predict(self.motion)
def update(self, pred_dets, pred_embs):
"""
Perform measurement update and track management.
Args:
pred_dets (np.ndarray): Detection results of the image, shape is [N, 6].
pred_embs (np.ndarray): Embedding results of the image, shape is [N, D],
where D is usually a multiple of 128; in the PCB Pyramid model it
is 128*21.
"""
pred_tlwhs = pred_dets[:, :4]
pred_scores = pred_dets[:, 4:5]
pred_cls_ids = pred_dets[:, 5:]
detections = [
Detection(tlwh, score, feat, cls_id)
for tlwh, score, feat, cls_id in zip(pred_tlwhs, pred_scores,
pred_embs, pred_cls_ids)
]
# Run matching cascade.
matches, unmatched_tracks, unmatched_detections = \
self._match(detections)
......@@ -154,5 +174,5 @@ class DeepSORTTracker(object):
mean, covariance = self.motion.initiate(detection.to_xyah())
self.tracks.append(
Track(mean, covariance, self._next_id, self.n_init, self.max_age,
detection.cls_id, detection.score, detection.feature))
self._next_id += 1
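# A minimal per-frame usage sketch of the tracker above (array shapes assumed
# from the docstrings: pred_dets is [N, 6] tlwh + score + cls_id, pred_embs is
# [N, D] from the ReID model):
#
#   tracker = DeepSORTTracker()
#   tracker.predict()  # advance all Kalman states by one frame
#   online_targets = tracker.update(pred_dets, pred_embs)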
......@@ -137,8 +137,7 @@ class Tracker(object):
pred_dets, pred_embs = self.model(data)
online_targets = self.model.tracker.update(pred_dets, pred_embs)
online_tlwhs, online_scores, online_ids = [], [], []
for t in online_targets:
tlwh = t.tlwh
tid = t.track_id
......@@ -173,7 +172,6 @@ class Tracker(object):
draw_threshold=0):
if save_dir:
if not os.path.exists(save_dir): os.makedirs(save_dir)
use_detector = False if not self.model.detector else True
timer = Timer()
......@@ -197,65 +195,90 @@ class Tracker(object):
input_shape = data['image'].shape[2:]
im_shape = data['im_shape']
scale_factor = data['scale_factor']
# forward
timer.tic()
if not use_detector:
dets = dets_list[frame_id]
bbox_tlwh = paddle.to_tensor(dets['bbox'], dtype='float32')
if bbox_tlwh.shape[0] > 0:
# detector outputs: pred_cls_ids, pred_scores, pred_bboxes
pred_cls_ids = paddle.to_tensor(
dets['cls_id'], dtype='float32').unsqueeze(1)
pred_scores = paddle.to_tensor(
dets['score'], dtype='float32').unsqueeze(1)
pred_bboxes = paddle.concat(
(bbox_tlwh[:, 0:2],
bbox_tlwh[:, 2:4] + bbox_tlwh[:, 0:2]),
axis=1)
else:
logger.warning(
'Frame {} has no object, try to modify score threshold.'.
format(frame_id))
frame_id += 1
continue
else:
outs = self.model.detector(data)
if outs['bbox_num'] > 0:
# detector outputs: pred_cls_ids, pred_scores, pred_bboxes
pred_cls_ids = outs['bbox'][:, 0:1]
pred_scores = outs['bbox'][:, 1:2]
if not scaled:
# `scaled` means whether the coords output by the detector have
# been scaled back to the original image: set True for a general
# detector, set False for JDE YOLOv3.
pred_bboxes = scale_coords(outs['bbox'][:, 2:],
input_shape, im_shape,
scale_factor)
else:
pred_bboxes = outs['bbox'][:, 2:]
else:
logger.warning(
'Frame {} has no object, try to modify score threshold.'.
format(frame_id))
frame_id += 1
continue
pred_xyxys, keep_idx = clip_box(pred_bboxes, input_shape, im_shape,
scale_factor)
pred_scores = paddle.gather_nd(pred_scores, keep_idx).unsqueeze(1)
pred_cls_ids = paddle.gather_nd(pred_cls_ids, keep_idx).unsqueeze(1)
pred_tlwhs = paddle.concat(
(pred_xyxys[:, 0:2],
pred_xyxys[:, 2:4] - pred_xyxys[:, 0:2] + 1),
axis=1)
pred_dets = paddle.concat(
(pred_tlwhs, pred_scores, pred_cls_ids), axis=1)
tracker = self.model.tracker
crops = get_crops(
pred_xyxys,
ori_image,
w=tracker.input_size[0],
h=tracker.input_size[1])
crops = paddle.to_tensor(crops)
data.update({'crops': crops})
pred_embs = self.model(data)
tracker.predict()
online_targets = tracker.update(pred_dets, pred_embs)
online_tlwhs, online_scores, online_ids = [], [], []
for t in online_targets:
if not t.is_confirmed() or t.time_since_update > 1:
continue
tlwh = t.to_tlwh()
tscore = t.score
tid = t.track_id
if tscore < draw_threshold: continue
if tlwh[2] * tlwh[3] <= tracker.min_box_area: continue
if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
3] > tracker.vertical_ratio:
continue
online_tlwhs.append(tlwh)
online_scores.append(tscore)
online_ids.append(tid)
timer.toc()
# save results
......
......@@ -15,6 +15,7 @@
This code is borrowed from https://github.com/nwojke/deep_sort/blob/master/deep_sort/track.py
"""
import datetime
from ppdet.core.workspace import register, serializable
__all__ = ['TrackState', 'Track']
......@@ -50,6 +51,8 @@ class Track(object):
`n_init` frames.
max_age (int): The maximum number of consecutive misses before the track
state is set to `Deleted`.
cls_id (int): The category id of the tracked box.
score (float): The confidence score of the tracked box.
feature (Optional[ndarray]): Feature vector of the detection this track
originates from. If not None, this feature is added to the `features` cache.
......@@ -69,6 +72,8 @@ class Track(object):
track_id,
n_init,
max_age,
cls_id,
score,
feature=None):
self.mean = mean
self.covariance = covariance
......@@ -76,6 +81,9 @@ class Track(object):
self.hits = 1
self.age = 1
self.time_since_update = 0
self.cls_id = cls_id
self.score = score
self.start_time = datetime.datetime.now()
self.state = TrackState.Tentative
self.features = []
......@@ -117,6 +125,8 @@ class Track(object):
self.covariance,
detection.to_xyah())
self.features.append(detection.feature)
self.cls_id = detection.cls_id
self.score = detection.score
self.hits += 1
self.time_since_update = 0
......
......@@ -20,6 +20,7 @@ import numpy as np
from ..matching.deepsort_matching import NearestNeighborDistanceMetric
from ..matching.deepsort_matching import iou_cost, min_cost_matching, matching_cascade, gate_cost_matrix
from .base_sde_tracker import Track
from ..utils import Detection
from ppdet.core.workspace import register, serializable
from ppdet.utils.logger import setup_logger
......@@ -36,7 +37,12 @@ class DeepSORTTracker(object):
DeepSORT tracker
Args:
input_size (list): size of the cropped boxes fed to the ReID model,
[w, h] format, [64, 192] by default.
min_box_area (int): min box area used to filter out low quality boxes.
vertical_ratio (float): w/h ratio of the bbox used to filter out bad
results, usually set to 1.6 for pedestrian tracking. If set <= 0,
no bboxes are filtered.
budget (int): If not None, fix samples per class to at most this number.
Removes the oldest samples when the budget is reached.
max_age (int): maximum number of consecutive misses before a track is deleted
......@@ -53,15 +59,19 @@ class DeepSORTTracker(object):
"""
def __init__(self,
input_size=[64, 192],
min_box_area=0,
vertical_ratio=-1,
budget=100,
max_age=70,
n_init=3,
metric_type='cosine',
matching_threshold=0.2,
max_iou_distance=0.9,
motion='KalmanFilter'):
self.input_size = input_size
self.min_box_area = min_box_area
self.vertical_ratio = vertical_ratio
self.max_age = max_age
self.n_init = n_init
self.metric = NearestNeighborDistanceMetric(metric_type,
......@@ -80,13 +90,25 @@ class DeepSORTTracker(object):
for track in self.tracks:
track.predict(self.motion)
def update(self, pred_dets, pred_embs):
"""
Perform measurement update and track management.
Args:
pred_dets (Tensor): Detection results of the image, shape is [N, 6].
pred_embs (Tensor): Embedding results of the image, shape is [N, D],
where D is usually a multiple of 128; in the PCB Pyramid model it
is 128*21.
"""
pred_tlwhs = pred_dets[:, :4]
pred_scores = pred_dets[:, 4:5].squeeze(1)
pred_cls_ids = pred_dets[:, 5:].squeeze(1)
detections = [
Detection(tlwh, score, feat, cls_id)
for tlwh, score, feat, cls_id in zip(pred_tlwhs, pred_scores,
pred_embs, pred_cls_ids)
]
# Run matching cascade.
matches, unmatched_tracks, unmatched_detections = \
self._match(detections)
......@@ -161,5 +183,5 @@ class DeepSORTTracker(object):
mean, covariance = self.motion.initiate(detection.to_xyah())
self.tracks.append(
Track(mean, covariance, self._next_id, self.n_init, self.max_age,
detection.cls_id, detection.score, detection.feature))
self._next_id += 1
......@@ -72,17 +72,19 @@ class Detection(object):
This class represents a bounding box detection in a single image.
Args:
tlwh (Tensor): Bounding box in format `(top left x, top left y,
width, height)`.
score (Tensor): Bounding box confidence score.
feature (Tensor): A feature vector that describes the object
contained in this image.
cls_id (Tensor): Bounding box category id.
"""
def __init__(self, tlwh, score, feature, cls_id):
self.tlwh = np.asarray(tlwh, dtype=np.float32)
self.score = float(score)
self.feature = np.asarray(feature, dtype=np.float32)
self.cls_id = int(cls_id)
def to_tlbr(self):
"""
......@@ -106,15 +108,20 @@ class Detection(object):
def load_det_results(det_file, num_frames):
assert os.path.exists(det_file) and os.path.isfile(det_file), \
'{} does not exist or is not a file.'.format(det_file)
labels = np.loadtxt(det_file, dtype='float32', delimiter=',')
assert labels.shape[1] == 7, \
"Each line of {} should have 7 items: '[frame_id],[x0],[y0],[w],[h],[score],[class_id]'.".format(det_file)
results_list = []
for frame_i in range(num_frames):
results = {'bbox': [], 'score': [], 'cls_id': []}
lables_with_frame = labels[labels[:, 0] == frame_i + 1]
# each line of lables_with_frame:
# [frame_id],[x0],[y0],[w],[h],[score],[class_id]
for l in lables_with_frame:
results['bbox'].append(l[1:5])
results['score'].append(l[5])
results['cls_id'].append(l[6])
results_list.append(results)
return results_list
......@@ -139,26 +146,24 @@ def clip_box(xyxy, input_shape, im_shape, scale_factor):
xyxy[:, 0::2] = paddle.clip(xyxy[:, 0::2], min=0, max=img0_shape[1])
xyxy[:, 1::2] = paddle.clip(xyxy[:, 1::2], min=0, max=img0_shape[0])
w = xyxy[:, 2:3] - xyxy[:, 0:1]
h = xyxy[:, 3:4] - xyxy[:, 1:2]
mask = paddle.logical_and(h > 0, w > 0)
keep_idx = paddle.nonzero(mask)
xyxy = paddle.gather_nd(xyxy, keep_idx[:, :1])
return xyxy, keep_idx
def get_crops(xyxy, ori_img, w, h):
crops = []
xyxy = xyxy.numpy().astype(np.int64)
ori_img = ori_img.numpy()
ori_img = np.squeeze(ori_img, axis=0).transpose(1, 0, 2)
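# after the transpose above, the first axis of ori_img is x and the second
# is y, so each box below can be cropped directly as ori_img[x0:x1, y0:y1, :]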
for i, bbox in enumerate(xyxy):
crop = ori_img[bbox[0]:bbox[2], bbox[1]:bbox[3], :]
crops.append(crop)
crops = preprocess_reid(crops, w, h)
return crops
def preprocess_reid(imgs,
......
......@@ -13,11 +13,13 @@
# limitations under the License.
from . import jde_embedding_head
from . import fairmot_embedding_head
from . import resnet
from . import pyramidal_embedding
from . import pplcnet_embedding

from .fairmot_embedding_head import *
from .jde_embedding_head import *
from .resnet import *
from .pyramidal_embedding import *
from .pplcnet_embedding import *
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.initializer import Normal, Constant
from paddle import ParamAttr
from paddle.nn import AdaptiveAvgPool2D, BatchNorm, Conv2D, Dropout, Linear
from paddle.regularizer import L2Decay
from paddle.nn.initializer import KaimingNormal
from ppdet.core.workspace import register
__all__ = ['PPLCNetEmbedding']
# Each element (list) represents a depthwise block, composed of k, in_c, out_c, s, use_se.
# k: kernel_size
# in_c: input channel number in depthwise block
# out_c: output channel number in depthwise block
# s: stride in depthwise block
# use_se: whether to use SE block
NET_CONFIG = {
"blocks2":
#k, in_c, out_c, s, use_se
[[3, 16, 32, 1, False]],
"blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
"blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
"blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False],
[5, 256, 256, 1, False], [5, 256, 256, 1, False],
[5, 256, 256, 1, False], [5, 256, 256, 1, False]],
"blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
}
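# For example, the first "blocks6" entry [5, 256, 512, 2, True] is a 5x5
# depthwise block mapping 256 -> 512 channels with stride 2 and an SE module.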
def make_divisible(v, divisor=8, min_value=None):
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
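# e.g. make_divisible(40) -> 40 while make_divisible(10) -> 16: the value is
# rounded to a multiple of 8 and bumped up one step if rounding would fall
# below 90% of the requested channel count.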
class ConvBNLayer(nn.Layer):
def __init__(self,
num_channels,
filter_size,
num_filters,
stride,
num_groups=1):
super().__init__()
self.conv = Conv2D(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=num_groups,
weight_attr=ParamAttr(initializer=KaimingNormal()),
bias_attr=False)
self.bn = BatchNorm(
num_filters,
param_attr=ParamAttr(regularizer=L2Decay(0.0)),
bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
self.hardswish = nn.Hardswish()
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.hardswish(x)
return x
class DepthwiseSeparable(nn.Layer):
def __init__(self,
num_channels,
num_filters,
stride,
dw_size=3,
use_se=False):
super().__init__()
self.use_se = use_se
self.dw_conv = ConvBNLayer(
num_channels=num_channels,
num_filters=num_channels,
filter_size=dw_size,
stride=stride,
num_groups=num_channels)
if use_se:
self.se = SEModule(num_channels)
self.pw_conv = ConvBNLayer(
num_channels=num_channels,
filter_size=1,
num_filters=num_filters,
stride=1)
def forward(self, x):
x = self.dw_conv(x)
if self.use_se:
x = self.se(x)
x = self.pw_conv(x)
return x
class SEModule(nn.Layer):
def __init__(self, channel, reduction=4):
super().__init__()
self.avg_pool = AdaptiveAvgPool2D(1)
self.conv1 = Conv2D(
in_channels=channel,
out_channels=channel // reduction,
kernel_size=1,
stride=1,
padding=0)
self.relu = nn.ReLU()
self.conv2 = Conv2D(
in_channels=channel // reduction,
out_channels=channel,
kernel_size=1,
stride=1,
padding=0)
self.hardsigmoid = nn.Hardsigmoid()
def forward(self, x):
identity = x
x = self.avg_pool(x)
x = self.conv1(x)
x = self.relu(x)
x = self.conv2(x)
x = self.hardsigmoid(x)
x = paddle.multiply(x=identity, y=x)
return x
class PPLCNet(nn.Layer):
"""
PP-LCNet, see https://arxiv.org/abs/2109.15099.
    This code differs from the PPLCNet in ppdet/modeling/backbones/lcnet.py
    and in PaddleClas, because the output here is the flattened feature of last_conv.
Args:
scale (float): Scale ratio of channels.
class_expand (int): Number of channels of conv feature.
"""
def __init__(self, scale=1.0, class_expand=1280):
super(PPLCNet, self).__init__()
self.scale = scale
self.class_expand = class_expand
self.conv1 = ConvBNLayer(
num_channels=3,
filter_size=3,
num_filters=make_divisible(16 * scale),
stride=2)
self.blocks2 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
])
self.blocks3 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
])
self.blocks4 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
])
self.blocks5 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
])
self.blocks6 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
])
self.avg_pool = AdaptiveAvgPool2D(1)
self.last_conv = Conv2D(
in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
out_channels=self.class_expand,
kernel_size=1,
stride=1,
padding=0,
bias_attr=False)
self.hardswish = nn.Hardswish()
self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
def forward(self, x):
x = self.conv1(x)
x = self.blocks2(x)
x = self.blocks3(x)
x = self.blocks4(x)
x = self.blocks5(x)
x = self.blocks6(x)
x = self.avg_pool(x)
x = self.last_conv(x)
x = self.hardswish(x)
x = self.flatten(x)
return x
@register
class PPLCNetEmbedding(nn.Layer):
"""
PPLCNet Embedding
Args:
input_ch (int): Number of channels of input conv feature.
output_ch (int): Number of channels of output conv feature.
"""
def __init__(self, scale=2.5, input_ch=1280, output_ch=512):
super(PPLCNetEmbedding, self).__init__()
self.backbone = PPLCNet(scale=scale)
self.neck = nn.Linear(input_ch, output_ch)
def forward(self, x):
feat = self.backbone(x)
feat_out = self.neck(feat)
return feat_out
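As a smoke test, the embedding head can be run standalone. The sketch below assumes the module path implied by the `__init__.py` change above and 192x64 pedestrian crops (both assumptions, not fixed by this diff). Note that `input_ch` must match the backbone's `class_expand` (1280 by default), since the flattened `last_conv` feature feeds the linear neck:

```python
import paddle
from ppdet.modeling.reid.pplcnet_embedding import PPLCNetEmbedding

model = PPLCNetEmbedding(scale=2.5, input_ch=1280, output_ch=512)
model.eval()
x = paddle.rand([8, 3, 192, 64])  # batch of 8 pedestrian crops, HxW = 192x64
emb = model(x)
print(emb.shape)                  # [8, 512]
```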
......@@ -37,7 +37,8 @@ class PCBPyramid(nn.Layer):
         input_ch (int): Number of channels of the input feature.
         num_stripes (int): Number of sub-parts.
         used_levels (tuple): Whether the level is used, 1 means used.
-        num_classes (int): Number of classes for identities.
+        num_classes (int): Number of classes for identities, default 751 for
+            the Market-1501 dataset.
         last_conv_stride (int): Stride of the last conv.
         last_conv_dilation (int): Dilation of the last conv.
         num_conv_out_channels (int): Number of channels of conv feature.
......