[MOT] add JDE other scales and fix MOT doc (#3008)

2bf412c6 · George Ni · GitHub · f6139c05
9 changed files
--- a/configs/mot/deepsort/README.md
+++ b/configs/mot/deepsort/README.md
@@ -37,17 +37,17 @@ det_results_dir

 ```bash
 # use weights released in PaddleDetection model zoo
-CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o metric=MOT weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --output ./det_results_dir
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o metric=MOT weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams

 # use saved checkpoint after training
-CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o metric=MOT weights=output/jde_darknet53_30e_1088x608/model_final --output ./det_results_dir
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o metric=MOT weights=output/jde_darknet53_30e_1088x608/model_final
 ```

 ### 2. Tracking

 ```bash
 # track the objects by loading detected result files
-CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml --det_results_dir ./det_results_dir/mot_results
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml --det_results_dir {your detection results}
 ```

 ## Citations

--- a/configs/mot/deepsort/README_cn.md
+++ b/configs/mot/deepsort/README_cn.md
@@ -37,17 +37,17 @@ det_results_dir

 ```bash
 # 使用PaddleDetection发布的权重
-CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o metric=MOT weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --output ./det_results_dir
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o metric=MOT weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams

 # 使用训练保存的checkpoint
-CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o metric=MOT weights=output/jde_darknet53_30e_1088x608/model_final --output ./det_results_dir
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o metric=MOT weights=output/jde_darknet53_30e_1088x608/model_final
 ```

 ### 2. 跟踪预测

 ```bash
 # 加载检测结果文件得到跟踪结果
-CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml --det_results_dir ./det_results_dir/mot_results
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml --det_results_dir {your detection results}
 ```

 ## 引用

--- a/configs/mot/jde/README.md
+++ b/configs/mot/jde/README.md
@@ -11,7 +11,7 @@ English | [简体中文](README_cn.md)

 [Joint Detection and Embedding](https://arxiv.org/abs/1909.12605)(JDE) is a fast and high-performance multiple-object tracker that learns the object detection task and appearance embedding task simutaneously in a shared neural network.
 <div align="center">
-  <img src="../../../../docs/images/mot16_jde.gif" width=500 />
+  <img src="../../../docs/images/mot16_jde.gif" width=500 />
 </div>

 ## Model Zoo
@@ -21,6 +21,9 @@ English | [简体中文](README_cn.md)
 | backbone           | input shape | MOTA | IDF1  |  IDS  |   FP  |  FN  |  FPS  | download | config |
 | :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
 | DarkNet53          | 1088x608 |  73.2  |  69.4  | 1320  |  6613  | 21629 |   -   |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
+| DarkNet53          | 864x480 |  70.1  |  65.4  | 1341  |  6454  | 25208 |   -   |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
+| DarkNet53          | 576x320 |  63.1  |  64.6  | 1357  |  7083  | 32312 |   -   |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
+

 **Notes:**
 JDE used 8 GPUs for training and mini-batch size as 4 on each GPU, and trained for 30 epoches.
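
The MOTA column in the table above can be cross-checked against the IDS, FP and FN columns using the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDS) / GT. The sketch below is only a sanity check under one assumption: that the MOT-16 training set has 110407 ground-truth boxes. That count is not stated in this README and should be verified against the dataset. The names `NUM_GT_BOXES`, `mota` and `check_rows` are illustrative.

```python
# Cross-check the MOTA column from the error counts in the model zoo table.
# ASSUMPTION: NUM_GT_BOXES is the MOT-16 training-set ground-truth box count;
# it does not appear in this README.
NUM_GT_BOXES = 110407

# input shape -> (IDS, FP, FN, reported MOTA), copied from the table
ROWS = {
    "1088x608": (1320, 6613, 21629, 73.2),
    "864x480": (1341, 6454, 25208, 70.1),
    "576x320": (1357, 7083, 32312, 63.1),
}


def mota(ids, fp, fn, num_gt):
    """CLEAR MOT accuracy in percent: 100 * (1 - (FN + FP + IDS) / GT)."""
    return 100.0 * (1.0 - (fn + fp + ids) / num_gt)


def check_rows(rows=ROWS, num_gt=NUM_GT_BOXES):
    """Recompute MOTA for every row, rounded to one decimal like the table."""
    return {
        shape: round(mota(ids, fp, fn, num_gt), 1)
        for shape, (ids, fp, fn, _reported) in rows.items()
    }


if __name__ == "__main__":
    for shape, value in check_rows().items():
        print(shape, "recomputed:", value, "reported:", ROWS[shape][3])
```

If the assumed box count is right, the recomputed values should agree with the reported column, which makes this a quick way to catch a transcription error when further rows are added.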

--- a/configs/mot/jde/README_cn.md
+++ b/configs/mot/jde/README_cn.md
@@ -12,16 +12,18 @@

 [Joint Detection and Embedding](https://arxiv.org/abs/1909.12605)(JDE) 是一个快速高性能多目标跟踪器，它是在共享神经网络中同时学习目标检测任务和外观嵌入任务的。
 <div align="center">
-  <img src="../../../../docs/images/mot16_jde.gif" width=500 />
+  <img src="../../../docs/images/mot16_jde.gif" width=500 />
 </div>

 ## 模型库与基线

 ### JDE on MOT-16 training set

-| 骨干网络            | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 检测模型 | ReID模型 | 配置文件 |
+| 骨干网络            |  输入尺寸  |  MOTA  |  IDF1 |  IDS  |  FP  |  FN  |  FPS  |  检测模型  | 配置文件 |
 | :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
 | DarkNet53          | 1088x608 |  73.2  |  69.4  | 1320  |  6613  | 21629 |   -   |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
+| DarkNet53          | 864x480 |  70.1  |  65.4  | 1341  |  6454  | 25208 |   -   |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
+| DarkNet53          | 576x320 |  63.1  |  64.6  | 1357  |  7083  | 32312 |   -   |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |

 **Notes:**
 JDE使用8个GPU进行训练，每个GPU上batch size为4，训练了30个epoches。

--- a/configs/mot/jde/_base_/jde_reader_576x320.yml
+++ b/configs/mot/jde/_base_/jde_reader_576x320.yml
+worker_num: 2
+TrainReader:
+  sample_transforms:
+    - Decode: {}
+    - AugmentHSV: {}
+    - LetterBoxResize: {target_size: [320, 576]}
+    - MOTRandomAffine: {}
+    - RandomFlip: {}
+    - BboxXYXY2XYWH: {}
+    - NormalizeBox: {}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_transforms:
+    - Gt2JDETargetThres:
+        anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
+        anchors: [[[85,255], [120,320], [170,320], [340,320]],
+                  [[21,64], [30,90], [43,128], [60,180]],
+                  [[6,16], [8,23], [11,32], [16,45]]]
+        downsample_ratios: [32, 16, 8]
+        ide_thresh: 0.5
+        fg_thresh: 0.5
+        bg_thresh: 0.4
+  batch_size: 4
+  shuffle: true
+  drop_last: true
+  use_shared_memory: true
+
+
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - LetterBoxResize: {target_size: [320, 576]}
+    - BboxXYXY2XYWH: {}
+    - NormalizeBox: {}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_transforms:
+    - Gt2JDETargetMax:
+        anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
+        anchors: [[[85,255], [120,320], [170,320], [340,320]],
+                  [[21,64], [30,90], [43,128], [60,180]],
+                  [[6,16], [8,23], [11,32], [16,45]]]
+        downsample_ratios: [32, 16, 8]
+        max_iou_thresh: 0.60
+    - BboxCXCYWH2XYXY: {}
+    - Norm2PixelBbox: {}
+  batch_size: 1
+  drop_empty: false
+
+
+TestReader:
+  inputs_def:
+    image_shape: [3, 320, 576]
+  sample_transforms:
+    - Decode: {}
+    - LetterBoxResize: {target_size: [320, 576]}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+
+EvalMOTReader:
+  sample_transforms:
+    - Decode: {}
+    - LetterBoxResize: {target_size: [320, 576]}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+
+TestMOTReader:
+  inputs_def:
+    image_shape: [3, 320, 576]
+  sample_transforms:
+    - Decode: {}
+    - LetterBoxResize: {target_size: [320, 576]}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_size: 1
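
Every reader in `jde_reader_576x320.yml` resizes with `LetterBoxResize: {target_size: [320, 576]}`, where the order is [height, width], matching `image_shape: [3, 320, 576]`. A minimal sketch of the usual letterbox arithmetic follows: scale by the smaller of the two ratios, then pad the remainder evenly. The helper name `letterbox_params` is hypothetical. This is not PaddleDetection's `LetterBoxResize` implementation, whose rounding and padding details may differ.

```python
def letterbox_params(img_h, img_w, target_h=320, target_w=576):
    """Return (ratio, new_h, new_w, pad_h, pad_w) for an aspect-preserving resize."""
    # Use the smaller ratio so the resized image fits inside the target.
    ratio = min(target_h / img_h, target_w / img_w)
    new_h, new_w = round(img_h * ratio), round(img_w * ratio)
    # Split the leftover space evenly between the two sides of each axis.
    pad_h = (target_h - new_h) / 2
    pad_w = (target_w - new_w) / 2
    return ratio, new_h, new_w, pad_h, pad_w


if __name__ == "__main__":
    # Example: a 1920x1080 (width x height) video frame.
    print(letterbox_params(1080, 1920))
```

For a 1920x1080 frame the height is the limiting side, so the frame fills the 320-pixel height and receives a few pixels of padding on the left and right.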
--- a/configs/mot/jde/_base_/jde_reader_864x480.yml
+++ b/configs/mot/jde/_base_/jde_reader_864x480.yml
+worker_num: 2
+TrainReader:
+  sample_transforms:
+    - Decode: {}
+    - AugmentHSV: {}
+    - LetterBoxResize: {target_size: [480, 864]}
+    - MOTRandomAffine: {}
+    - RandomFlip: {}
+    - BboxXYXY2XYWH: {}
+    - NormalizeBox: {}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_transforms:
+    - Gt2JDETargetThres:
+        anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
+        anchors: [[[102,305], [143, 429], [203,508], [407,508]],
+                  [[25,76], [36,107], [51,152], [71,215]],
+                  [[6,19], [9,27], [13,38], [18,54]]]
+        downsample_ratios: [32, 16, 8]
+        ide_thresh: 0.5
+        fg_thresh: 0.5
+        bg_thresh: 0.4
+  batch_size: 4
+  shuffle: true
+  drop_last: true
+  use_shared_memory: true
+
+
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - LetterBoxResize: {target_size: [480, 864]}
+    - BboxXYXY2XYWH: {}
+    - NormalizeBox: {}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_transforms:
+    - Gt2JDETargetMax:
+        anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
+        anchors: [[[102,305], [143, 429], [203,508], [407,508]],
+                  [[25,76], [36,107], [51,152], [71,215]],
+                  [[6,19], [9,27], [13,38], [18,54]]]
+        downsample_ratios: [32, 16, 8]
+        max_iou_thresh: 0.60
+    - BboxCXCYWH2XYXY: {}
+    - Norm2PixelBbox: {}
+  batch_size: 1
+  drop_empty: false
+
+
+TestReader:
+  inputs_def:
+    image_shape: [3, 480, 864]
+  sample_transforms:
+    - Decode: {}
+    - LetterBoxResize: {target_size: [480, 864]}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+
+EvalMOTReader:
+  sample_transforms:
+    - Decode: {}
+    - LetterBoxResize: {target_size: [480, 864]}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+
+TestMOTReader:
+  inputs_def:
+    image_shape: [3, 480, 864]
+  sample_transforms:
+    - Decode: {}
+    - LetterBoxResize: {target_size: [480, 864]}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_size: 1
--- a/configs/mot/jde/jde_darknet53_30e_576x320.yml
+++ b/configs/mot/jde/jde_darknet53_30e_576x320.yml
+_BASE_: [
+  '../../datasets/mot.yml',
+  '../../runtime.yml',
+  '_base_/optimizer_30e.yml',
+  '_base_/jde_darknet53.yml',
+  '_base_/jde_reader_576x320.yml',
+]
+weights: output/jde_darknet53_30e_576x320/model_final
+
+JDE:
+  detector: YOLOv3
+  reid: JDEEmbeddingHead
+  tracker: JDETracker
+
+YOLOv3:
+  backbone: DarkNet
+  neck: YOLOv3FPN
+  yolo_head: YOLOv3Head
+  post_process: JDEBBoxPostProcess
+  for_mot: True
+
+YOLOv3Head:
+  anchors: [[85,255], [120,320], [170,320], [340,320],
+            [21,64], [30,90], [43,128], [60,180],
+            [6,16], [8,23], [11,32], [16,45]]
+  anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
+  loss: JDEDetectionLoss
+
+JDETracker:
+  det_thresh: 0.3
+  track_buffer: 30
+  min_box_area: 200
+  motion: KalmanFilter
+
+JDEBBoxPostProcess:
+  decode:
+    name: JDEBox
+    conf_thresh: 0.5
+    downsample_ratio: 32
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 500
+    score_threshold: 0.01
+    nms_threshold: 0.4
+    nms_top_k: 2000
+    normalized: true
+    return_index: true
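
`YOLOv3Head` in this config lists its anchors as one flat list, while `_base_/jde_reader_576x320.yml` lists the same values grouped per output level for `Gt2JDETargetThres` and `Gt2JDETargetMax`. The two must stay in sync when a new input scale is added. A small sketch of a consistency check, regrouping the flat list with `anchor_masks`, is below. The helper name `group_anchors` is hypothetical and both anchor lists are copied from this diff.

```python
# Flat anchors and masks from YOLOv3Head in jde_darknet53_30e_576x320.yml.
HEAD_ANCHORS = [[85, 255], [120, 320], [170, 320], [340, 320],
                [21, 64], [30, 90], [43, 128], [60, 180],
                [6, 16], [8, 23], [11, 32], [16, 45]]
ANCHOR_MASKS = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

# Grouped anchors from Gt2JDETargetThres in _base_/jde_reader_576x320.yml,
# listed in the same order as downsample_ratios: [32, 16, 8].
READER_ANCHORS = [[[85, 255], [120, 320], [170, 320], [340, 320]],
                  [[21, 64], [30, 90], [43, 128], [60, 180]],
                  [[6, 16], [8, 23], [11, 32], [16, 45]]]


def group_anchors(anchors, masks):
    """Regroup a flat anchor list into one sub-list per output level."""
    return [[anchors[i] for i in mask] for mask in masks]


if __name__ == "__main__":
    grouped = group_anchors(HEAD_ANCHORS, ANCHOR_MASKS)
    print("head and reader anchors match:", grouped == READER_ANCHORS)
```

The same check applies to the 864x480 pair by swapping in the anchors from `jde_darknet53_30e_864x480.yml` and `_base_/jde_reader_864x480.yml`.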
--- a/configs/mot/jde/jde_darknet53_30e_864x480.yml
+++ b/configs/mot/jde/jde_darknet53_30e_864x480.yml
+_BASE_: [
+  '../../datasets/mot.yml',
+  '../../runtime.yml',
+  '_base_/optimizer_30e.yml',
+  '_base_/jde_darknet53.yml',
+  '_base_/jde_reader_864x480.yml',
+]
+weights: output/jde_darknet53_30e_864x480/model_final
+
+JDE:
+  detector: YOLOv3
+  reid: JDEEmbeddingHead
+  tracker: JDETracker
+
+YOLOv3:
+  backbone: DarkNet
+  neck: YOLOv3FPN
+  yolo_head: YOLOv3Head
+  post_process: JDEBBoxPostProcess
+  for_mot: True
+
+YOLOv3Head:
+  anchors: [[102,305], [143, 429], [203,508], [407,508],
+            [25,76], [36,107], [51,152], [71,215],
+            [6,19], [9,27], [13,38], [18,54]]
+  anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
+  loss: JDEDetectionLoss
+
+JDETracker:
+  det_thresh: 0.3
+  track_buffer: 30
+  min_box_area: 200
+  motion: KalmanFilter
+
+JDEBBoxPostProcess:
+  decode:
+    name: JDEBox
+    conf_thresh: 0.5
+    downsample_ratio: 32
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 500
+    score_threshold: 0.01
+    nms_threshold: 0.4
+    nms_top_k: 2000
+    normalized: true
+    return_index: true
--- a/requirements.txt
+++ b/requirements.txt
@@ -13,4 +13,5 @@ setuptools>=42.0.0
 lap
 sklearn
 cython_bbox
-motmetrics
\ No newline at end of file
+motmetrics
+openpyxl
\ No newline at end of file