[MOT] add ByteTrack YOLOX-x (#5845)

* add bytetrack x configs doc

6f77e8ba · Feng Ni · GitHub · 756e3dca
10 changed files
--- a/configs/mot/bytetrack/README_cn.md
+++ b/configs/mot/bytetrack/README_cn.md
@@ -20,9 +20,12 @@
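 | MOT-17 half train | YOLOv3      | 608x608 | -     |  42.7    |  49.5  |  54.8  |   -    |[config](./bytetrack_yolov3.yml) |
 | MOT-17 half train | PPYOLOe     | 640x640 | -     |  52.9    |  50.4  |  59.7  |   -    |[config](./bytetrack_ppyoloe.yml) |
 | MOT-17 half train | PPYOLOe     | 640x640 |PPLCNet|  52.9    |  51.7  |  58.8  |   -    |[config](./bytetrack_ppyoloe_pplcnet.yml) |
+| mix_det           | YOLOX-x     | 800x1440|   -   |  61.9    |  77.3  |  71.6  |   -    |[config](./bytetrack_yolox.yml) |
 **Notes:**
 - The model weight download links are given by ```det_weights``` and ```reid_weights``` in each config file; running the evaluation command downloads them automatically.
+- **MOT17-half train** is the dataset made up of the images and annotations from the first half of the frames of each video in the MOT17 train sequences (7 in total). To validate accuracy, evaluate on **MOT17-half val**, which consists of the second half of the frames of each video. The dataset can be downloaded from [this link](https://dataset.bj.bcebos.com/mot/MOT17.zip) and should be extracted into the `dataset/mot/` folder.
+- **mix_det** is a combined dataset made up of MOT17, crowdhuman, Cityscapes, and ETHZ. For the dataset format and directory layout, see [this link](https://github.com/ifzhang/ByteTrack#data-preparation); the result should be placed under the `dataset/mot/` directory. To validate accuracy, evaluate on **MOT17-half val**.
 - ByteTrack training trains the detector alone on the MOT dataset; inference assembles the detector with a tracker to evaluate MOT metrics. The standalone detection model can also be evaluated on detection metrics.
 - For export and deployment, ByteTrack exports the detection model alone and then assembles the tracker at run time; see [PP-Tracking](../../../deploy/pptracking/python/README.md).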
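
The half split described in the notes can be sketched as below. This is a minimal illustration only: `split_half` is a hypothetical helper, and it assumes the boundary is the integer midpoint of each sequence, which may not match the exact boundary frame used in the released annotation files.

```python
def split_half(num_frames):
    """Split the 1-based frame ids of one sequence into (train_half, val_half).

    Assumption: the boundary is the integer midpoint of the sequence; the
    released annotation files may place the boundary frame differently.
    """
    mid = num_frames // 2
    train_half = list(range(1, mid + 1))             # first half of the frames
    val_half = list(range(mid + 1, num_frames + 1))  # second half of the frames
    return train_half, val_half


train_ids, val_ids = split_half(600)
print(len(train_ids), len(val_ids))
```

The two halves never overlap, so a detector trained on the first half can be evaluated on the second half of the same videos.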

--- a/configs/mot/bytetrack/_base_/ht21.yml
+++ b/configs/mot/bytetrack/_base_/ht21.yml
+metric: COCO
+num_classes: 1
+# Detection Dataset for training
+TrainDataset:
+  !COCODataSet
+    image_dir: images/train
+    anno_path: annotations/train.json
+    dataset_dir: dataset/mot/HT21
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+EvalDataset:
+  !COCODataSet
+    image_dir: images/train
+    anno_path: annotations/val_half.json
+    dataset_dir: dataset/mot/HT21
+TestDataset:
+  !ImageFolder
+    dataset_dir: dataset/mot/HT21
+    anno_path: annotations/val_half.json
+# MOTDataset for MOT evaluation and inference
+EvalMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    data_root: HT21/images/test
+    keep_ori_im: True # set as True in DeepSORT and ByteTrack
+TestMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    keep_ori_im: True # set to True to save visualization images or video
--- a/configs/mot/bytetrack/_base_/mix_det.yml
+++ b/configs/mot/bytetrack/_base_/mix_det.yml
+metric: COCO
+num_classes: 1
+# Detection Dataset for training
+TrainDataset:
+  !COCODataSet
+    image_dir: ""
+    anno_path: annotations/train.json
+    dataset_dir: dataset/mot/mix_det
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+EvalDataset:
+  !COCODataSet
+    image_dir: train
+    anno_path: annotations/val_half.json
+    dataset_dir: dataset/mot/MOT17
+TestDataset:
+  !ImageFolder
+    anno_path: annotations/val_half.json
+    dataset_dir: dataset/mot/MOT17
+# MOTDataset for MOT evaluation and inference
+EvalMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    data_root: MOT17/images/half
+    keep_ori_im: True # set as True in DeepSORT and ByteTrack
+TestMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    keep_ori_im: True # set to True to save visualization images or video
--- a/configs/mot/bytetrack/_base_/yolox_mot_reader_800x1440.yml
+++ b/configs/mot/bytetrack/_base_/yolox_mot_reader_800x1440.yml
+input_height: &input_height 800
+input_width: &input_width 1440
+input_size: &input_size [*input_height, *input_width]
+worker_num: 4
+TrainReader:
+  sample_transforms:
+    - Decode: {}
+    - Mosaic:
+        prob: 1.0
+        input_dim: *input_size
+        degrees: [-10, 10]
+        scale: [0.1, 2.0]
+        shear: [-2, 2]
+        translate: [-0.1, 0.1]
+        enable_mixup: True
+        mixup_prob: 1.0
+        mixup_scale: [0.5, 1.5]
+    - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30}
+    - PadResize: {target_size: *input_size}
+    - RandomFlip: {}
+  batch_transforms:
+    - Permute: {}
+  batch_size: 6
+  shuffle: True
+  drop_last: True
+  collate_batch: False
+  mosaic_epoch: 20
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: *input_size, keep_ratio: True}
+    - Pad: {size: *input_size, fill_value: [114., 114., 114.]}
+    - Permute: {}
+  batch_size: 8
+TestReader:
+  inputs_def:
+    image_shape: [3, 800, 1440]
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: *input_size, keep_ratio: True}
+    - Pad: {size: *input_size, fill_value: [114., 114., 114.]}
+    - Permute: {}
+  batch_size: 1
+# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT
+EvalMOTReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: *input_size, keep_ratio: True}
+    - Pad: {size: *input_size, fill_value: [114., 114., 114.]}
+    - Permute: {}
+  batch_size: 1
+TestMOTReader:
+  inputs_def:
+    image_shape: [3, 800, 1440]
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: *input_size, keep_ratio: True}
+    - Pad: {size: *input_size, fill_value: [114., 114., 114.]}
+    - Permute: {}
+  batch_size: 1
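
The `Resize` (with `keep_ratio: True`) and `Pad` steps in the eval and test readers amount to a letterbox to 800x1440. The sketch below computes only the geometry. `letterbox_geometry` is a hypothetical helper, and it assumes the scale is the smaller of the two side ratios, that resized sides are rounded to integers, and that padding goes on the bottom and right edges; the framework's operators may differ in rounding and pad placement.

```python
def letterbox_geometry(im_h, im_w, target_h=800, target_w=1440):
    """Return (scale, new_h, new_w, pad_bottom, pad_right) for one image.

    Assumptions: the scale is the smaller of the two side ratios so the
    image fits inside the target, resized sides are rounded to integers,
    and padding is added only at the bottom and right edges.
    """
    scale = min(target_h / im_h, target_w / im_w)
    new_h = min(target_h, round(im_h * scale))
    new_w = min(target_w, round(im_w * scale))
    return scale, new_h, new_w, target_h - new_h, target_w - new_w


# A 1080x1920 frame is limited by its height: 800 / 1080 < 1440 / 1920.
print(letterbox_geometry(1080, 1920))
```

The padded region would be filled with the constant `114.` given in `fill_value`.
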
--- a/configs/mot/bytetrack/bytetrack_yolox.yml
+++ b/configs/mot/bytetrack/bytetrack_yolox.yml
+# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
+_BASE_: [
+  'detector/yolox_x_24e_800x1440_mix_det.yml',
+  '_base_/mix_det.yml',
+  '_base_/yolox_mot_reader_800x1440.yml'
+]
+weights: output/bytetrack_yolox/model_final
+log_iter: 20
+snapshot_epoch: 2
+metric: MOT # eval/infer mode
+num_classes: 1
+architecture: ByteTrack
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
+ByteTrack:
+  detector: YOLOX
+  reid: None
+  tracker: JDETracker
+det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_det.pdparams
+reid_weights: None
+depth_mult: 1.33
+width_mult: 1.25
+YOLOX:
+  backbone: CSPDarkNet
+  neck: YOLOCSPPAN
+  head: YOLOXHead
+  input_size: [800, 1440]
+  size_stride: 32
+  size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
+CSPDarkNet:
+  arch: "X"
+  return_idx: [2, 3, 4]
+  depthwise: False
+YOLOCSPPAN:
+  depthwise: False
+# Tracking requires higher quality boxes, so NMS score_threshold will be higher
+YOLOXHead:
+  l1_epoch: 20
+  depthwise: False
+  loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
+  assigner:
+    name: SimOTAAssigner
+    candidate_topk: 10
+    use_vfl: False
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.01
+    nms_threshold: 0.7
+    # For speed while keeping high mAP, you can set 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%.
+    # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop considerably.
+# BYTETracker
+JDETracker:
+  use_byte: True
+  match_thres: 0.9
+  conf_thres: 0.6
+  low_conf_thres: 0.2
+  min_box_area: 100
+  vertical_ratio: 1.6 # for pedestrian
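
The `JDETracker` thresholds above can be read as a two-group split of detections followed by an output filter. The sketch below is a simplified illustration, not the tracker's actual code: `split_detections` and `keep_track_box` are hypothetical helpers, and the boundary handling (`>=` versus `>`) is an assumption.

```python
def split_detections(dets, conf_thres=0.6, low_conf_thres=0.2):
    """Partition detections into high- and low-score groups.

    dets: list of (x, y, w, h, score) tuples.
    Assumption: boxes scoring at or below low_conf_thres are discarded.
    """
    high = [d for d in dets if d[4] >= conf_thres]
    low = [d for d in dets if low_conf_thres < d[4] < conf_thres]
    return high, low


def keep_track_box(w, h, min_box_area=100, vertical_ratio=1.6):
    """Output filter: drop tiny boxes and boxes too wide for a pedestrian.

    Assumption: vertical_ratio <= 0 disables the aspect-ratio check.
    """
    if w * h <= min_box_area:
        return False
    if vertical_ratio > 0 and w / h > vertical_ratio:
        return False
    return True


dets = [(0, 0, 10, 20, 0.9), (0, 0, 10, 20, 0.4), (0, 0, 10, 20, 0.1)]
high, low = split_detections(dets)
print(len(high), len(low))
```

In ByteTrack the high-score group is associated with existing tracks first, and the low-score group is used in a second association pass for tracks left unmatched.
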
--- a/configs/mot/bytetrack/bytetrack_yolox_ht21.yml
+++ b/configs/mot/bytetrack/bytetrack_yolox_ht21.yml
+# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
+_BASE_: [
+  'detector/yolox_x_24e_800x1440_ht21.yml',
+  '_base_/ht21.yml',
+  '_base_/yolox_mot_reader_800x1440.yml'
+]
+weights: output/bytetrack_yolox_ht21/model_final
+log_iter: 20
+snapshot_epoch: 2
+metric: MOT # eval/infer mode
+num_classes: 1
+architecture: ByteTrack
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
+ByteTrack:
+  detector: YOLOX
+  reid: None
+  tracker: JDETracker
+det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_ht21.pdparams
+reid_weights: None
+depth_mult: 1.33
+width_mult: 1.25
+YOLOX:
+  backbone: CSPDarkNet
+  neck: YOLOCSPPAN
+  head: YOLOXHead
+  input_size: [800, 1440]
+  size_stride: 32
+  size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
+CSPDarkNet:
+  arch: "X"
+  return_idx: [2, 3, 4]
+  depthwise: False
+YOLOCSPPAN:
+  depthwise: False
+# Tracking requires higher quality boxes, so NMS score_threshold will be higher
+YOLOXHead:
+  l1_epoch: 20
+  depthwise: False
+  loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
+  assigner:
+    name: SimOTAAssigner
+    candidate_topk: 10
+    use_vfl: False
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.01
+    nms_threshold: 0.7
+    # For speed while keeping high mAP, you can set 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%.
+    # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop considerably.
+# BYTETracker
+JDETracker:
+  use_byte: True
+  match_thres: 0.9
+  conf_thres: 0.6
+  low_conf_thres: 0.2
+  min_box_area: 0
+  vertical_ratio: 0 # 1.6 for pedestrian
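
The assembled configs rely on `_BASE_` to pull in the detector, dataset, and reader files and then override selected keys. The sketch below shows one plausible merge rule, assuming nested mappings are merged recursively and the assembling file wins on conflicts; `merge_config` is a hypothetical helper and the dictionaries are toy stand-ins, not the framework's loader or the real file contents.

```python
def merge_config(base, override):
    """Recursively merge `override` into a copy of `base`.

    Assumption: nested mappings are merged key by key, and any other
    value in `override` replaces the value from `base`.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Toy stand-ins for a base detector file and the assembled tracker file.
detector_cfg = {"architecture": "YOLOX", "metric": "COCO",
                "YOLOX": {"input_size": [640, 640], "size_stride": 32}}
tracker_cfg = {"architecture": "ByteTrack", "metric": "MOT",
               "YOLOX": {"input_size": [800, 1440]}}

final_cfg = merge_config(detector_cfg, tracker_cfg)
print(final_cfg["architecture"], final_cfg["metric"], final_cfg["YOLOX"])
```

Under this rule, keys the assembled file does not mention (here `size_stride`) are inherited unchanged from the base file.
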
--- a/configs/mot/bytetrack/detector/README_cn.md
+++ b/configs/mot/bytetrack/detector/README_cn.md
@@ -12,10 +12,12 @@
 | :-------------- | :-------------  | :--------:  | :---------: | :-----------: | :-----: | :------: | :-----: |
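 | DarkNet-53      | YOLOv3          |   608X608   |   40e      |      ----     |  42.7   | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_40e_608x608_mot17half.pdparams)  | [config](./yolov3_darknet53_40e_608x608_mot17half.yml) |
 | CSPResNet       | PPYOLOe         |   640x640   |   36e       |      ----     |  52.9   | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams)     | [config](./ppyoloe_crn_l_36e_640x640_mot17half.yml)    |
+| CSPDarkNet       | YOLOX-x         |   800x1440   |   24e       |      ----     |  61.9   | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_det.pdparams)     | [config](./yolox_x_24e_800x1440_mix_det.yml)    |
 **Notes:**
-  - All of the above models can be trained on the **MOT17-half train** dataset, which can be downloaded from [this link](https://dataset.bj.bcebos.com/mot/MOT17.zip).
+  - All of the above models except YOLOX are trained on the **MOT17-half train** dataset, which can be downloaded from [this link](https://dataset.bj.bcebos.com/mot/MOT17.zip).
  - **MOT17-half train** is the dataset made up of the images and annotations from the first half of the frames of each video in the MOT17 train sequences (7 in total). To validate accuracy, evaluate on **MOT17-half val**, which consists of the second half of the frames of each video. The annotations can be downloaded from [this link](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip) and should be extracted into the `dataset/mot/MOT17/images/` folder.
+- YOLOX is trained on the **mix_det** dataset, a combined dataset made up of MOT17, crowdhuman, Cityscapes, and ETHZ. For the dataset format and directory layout, see [this link](https://github.com/ifzhang/ByteTrack#data-preparation); the result should be placed under the `dataset/mot/` directory. To validate accuracy, evaluate on **MOT17-half val**.
  - For pedestrian tracking, use a pedestrian detector together with a pedestrian ReID model. For vehicle tracking, use a vehicle detector together with a vehicle ReID model.
  - When used for ByteTrack tracking, the post-processing settings of these models, such as the NMS threshold, differ from those used for pure detection.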

--- a/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_ht21.yml
+++ b/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_ht21.yml
+# This config trains and evaluates the standalone YOLOX-x detector (on HT21) that ByteTrack assembles with a tracker.
+_BASE_: [
+  '../../../yolox/yolox_x_300e_coco.yml',
+  '../_base_/ht21.yml',
+]
+weights: output/yolox_x_24e_800x1440_ht21/model_final
+log_iter: 20
+snapshot_epoch: 2
+# schedule configuration for fine-tuning
+epoch: 24
+LearningRate:
+  base_lr: 0.0005 # fine-tune
+  schedulers:
+  - !CosineDecay
+    max_epochs: 24
+    min_lr_ratio: 0.05
+    last_plateau_epochs: 4
+  - !ExpWarmup
+    epochs: 1
+OptimizerBuilder:
+  optimizer:
+    type: Momentum
+    momentum: 0.9
+    use_nesterov: True
+  regularizer:
+    factor: 0.0005
+    type: L2
+TrainReader:
+  batch_size: 4
+  mosaic_epoch: 20
+# detector configuration
+architecture: YOLOX
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
+norm_type: sync_bn
+use_ema: True
+ema_decay: 0.9999
+ema_decay_type: "exponential"
+act: silu
+find_unused_parameters: True
+depth_mult: 1.33
+width_mult: 1.25
+YOLOX:
+  backbone: CSPDarkNet
+  neck: YOLOCSPPAN
+  head: YOLOXHead
+  input_size: [800, 1440]
+  size_stride: 32
+  size_range: [18, 32] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
+CSPDarkNet:
+  arch: "X"
+  return_idx: [2, 3, 4]
+  depthwise: False
+YOLOCSPPAN:
+  depthwise: False
+# Tracking requires higher quality boxes, so NMS score_threshold will be higher
+YOLOXHead:
+  l1_epoch: 20
+  depthwise: False
+  loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
+  assigner:
+    name: SimOTAAssigner
+    candidate_topk: 10
+    use_vfl: False
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.01
+    nms_threshold: 0.7
+    # For speed while keeping high mAP, you can set 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%.
+    # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop considerably.
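
The fine-tuning schedule in this file (`ExpWarmup` for 1 epoch, `CosineDecay` over 24 epochs with `min_lr_ratio: 0.05` and `last_plateau_epochs: 4`) can be approximated per epoch as below. This is a sketch under stated assumptions rather than the framework's scheduler: `lr_at_epoch` is a hypothetical helper, the warmup is assumed to follow a power curve with exponent 2, and the real scheduler steps per iteration rather than per epoch.

```python
import math


def lr_at_epoch(epoch, base_lr=0.0005, max_epochs=24, warmup_epochs=1,
                min_lr_ratio=0.05, last_plateau_epochs=4, warmup_power=2.0):
    """Approximate learning rate at a (possibly fractional) epoch.

    Assumptions: power-curve warmup from 0 to base_lr, cosine decay from
    base_lr to base_lr * min_lr_ratio, then a flat plateau at the minimum
    for the final last_plateau_epochs epochs.
    """
    min_lr = base_lr * min_lr_ratio
    if epoch < warmup_epochs:
        return base_lr * (epoch / warmup_epochs) ** warmup_power
    decay_end = max_epochs - last_plateau_epochs
    if epoch >= decay_end:
        return min_lr
    progress = (epoch - warmup_epochs) / (decay_end - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for e in (0, 1, 10, 19, 20, 23):
    print(e, lr_at_epoch(e))
```

Under these assumptions the plateau begins at epoch 20, the same epoch at which `mosaic_epoch: 20` and `l1_epoch: 20` take effect.
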
--- a/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml
+++ b/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml
+# This config trains and evaluates the standalone YOLOX-x detector (on mix_det) that ByteTrack assembles with a tracker.
+_BASE_: [
+  '../../../yolox/yolox_x_300e_coco.yml',
+  '../_base_/mix_det.yml',
+]
+weights: output/yolox_x_24e_800x1440_mix_det/model_final
+log_iter: 20
+snapshot_epoch: 2
+# schedule configuration for fine-tuning
+epoch: 24
+LearningRate:
+  base_lr: 0.00075 # fine-tune
+  schedulers:
+  - !CosineDecay
+    max_epochs: 24
+    min_lr_ratio: 0.05
+    last_plateau_epochs: 4
+  - !ExpWarmup
+    epochs: 1
+OptimizerBuilder:
+  optimizer:
+    type: Momentum
+    momentum: 0.9
+    use_nesterov: True
+  regularizer:
+    factor: 0.0005
+    type: L2
+TrainReader:
+  batch_size: 6
+  mosaic_epoch: 20
+# detector configuration
+architecture: YOLOX
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
+norm_type: sync_bn
+use_ema: True
+ema_decay: 0.9999
+ema_decay_type: "exponential"
+act: silu
+find_unused_parameters: True
+depth_mult: 1.33
+width_mult: 1.25
+YOLOX:
+  backbone: CSPDarkNet
+  neck: YOLOCSPPAN
+  head: YOLOXHead
+  input_size: [800, 1440]
+  size_stride: 32
+  size_range: [18, 30] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
+CSPDarkNet:
+  arch: "X"
+  return_idx: [2, 3, 4]
+  depthwise: False
+YOLOCSPPAN:
+  depthwise: False
+# Tracking requires higher quality boxes, so NMS score_threshold will be higher
+YOLOXHead:
+  l1_epoch: 20
+  depthwise: False
+  loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
+  assigner:
+    name: SimOTAAssigner
+    candidate_topk: 10
+    use_vfl: False
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.01
+    nms_threshold: 0.7
+    # For speed while keeping high mAP, you can set 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%.
+    # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop considerably.
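
To see which training resolutions a given `size_range` produces, the sketch below enumerates them. It rests on an assumption about how the multi-scale sampler works: a factor is drawn from the inclusive range, the height is `factor * size_stride`, and the width is `size_stride * floor(factor * width / height)` using the 800x1440 input size. `multiscale_sizes` is a hypothetical helper, not a framework function.

```python
def multiscale_sizes(size_range, size_stride=32, input_size=(800, 1440)):
    """List the (height, width) pairs reachable from a size_range.

    Assumption: a factor f is drawn from the inclusive range, the height
    is f * stride, and the width is stride * floor(f * width / height).
    """
    in_h, in_w = input_size
    lo, hi = size_range
    return [(size_stride * f, size_stride * (f * in_w // in_h))
            for f in range(lo, hi + 1)]


for rng in ([18, 22], [18, 25], [18, 30]):
    sizes = multiscale_sizes(rng)
    print(rng, sizes[0], sizes[-1])
```

If that assumption holds, the commented range `576*1024 ~ 800*1440` corresponds to factors 18 through 25, so the `size_range` values in these configs and their comments may be worth checking against the framework's sampling code.
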
--- a/configs/mot/headtracking21/README_cn.md
+++ b/configs/mot/headtracking21/README_cn.md
@@ -11,21 +11,22 @@
 ## Model Zoo
 ### FairMOT Results on the HT-21 Training Set
-|    Backbone      |  Input size |  MOTA  |  IDF1  |  IDS  |   FP  |   FN   |   FPS   |  Download | Config |
+|    Model      |  Input size |  MOTA  |  IDF1  |  IDS  |   FP  |   FN   |   FPS   |  Download | Config |
 | :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
-| DLA-34         | 1088x608 |  64.7 |  69.0  |   8533  |  148817  |  234970  |     -   | [download](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./fairmot_dla34_30e_1088x608_headtracking21.yml) |
+| FairMOT DLA-34  | 1088x608 |  64.7 |  69.0  |   8533  |  148817  |  234970  |     -   | [download](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./fairmot_dla34_30e_1088x608_headtracking21.yml) |
-| HRNetv2-W18    | 1088x608 |  57.2 |  58.4  |   30950 |  188260  |  256580  |     -   | [download](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_headtracking21.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608_headtracking21.yml) |
+| ByteTrack-x     | 1440x800 |  62.2 |  59.9  |  5736   |  222583  |  191737 |    -     | [download](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [config](../bytetrack/bytetrack_yolox_ht21.yml) |
 ### FairMOT Results on the HT-21 Test Set
 |    Backbone      |  Input size |  MOTA  |  IDF1  |   IDS  |   FP   |   FN   |    FPS   |  Download  | Config |
 | :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
-| DLA-34         | 1088x608 |  60.8  |  62.8  |  12781   |  118109  |  198896 |    -     | [download](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./fairmot_dla34_30e_1088x608_headtracking21.yml) |
+| FairMOT DLA-34  | 1088x608 |  60.8  |  62.8  |  12781   |  118109  |  198896 |    -     | [download](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./fairmot_dla34_30e_1088x608_headtracking21.yml) |
-| HRNetv2-W18    | 1088x608 |  41.2  |  47.1  |  48809   |  241683  |  204346 |    -     | [download](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [config](./fairmot_dla34_30e_1088x608_headtracking21.yml) |
+| ByteTrack-x         | 1440x800 |  72.6  |  61.8  |  5163   |  71235  |  154139 |    -     | [download](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [config](../bytetrack/bytetrack_yolox_ht21.yml) |
 **Notes:**
- - FairMOT DLA-34 is trained on 2 GPUs with a batch size of 6 per GPU for 30 epochs. Its MOTA currently ranks first on the official MOT [Head Tracking 21](https://motchallenge.net/results/Head_Tracking_21) leaderboard.
+ - FairMOT DLA-34 is trained on 2 GPUs with a batch size of 6 per GPU for 30 epochs.
- - FairMOT HRNetv2-W18 is trained on 4 GPUs with a batch size of 8 per GPU for 30 epochs.
+ - ByteTrack uses YOLOX-x as its detector and is trained on 8 GPUs with a batch size of 8 per GPU for 30 epochs; see [bytetrack](../bytetrack/) for details.
+ - The HT21 dataset as organized by the PaddleDetection team is available at this [download link](https://bj.bcebos.com/v1/paddledet/data/mot/HT21.zip); after downloading, extract it into the `dataset/mot/` directory. Results on the HT-21 Test set must be submitted to the [official site](https://motchallenge.net) for evaluation.
 ## Quick Start