Unverified    Commit 570ec45e authored by F Feng Ni, committed by GitHub

cherry-pick MOT (#4668)

* cherry-pick cfg modelzoo readme

* cherry pick modeling engine source

* cherry-pick deploy python mot
Parent 547f2e45
This diff is collapsed.
This diff is collapsed.
......@@ -120,7 +120,7 @@ Step 1: Export the detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
# Or export the PPYOLOv2 pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams
```
Step 2: Export the ReID model
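A hedged sketch of Step 2 (the ReID config and weights below are placeholders rather than values taken from this diff; see the DeepSORT configs under `configs/mot/deepsort/reid/` for the exact names):
```bash
# placeholders: {reid_config} and {reid_weights} must be replaced with the actual
# DeepSORT ReID config and weights; the reid_weights key name is an assumption here
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/{reid_config}.yml -o reid_weights={reid_weights}
```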
......
......@@ -128,6 +128,15 @@ python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34
The tracking model predicts on videos and does not support single-image inference. The visualized tracking video is saved by default. You can add `--save_mot_txts` to save the txt result files, or `--save_images` to save the visualized images.
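For example, a command combining these flags might look like the following (the model directory comes from the command above; the video path is a placeholder):
```bash
python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608/ --video_file={your video name}.mp4 --device=GPU --save_mot_txts --save_images
```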
### 6. Joint Python inference with the exported MOT and keypoint models
```bash
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
```
**Notes:**
Keypoint model export tutorial: `configs/keypoint/README.md`.
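A hedged sketch of that export step (the config path and weights URL are assumptions inferred from the `higherhrnet_hrnet_w32_512` model used above; verify them against `configs/keypoint/README.md`):
```bash
# assumption: config and weights names inferred from the keypoint model name above
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams
```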
## Citations
```
@article{zhang2020fair,
......
......@@ -125,6 +125,15 @@ python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34
**Notes:**
The tracking model predicts on videos and does not support single-image inference. The visualized tracking video is saved by default; add `--save_mot_txts` to save the txt result files, or `--save_images` to save the visualized result images.
### 6. Joint Python inference with the exported MOT and keypoint models
```bash
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
```
**Notes:**
For the keypoint model export tutorial, please refer to `configs/keypoint/README.md`.
## Citations
```
@article{zhang2020fair,
......
......@@ -12,7 +12,6 @@ LearningRate:
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
......
......@@ -9,6 +9,13 @@ norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
# add crowdhuman
TrainDataset:
!MOTDataSet
dataset_dir: dataset/mot
image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train', 'crowdhuman.train', 'crowdhuman.val']
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
worker_num: 4
TrainReader:
inputs_def:
......
......@@ -10,7 +10,7 @@ norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
# for MOT training
# add crowdhuman
TrainDataset:
!MOTDataSet
dataset_dir: dataset/mot
......
......@@ -10,7 +10,7 @@ norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
# for MOT training
# add crowdhuman
TrainDataset:
!MOTDataSet
dataset_dir: dataset/mot
......
......@@ -10,7 +10,7 @@ norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
# for MOT training
# add crowdhuman
TrainDataset:
!MOTDataSet
dataset_dir: dataset/mot
......
......@@ -18,6 +18,8 @@ MCFairMOT is the Multi-class extended version of [FairMOT](https://arxiv.org/abs
| :--------------| :------- | :----: | :----: | :---: | :------: | :----: |:----: |
| DLA-34 | 1088x608 | 24.3 | 41.6 | 2314 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams) | [config](./mcfairmot_dla34_30e_1088x608_visdrone.yml) |
| HRNetV2-W18 | 1088x608 | 20.4 | 39.9 | 2603 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.yml) |
| HRNetV2-W18 | 864x480 | 18.2 | 38.7 | 2416 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.yml) |
| HRNetV2-W18 | 576x320 | 12.0 | 33.8 | 2178 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.yml) |
**Notes:**
MOTA is the average MOTA over the 10 categories in the VisDrone2019 MOT dataset; it also equals the average MOTA over all evaluated video sequences.
......
......@@ -19,6 +19,8 @@ MCFairMOT is the multi-class extended version of [FairMOT](https://arxiv.org/abs/2004.01888)
| :--------------| :------- | :----: | :----: | :---: | :------: | :----: |:----: |
| DLA-34 | 1088x608 | 24.3 | 41.6 | 2314 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams) | [config](./mcfairmot_dla34_30e_1088x608_visdrone.yml) |
| HRNetV2-W18 | 1088x608 | 20.4 | 39.9 | 2603 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.yml) |
| HRNetV2-W18 | 864x480 | 18.2 | 38.7 | 2416 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.yml) |
| HRNetV2-W18 | 576x320 | 12.0 | 33.8 | 2178 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.yml) |
**Notes:**
MOTA is the average MOTA over the 10 categories in the VisDrone2019 MOT dataset; it also equals the average MOTA over all evaluated video sequences.
......
_BASE_: [
'../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml',
'../../datasets/mcmot.yml'
]
architecture: FairMOT
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams
for_mot: True
FairMOT:
detector: CenterNet
reid: FairMOTEmbeddingHead
loss: FairMOTLoss
tracker: JDETracker # multi-class tracker
CenterNetHead:
regress_ltrb: False
CenterNetPostProcess:
regress_ltrb: False
max_per_img: 200
JDETracker:
min_box_area: 0
vertical_ratio: 0 # no need to filter bboxes according to w/h
conf_thres: 0.4
tracked_thresh: 0.4
metric_type: cosine
weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone/model_final
epoch: 30
LearningRate:
base_lr: 0.0005
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [10, 20]
use_warmup: False
OptimizerBuilder:
optimizer:
type: Adam
regularizer: NULL
TrainReader:
batch_size: 8
_BASE_: [
'../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml',
'../../datasets/mcmot.yml'
]
architecture: FairMOT
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams
for_mot: True
FairMOT:
detector: CenterNet
reid: FairMOTEmbeddingHead
loss: FairMOTLoss
tracker: JDETracker # multi-class tracker
CenterNetHead:
regress_ltrb: False
CenterNetPostProcess:
regress_ltrb: False
max_per_img: 200
JDETracker:
min_box_area: 0
vertical_ratio: 0 # no need to filter bboxes according to w/h
conf_thres: 0.4
tracked_thresh: 0.4
metric_type: cosine
weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone/model_final
epoch: 30
LearningRate:
base_lr: 0.0005
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [10, 20]
use_warmup: False
OptimizerBuilder:
optimizer:
type: Adam
regularizer: NULL
TrainReader:
batch_size: 8
README_cn.md
\ No newline at end of file
English | [简体中文](README_cn.md)
# MTMCT (Multi-Target Multi-Camera Tracking)
## Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model-Zoo)
- [Quick Start](#Quick-Start)
- [Citations](#Citations)
## Introduction
MTMCT (Multi-Target Multi-Camera Tracking) performs multi-object tracking across videos captured by different cameras in the same scene. It is an important research topic in the tracking field and plays a key role in security surveillance, autonomous driving, smart cities and other industries. Because MTMCT operates on videos from different cameras of the same scene, its accuracy depends heavily on scene prior knowledge and on information such as the number of cameras, their viewpoints and their topology. PaddleDetection provides here a baseline MTMCT implementation with the scene- and camera-specific optimizations removed; to further improve accuracy, post-processing algorithms tailored to the target scene and camera setup need to be designed. The DeepSORT pipeline is adopted for MTMCT; to achieve real-time performance, PaddleDetection's own PPYOLOv2 and PP-PicoDet are used as detectors, and the lightweight PP-LCNet network from PaddleClas is used as the ReID model.
MTMCT is a very important capability of [PP-Tracking](../../../deploy/pptracking). [PP-Tracking](../../../deploy/pptracking/README.md) is the first open-source real-time tracking system built on the PaddlePaddle deep learning framework. Targeting the pain points of real-world applications, PP-Tracking provides pedestrian and vehicle tracking, cross-camera tracking, multi-class tracking, small-object tracking and flow counting for industrial use, together with a visual development interface. It integrates lightweight multi-object tracking, object detection and ReID algorithms to further improve server-side deployment performance, and supports Python and C++ deployment on Linux and Nvidia Jetson platforms. Please refer to that directory for usage.
## Model Zoo
### DeepSORT results on the test set of the AIC21 MTMCT (CityFlow) cross-camera vehicle tracking dataset
| Detector | Input Size | ReID | Scene | Tricks | IDF1 | IDP | IDR | Precision | Recall | FPS | Detector download | ReID download |
| :--------- | :--------- | :------- | :----- | :------ |:----- |:------- |:----- |:--------- |:-------- |:----- |:------ | :------ |
| PP-PicoDet | 640x640 | PP-LCNet | S06 | - | 0.3617 | 0.4417 | 0.3062 | 0.6266 | 0.4343 | - |[Detector](https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_640_aic21mtmct_vehicle.tar) |[ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.tar) |
| PPYOLOv2 | 640x640 | PP-LCNet | S06 | - | 0.4450 | 0.4611 | 0.4300 | 0.6385 | 0.5954 | - |[Detector](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_aic21mtmct_vehicle.tar) |[ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.tar) |
**Notes:**
S06 is a scene name in the test set of the AIC21 MTMCT dataset. Scene S06 contains videos from 6 cameras: c041, c042, c043, c044, c045 and c046.
## Dataset Preparation
Two model solutions are provided here, one for vehicles and one for pedestrians. For vehicles, the [AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow) cross-camera vehicle tracking dataset is used; for pedestrians, the [WILDTRACK](https://www.epfl.ch/labs/cvlab/data/data-wildtrack) cross-camera pedestrian tracking dataset is used.
The directory structure of the original AIC21 MTMCT dataset is as follows:
```
|——————AIC21_Track3_MTMC_Tracking
|——————cam_framenum (Number of frames of each camera)
|——————cam_loc (Positional relationship between cameras)
|——————cam_timestamp (Time difference between cameras)
|——————eval (evaluation function and ground_truth.txt)
|——————test
|——————train
|——————validation
|——————DataLicenseAgreement_AICityChallenge_2021.pdf
|——————list_cam.txt (List of all camera paths)
|——————ReadMe.txt (Dataset description)
|——————gen_aicity_mtmct_data.py (Camera data extraction script)
```
It needs to be converted into the following format:
```
├── S01
│ ├── c001
│ ├── roi.jpg (Area mask of the road)
│ ├── img1
│ ├── ...
│ ├── c002
│ ├── roi.jpg
│ ├── img1
│ ├── ...
│ ├── c003
│ ├── roi.jpg
│ ├── img1
│ ├── ...
├── gt
│ ├── ground_truth_train.txt
│ ├── ground_truth_validation.txt
├── zone (only for S06 when use camera track trick)
│ ├── ...
```
#### Generate the validation data of scene S01
```bash
python gen_aicity_mtmct_data.py ./AIC21_Track3_MTMC_Tracking/train/S01
```
## Quick Start
### 1. Export the models
Step 1: Download the exported detection model
```bash
wget https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_640_aic21mtmct_vehicle.tar
tar -xvf picodet_l_640_aic21mtmct_vehicle.tar
```
Step 2: Download the exported ReID model
```bash
wget https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.tar
tar -xvf deepsort_pplcnet_vehicle.tar
```
**Notes:**
PP-PicoDet is a lightweight detection model. For its training please refer to [configs/picodet](../../picodet/README.md), and remember to modify the number of classes and the dataset paths.
PP-LCNet is a lightweight ReID model. For its training please refer to [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). The provided weights were trained on the VERI-Wild vehicle re-identification dataset, so it is recommended to use them directly without retraining.
### 2. Python inference with the exported models
```bash
# Use the exported PicoDet vehicle detection model and the PP-LCNet vehicle ReID model
python deploy/pptracking/python/mot_sde_infer.py --model_dir=picodet_l_640_aic21mtmct_vehicle/ --reid_model_dir=deepsort_pplcnet_vehicle/ --mtmct_dir={your mtmct scene video folder} --device=GPU --scaled=True --save_mot_txts --save_images
```
**Notes:**
The tracking model predicts on videos and does not support single-image inference. The visualized tracking video is saved by default; add `--save_mot_txts` to save one txt result file per video, or `--save_images` to save the visualized result images.
`--scaled` indicates whether the coordinates output by the model have already been scaled back to the original image: set it to False when the detector is JDE YOLOv3 and to True when a general detection model is used.
`--mtmct_dir` is the folder of the MTMCT scene to be predicted. It contains the image folders of the videos captured by the different cameras of that scene, and there must be at least two of them (see the illustrative layout below).
The MTMCT output consists of videos and a txt file. One visualized cross-camera tracking video is generated per image folder, which differs from single-camera tracking, whose results are independent across the video folders. There is only one MTMCT result txt, and it has one extra leading column compared with the single-camera result txt: the camera id.
MTMCT is a very important capability of [PP-Tracking](../../../deploy/pptracking); please refer to that directory for usage.
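An illustrative layout of `--mtmct_dir` for the S06 scene mentioned above (a sketch only; camera folder names come from the S06 note, and the exact inner structure should follow the dataset format described in Dataset Preparation):
```
S06/
  ├── c041/   # images of the video from camera c041
  ├── c042/
  ├── c043/
  ├── c044/
  ├── c045/
  └── c046/
```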
## Citations
```
@InProceedings{Tang19CityFlow,
author = {Zheng Tang and Milind Naphade and Ming-Yu Liu and Xiaodong Yang and Stan Birchfield and Shuo Wang and Ratnesh Kumar and David Anastasiu and Jenq-Neng Hwang},
title = {CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019},
pages = {8797–8806}
}
```
......@@ -14,10 +14,13 @@
### FairMOT results for the Pedestrian category on the val-set of each dataset
| Dataset | Input Size | MOTA | IDF1 | FPS | download | config |
| :-------------| :------- | :----: | :----: | :----: | :-----: |:------: |
| PathTrack | 1088x608 | 44.9 | 59.3 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [config](./fairmot_dla34_30e_1088x608_pathtrack.yml) |
| VisDrone | 1088x608 | 49.2 | 63.1 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [config](./fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
| Dataset | Backbone | Input Size | MOTA | IDF1 | FPS | download | config |
| :-------------| :-------- | :------- | :----: | :----: | :----: | :-----: |:------: |
| PathTrack | DLA-34 | 1088x608 | 44.9 | 59.3 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [config](./fairmot_dla34_30e_1088x608_pathtrack.yml) |
| VisDrone | DLA-34 | 1088x608 | 49.2 | 63.1 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [config](./fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
| VisDrone | HRNetv2-W18| 1088x608 | 40.5 | 54.7 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.yml) |
| VisDrone | HRNetv2-W18| 864x480 | 38.6 | 50.9 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.yml) |
| VisDrone | HRNetv2-W18| 576x320 | 30.6 | 47.2 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_pedestrian.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_pedestrian.yml) |
**Notes:**
FairMOT is trained with 4 GPUs, a batch size of 6 per GPU, for 30 epochs; the backbone of each model is listed in the table above.
......
_BASE_: [
'../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml'
]
weights: output/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian/model_final
# for MOT training
TrainDataset:
!MOTDataSet
dataset_dir: dataset/mot
image_lists: ['visdrone_pedestrian.train']
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
# for MOT evaluation
# If you want to change the MOT evaluation dataset, please modify 'data_root'
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: visdrone_pedestrian/images/val
keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
# for MOT video inference
TestMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
keep_ori_im: True # set True if save visualization images or video
_BASE_: [
'../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml'
]
weights: output/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_pedestrian/model_final
# for MOT training
TrainDataset:
!MOTDataSet
dataset_dir: dataset/mot
image_lists: ['visdrone_pedestrian.train']
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
# for MOT evaluation
# If you want to change the MOT evaluation dataset, please modify 'data_root'
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: visdrone_pedestrian/images/val
keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
# for MOT video inference
TestMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
keep_ori_im: True # set True if save visualization images or video
_BASE_: [
'../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml'
]
weights: output/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian/model_final
# for MOT training
TrainDataset:
!MOTDataSet
dataset_dir: dataset/mot
image_lists: ['visdrone_pedestrian.train']
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
# for MOT evaluation
# If you want to change the MOT evaluation dataset, please modify 'data_root'
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: visdrone_pedestrian/images/val
keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
# for MOT video inference
TestMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
keep_ori_im: True # set True if save visualization images or video
......@@ -17,11 +17,15 @@
### FairMOT results for the Vehicle category on the val-set of each dataset
| Dataset | Input Size | MOTA | IDF1 | FPS | download | config |
| :-------------| :------- | :----: | :----: | :----: | :-----: |:------: |
| BDD100K | 1088x608 | 43.5 | 50.0 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.pdparams) | [config](./fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.yml) |
| KITTI | 1088x608 | 82.7 | - | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_kitti_vehicle.pdparams) | [config](./fairmot_dla34_30e_1088x608_kitti_vehicle.yml) |
| VisDrone | 1088x608 | 52.1 | 63.3 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_vehicle.pdparams) | [config](./fairmot_dla34_30e_1088x608_visdrone_vehicle.yml) |
| Dataset | Backbone | Input Size | MOTA | IDF1 | FPS | download | config |
| :-------------| :-------- | :------- | :----: | :----: | :----: | :-----: |:------: |
| BDD100K | DLA-34 | 1088x608 | 43.5 | 50.0 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.pdparams) | [config](./fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.yml) |
| BDD100K | HRNetv2-W18| 576x320 | 32.6 | 38.7 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.yml) |
| KITTI | DLA-34 | 1088x608 | 82.7 | - | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_kitti_vehicle.pdparams) | [config](./fairmot_dla34_30e_1088x608_kitti_vehicle.yml) |
| VisDrone | DLA-34 | 1088x608 | 52.1 | 63.3 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_vehicle.pdparams) | [config](./fairmot_dla34_30e_1088x608_visdrone_vehicle.yml) |
| VisDrone | HRNetv2-W18| 1088x608 | 46.0 | 56.8 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle.yml) |
| VisDrone | HRNetv2-W18| 864x480 | 43.7 | 56.1 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_vehicle.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_vehicle.yml) |
| VisDrone | HRNetv2-W18| 576x320 | 39.8 | 52.4 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.yml) |
**Notes:**
FairMOT is trained with 4 GPUs, a batch size of 6 per GPU, for 30 epochs; the backbone of each model is listed in the table above.
......
_BASE_: [
'../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml'
]
weights: output/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle/model_final
# for MOT training
TrainDataset:
!MOTDataSet
dataset_dir: dataset/mot
image_lists: ['visdrone_vehicle.train']
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
# for MOT evaluation
# If you want to change the MOT evaluation dataset, please modify 'data_root'
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: visdrone_vehicle/images/val
keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
# for MOT video inference
TestMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
keep_ori_im: True # set True if save visualization images or video
# model config
FairMOT:
detector: CenterNet
reid: FairMOTEmbeddingHead
loss: FairMOTLoss
tracker: JDETracker
JDETracker:
min_box_area: 0
vertical_ratio: 0 # no need to filter bboxes according to w/h
conf_thres: 0.4
tracked_thresh: 0.4
metric_type: cosine
_BASE_: [
'../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml'
]
weights: output/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle/model_final
# for MOT training
TrainDataset:
!MOTDataSet
dataset_dir: dataset/mot
image_lists: ['bdd100kmot_vehicle.train']
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
# for MOT evaluation
# If you want to change the MOT evaluation dataset, please modify 'data_root'
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: bdd100kmot_vehicle/images/val
keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
# for MOT video inference
TestMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
keep_ori_im: True # set True if save visualization images or video
# model config
FairMOT:
detector: CenterNet
reid: FairMOTEmbeddingHead
loss: FairMOTLoss
tracker: JDETracker
JDETracker:
min_box_area: 0
vertical_ratio: 0 # no need to filter bboxes according to w/h
conf_thres: 0.4
tracked_thresh: 0.4
metric_type: cosine
_BASE_: [
'../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml'
]
weights: output/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle/model_final
# for MOT training
TrainDataset:
!MOTDataSet
dataset_dir: dataset/mot
image_lists: ['visdrone_vehicle.train']
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
# for MOT evaluation
# If you want to change the MOT evaluation dataset, please modify 'data_root'
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: visdrone_vehicle/images/val
keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
# for MOT video inference
TestMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
keep_ori_im: True # set True if save visualization images or video
# model config
FairMOT:
detector: CenterNet
reid: FairMOTEmbeddingHead
loss: FairMOTLoss
tracker: JDETracker
JDETracker:
min_box_area: 0
vertical_ratio: 0 # no need to filter bboxes according to w/h
conf_thres: 0.4
tracked_thresh: 0.4
metric_type: cosine
_BASE_: [
'../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml'
]
weights: output/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_vehicle/model_final
# for MOT training
TrainDataset:
!MOTDataSet
dataset_dir: dataset/mot
image_lists: ['visdrone_vehicle.train']
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
# for MOT evaluation
# If you want to change the MOT evaluation dataset, please modify 'data_root'
EvalMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
data_root: visdrone_vehicle/images/val
keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
# for MOT video inference
TestMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
keep_ori_im: True # set True if save visualization images or video
# model config
FairMOT:
detector: CenterNet
reid: FairMOTEmbeddingHead
loss: FairMOTLoss
tracker: JDETracker
JDETracker:
min_box_area: 0
vertical_ratio: 0 # no need to filter bboxes according to w/h
conf_thres: 0.4
tracked_thresh: 0.4
metric_type: cosine
......@@ -537,7 +537,7 @@ def load_predictor(model_dir,
}
if run_mode in precision_map.keys():
config.enable_tensorrt_engine(
workspace_size=1 << 10,
workspace_size=1 << 25,
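# 1 << 25 bytes = 32 MB of TensorRT workspace (the previous 1 << 10 = 1 KB is far too small to build an engine)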
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
......@@ -680,7 +680,7 @@ def predict_video(detector, camera_id):
if not os.path.exists(FLAGS.output_dir):
os.makedirs(FLAGS.output_dir)
out_path = os.path.join(FLAGS.output_dir, video_out_name)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
index = 1
while (1):
......
......@@ -23,7 +23,6 @@ import paddle
from paddle.inference import Config
from paddle.inference import create_predictor
from preprocess import preprocess
from utils import argsparser, Timer, get_current_memory_mb
from infer import Detector, get_test_images, print_arguments, PredictConfig
from benchmark_utils import PaddleInferBenchmark
......@@ -167,6 +166,8 @@ def predict_image(detector, image_list):
results = []
num_classes = detector.num_classes
data_type = 'mcmot' if num_classes > 1 else 'mot'
ids2names = detector.pred_config.labels
image_list.sort()
for frame_id, img_file in enumerate(image_list):
frame = cv2.imread(img_file)
......@@ -181,7 +182,8 @@ def predict_image(detector, image_list):
online_tlwhs, online_scores, online_ids = detector.predict(
[frame], FLAGS.threshold)
online_im = plot_tracking_dict(frame, num_classes, online_tlwhs,
online_ids, online_scores, frame_id)
online_ids, online_scores, frame_id,
ids2names=ids2names)
if FLAGS.save_images:
if not os.path.exists(FLAGS.output_dir):
os.makedirs(FLAGS.output_dir)
......@@ -216,6 +218,8 @@ def predict_video(detector, camera_id):
results = defaultdict(list) # support single class and multi classes
num_classes = detector.num_classes
data_type = 'mcmot' if num_classes > 1 else 'mot'
ids2names = detector.pred_config.labels
while (1):
ret, frame = capture.read()
if not ret:
......@@ -237,7 +241,8 @@ def predict_video(detector, camera_id):
online_ids,
online_scores,
frame_id=frame_id,
fps=fps)
fps=fps,
ids2names=ids2names)
if FLAGS.save_images:
save_dir = os.path.join(FLAGS.output_dir, video_name.split('.')[-2])
if not os.path.exists(save_dir):
......
......@@ -23,9 +23,9 @@ import paddle
from paddle.inference import Config
from paddle.inference import create_predictor
from preprocess import preprocess
from picodet_postprocess import PicoDetPostProcess
from utils import argsparser, Timer, get_current_memory_mb
from infer import Detector, get_test_images, print_arguments, PredictConfig
from infer import Detector, DetectorPicoDet, get_test_images, print_arguments, PredictConfig
from infer import load_predictor
from benchmark_utils import PaddleInferBenchmark
......@@ -139,6 +139,7 @@ class SDE_Detector(Detector):
cpu_threads=cpu_threads,
enable_mkldnn=enable_mkldnn)
assert batch_size == 1, "The SDE Detector only supports batch size=1 now"
self.pred_config = pred_config
def postprocess(self, boxes, input_shape, im_shape, scale_factor, threshold,
scaled):
......@@ -147,6 +148,8 @@ class SDE_Detector(Detector):
pred_dets = np.zeros((1, 6), dtype=np.float32)
pred_xyxys = np.zeros((1, 4), dtype=np.float32)
return pred_dets, pred_xyxys
else:
boxes = boxes[over_thres_idx]
if not scaled:
# scaled means whether the coords after detector outputs
......@@ -159,6 +162,11 @@ class SDE_Detector(Detector):
pred_xyxys, keep_idx = clip_box(pred_bboxes, input_shape, im_shape,
scale_factor)
if len(keep_idx[0]) == 0:
pred_dets = np.zeros((1, 6), dtype=np.float32)
pred_xyxys = np.zeros((1, 4), dtype=np.float32)
return pred_dets, pred_xyxys
pred_scores = boxes[:, 1:2][keep_idx[0]]
pred_cls_ids = boxes[:, 0:1][keep_idx[0]]
pred_tlwhs = np.concatenate(
......@@ -168,7 +176,7 @@ class SDE_Detector(Detector):
pred_dets = np.concatenate(
(pred_tlwhs, pred_scores, pred_cls_ids), axis=1)
return pred_dets[over_thres_idx], pred_xyxys[over_thres_idx]
return pred_dets, pred_xyxys
def predict(self, image, scaled, threshold=0.5, warmup=0, repeats=1):
'''
......@@ -220,6 +228,142 @@ class SDE_Detector(Detector):
return pred_dets, pred_xyxys
class SDE_DetectorPicoDet(DetectorPicoDet):
"""
Args:
pred_config (object): config of model, defined by `Config(model_dir)`
model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
run_mode (str): mode of running(fluid/trt_fp32/trt_fp16)
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
trt_calib_mode (bool): If the model is produced by TRT offline quantitative
calibration, trt_calib_mode need to set True
cpu_threads (int): cpu threads
enable_mkldnn (bool): whether to open MKLDNN
"""
def __init__(self,
pred_config,
model_dir,
device='CPU',
run_mode='fluid',
batch_size=1,
trt_min_shape=1,
trt_max_shape=1088,
trt_opt_shape=608,
trt_calib_mode=False,
cpu_threads=1,
enable_mkldnn=False):
super(SDE_DetectorPicoDet, self).__init__(
pred_config=pred_config,
model_dir=model_dir,
device=device,
run_mode=run_mode,
batch_size=batch_size,
trt_min_shape=trt_min_shape,
trt_max_shape=trt_max_shape,
trt_opt_shape=trt_opt_shape,
trt_calib_mode=trt_calib_mode,
cpu_threads=cpu_threads,
enable_mkldnn=enable_mkldnn)
assert batch_size == 1, "The SDE Detector only supports batch size=1 now"
self.pred_config = pred_config
def postprocess_bboxes(self, boxes, input_shape, im_shape, scale_factor, threshold):
over_thres_idx = np.nonzero(boxes[:, 1:2] >= threshold)[0]
if len(over_thres_idx) == 0:
pred_dets = np.zeros((1, 6), dtype=np.float32)
pred_xyxys = np.zeros((1, 4), dtype=np.float32)
return pred_dets, pred_xyxys
else:
boxes = boxes[over_thres_idx]
pred_bboxes = boxes[:, 2:]
pred_xyxys, keep_idx = clip_box(pred_bboxes, input_shape, im_shape,
scale_factor)
if len(keep_idx[0]) == 0:
pred_dets = np.zeros((1, 6), dtype=np.float32)
pred_xyxys = np.zeros((1, 4), dtype=np.float32)
return pred_dets, pred_xyxys
pred_scores = boxes[:, 1:2][keep_idx[0]]
pred_cls_ids = boxes[:, 0:1][keep_idx[0]]
pred_tlwhs = np.concatenate(
(pred_xyxys[:, 0:2], pred_xyxys[:, 2:4] - pred_xyxys[:, 0:2] + 1),
axis=1)
pred_dets = np.concatenate(
(pred_tlwhs, pred_scores, pred_cls_ids), axis=1)
return pred_dets, pred_xyxys
def predict(self, image, scaled, threshold=0.5, warmup=0, repeats=1):
'''
Args:
image (np.ndarray): image numpy data
threshold (float): threshold of predicted box' score
scaled (bool): whether the coords after detector outputs are scaled,
default False in jde yolov3, set True in general detector.
Returns:
pred_dets (np.ndarray, [N, 6])
'''
self.det_times.preprocess_time_s.start()
inputs = self.preprocess(image)
self.det_times.preprocess_time_s.end()
input_names = self.predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = self.predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
np_score_list, np_boxes_list = [], []
for i in range(warmup):
self.predictor.run()
output_names = self.predictor.get_output_names()
boxes_tensor = self.predictor.get_output_handle(output_names[0])
boxes = boxes_tensor.copy_to_cpu()
self.det_times.inference_time_s.start()
for i in range(repeats):
self.predictor.run()
np_score_list.clear()
np_boxes_list.clear()
output_names = self.predictor.get_output_names()
num_outs = int(len(output_names) / 2)
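# the exported PicoDet model outputs one score tensor and one box tensor per FPN level,
# so the first half of the outputs are scores and the second half are boxes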
for out_idx in range(num_outs):
np_score_list.append(
self.predictor.get_output_handle(output_names[out_idx])
.copy_to_cpu())
np_boxes_list.append(
self.predictor.get_output_handle(output_names[
out_idx + num_outs]).copy_to_cpu())
self.det_times.inference_time_s.end(repeats=repeats)
self.det_times.img_num += 1
self.det_times.postprocess_time_s.start()
self.postprocess = PicoDetPostProcess(
inputs['image'].shape[2:],
inputs['im_shape'],
inputs['scale_factor'],
strides=self.pred_config.fpn_stride,
nms_threshold=self.pred_config.nms['nms_threshold'])
boxes, boxes_num = self.postprocess(np_score_list, np_boxes_list)
if len(boxes) == 0:
pred_dets = np.zeros((1, 6), dtype=np.float32)
pred_xyxys = np.zeros((1, 4), dtype=np.float32)
else:
input_shape = inputs['image'].shape[2:]
im_shape = inputs['im_shape']
scale_factor = inputs['scale_factor']
pred_dets, pred_xyxys = self.postprocess_bboxes(
boxes, input_shape, im_shape, scale_factor, threshold)
return pred_dets, pred_xyxys
class SDE_ReID(object):
def __init__(self,
pred_config,
......@@ -350,7 +494,7 @@ def predict_image(detector, reid_model, image_list):
pred_dets, pred_xyxys = detector.predict([frame], FLAGS.scaled,
FLAGS.threshold)
if len(pred_dets) == 1 and sum(pred_dets) == 0:
if len(pred_dets) == 1 and np.sum(pred_dets) == 0:
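# the detector postprocess returns a single all-zero row when no box passes the score threshold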
print('Frame {} has no object, try to modify score threshold.'.
format(i))
online_im = frame
......@@ -407,7 +551,7 @@ def predict_video(detector, reid_model, camera_id):
pred_dets, pred_xyxys = detector.predict([frame], FLAGS.scaled,
FLAGS.threshold)
if len(pred_dets) == 1 and sum(pred_dets) == 0:
if len(pred_dets) == 1 and np.sum(pred_dets) == 0:
print('Frame {} has no object, try to modify score threshold.'.
format(frame_id))
timer.toc()
......@@ -464,17 +608,21 @@ def predict_video(detector, reid_model, camera_id):
def main():
pred_config = PredictConfig(FLAGS.model_dir)
detector = SDE_Detector(
pred_config,
FLAGS.model_dir,
device=FLAGS.device,
run_mode=FLAGS.run_mode,
trt_min_shape=FLAGS.trt_min_shape,
trt_max_shape=FLAGS.trt_max_shape,
trt_opt_shape=FLAGS.trt_opt_shape,
trt_calib_mode=FLAGS.trt_calib_mode,
cpu_threads=FLAGS.cpu_threads,
enable_mkldnn=FLAGS.enable_mkldnn)
detector_func = 'SDE_Detector'
if pred_config.arch == 'PicoDet':
detector_func = 'SDE_DetectorPicoDet'
detector = eval(detector_func)(pred_config,
FLAGS.model_dir,
device=FLAGS.device,
run_mode=FLAGS.run_mode,
batch_size=FLAGS.batch_size,
trt_min_shape=FLAGS.trt_min_shape,
trt_max_shape=FLAGS.trt_max_shape,
trt_opt_shape=FLAGS.trt_opt_shape,
trt_calib_mode=FLAGS.trt_calib_mode,
cpu_threads=FLAGS.cpu_threads,
enable_mkldnn=FLAGS.enable_mkldnn)
pred_config = PredictConfig(FLAGS.reid_model_dir)
reid_model = SDE_ReID(
......
......@@ -154,7 +154,7 @@ class MOTDataSet(DetDataset):
last_index += v
self.num_identities_dict = defaultdict(int)
self.num_identities_dict[0] = int(last_index + 1) # single class
self.num_identities_dict[0] = int(last_index + 1) # single class
self.num_imgs_each_data = [len(x) for x in self.img_files.values()]
self.total_imgs = sum(self.num_imgs_each_data)
......@@ -249,6 +249,7 @@ class MCMOTDataSet(DetDataset):
└——————labels_with_ids
└——————train
"""
def __init__(self,
dataset_dir=None,
image_lists=[],
......@@ -343,22 +344,26 @@ class MCMOTDataSet(DetDataset):
# cname2cid and cid2cname
cname2cid = {}
if self.label_list:
if self.label_list is not None:
# if use label_list for multi source mix dataset,
# please make sure label_list in the first sub_dataset at least.
sub_dataset = self.image_lists[0].split('.')[0]
label_path = os.path.join(self.dataset_dir, sub_dataset,
self.label_list)
if not os.path.exists(label_path):
raise ValueError("label_list {} does not exists".format(
label_path))
with open(label_path, 'r') as fr:
label_id = 0
for line in fr.readlines():
cname2cid[line.strip()] = label_id
label_id += 1
logger.info(
"Note: label_list {} does not exists, use VisDrone 10 classes labels as default.".
format(label_path))
cname2cid = visdrone_mcmot_label()
else:
with open(label_path, 'r') as fr:
label_id = 0
for line in fr.readlines():
cname2cid[line.strip()] = label_id
label_id += 1
else:
cname2cid = visdrone_mcmot_label()
cid2cname = dict([(v, k) for (k, v) in cname2cid.items()])
logger.info('MCMOT dataset summary: ')
......
......@@ -176,6 +176,7 @@ class Tracker(object):
save_dir=None,
show_image=False,
frame_rate=30,
seq_name='',
scaled=False,
det_file='',
draw_threshold=0):
......@@ -200,23 +201,31 @@ class Tracker(object):
logger.info('Processing frame {} ({:.2f} fps)'.format(
frame_id, 1. / max(1e-5, timer.average_time)))
ori_image = data['ori_image']
ori_image = data['ori_image'] # [bs, H, W, 3]
ori_image_shape = data['ori_image'].shape[1:3]
# ori_image_shape: [H, W]
input_shape = data['image'].shape[2:]
im_shape = data['im_shape']
scale_factor = data['scale_factor']
# input_shape: [h, w], before data transforms, set in model config
im_shape = data['im_shape'][0].numpy()
# im_shape: [new_h, new_w], after data transforms
scale_factor = data['scale_factor'][0].numpy()
empty_detections = False
# when it has no detected bboxes, will not inference reid model
# and if visualize, use original image instead
# forward
timer.tic()
if not use_detector:
dets = dets_list[frame_id]
bbox_tlwh = paddle.to_tensor(dets['bbox'], dtype='float32')
bbox_tlwh = np.array(dets['bbox'], dtype='float32')
if bbox_tlwh.shape[0] > 0:
# detector outputs: pred_cls_ids, pred_scores, pred_bboxes
pred_cls_ids = paddle.to_tensor(
dets['cls_id'], dtype='float32').unsqueeze(1)
pred_scores = paddle.to_tensor(
dets['score'], dtype='float32').unsqueeze(1)
pred_bboxes = paddle.concat(
pred_cls_ids = np.array(dets['cls_id'], dtype='float32')
pred_scores = np.array(dets['score'], dtype='float32')
pred_bboxes = np.concatenate(
(bbox_tlwh[:, 0:2],
bbox_tlwh[:, 2:4] + bbox_tlwh[:, 0:2]),
axis=1)
......@@ -224,16 +233,21 @@ class Tracker(object):
logger.warning(
'Frame {} has not object, try to modify score threshold.'.
format(frame_id))
frame_id += 1
continue
empty_detections = True
else:
outs = self.model.detector(data)
if outs['bbox_num'] > 0:
outs['bbox'] = outs['bbox'].numpy()
outs['bbox_num'] = outs['bbox_num'].numpy()
if outs['bbox_num'] > 0 and empty_detections == False:
# detector outputs: pred_cls_ids, pred_scores, pred_bboxes
pred_cls_ids = outs['bbox'][:, 0:1]
pred_scores = outs['bbox'][:, 1:2]
if not scaled:
# scaled means whether the coords after detector outputs
# Note: scaled=False only in JDE YOLOv3 or other detectors
# with LetterBoxResize and JDEBBoxPostProcess.
#
# 'scaled' means whether the coords after detector outputs
# have been scaled back to the original image, set True
# in general detector, set False in JDE YOLOv3.
pred_bboxes = scale_coords(outs['bbox'][:, 2:],
......@@ -243,20 +257,36 @@ class Tracker(object):
pred_bboxes = outs['bbox'][:, 2:]
else:
logger.warning(
'Frame {} has not object, try to modify score threshold.'.
'Frame {} has no detected object, try to modify score threshold.'.
format(frame_id))
frame_id += 1
continue
empty_detections = True
if not empty_detections:
pred_xyxys, keep_idx = clip_box(pred_bboxes, ori_image_shape)
if len(keep_idx[0]) == 0:
logger.warning(
'Frame {} has no detected object left after clip_box.'.
format(frame_id))
empty_detections = True
if empty_detections:
timer.toc()
# if visualize, use original image instead
online_ids, online_tlwhs, online_scores = None, None, None
save_vis_results(data, frame_id, online_ids, online_tlwhs,
online_scores, timer.average_time, show_image,
save_dir, self.cfg.num_classes)
frame_id += 1
# thus will not inference reid model
continue
pred_xyxys, keep_idx = clip_box(pred_bboxes, input_shape, im_shape,
scale_factor)
pred_scores = paddle.gather_nd(pred_scores, keep_idx).unsqueeze(1)
pred_cls_ids = paddle.gather_nd(pred_cls_ids, keep_idx).unsqueeze(1)
pred_tlwhs = paddle.concat(
pred_scores = pred_scores[keep_idx[0]]
pred_cls_ids = pred_cls_ids[keep_idx[0]]
pred_tlwhs = np.concatenate(
(pred_xyxys[:, 0:2],
pred_xyxys[:, 2:4] - pred_xyxys[:, 0:2] + 1),
axis=1)
pred_dets = paddle.concat(
pred_dets = np.concatenate(
(pred_tlwhs, pred_scores, pred_cls_ids), axis=1)
tracker = self.model.tracker
......@@ -268,8 +298,7 @@ class Tracker(object):
crops = paddle.to_tensor(crops)
data.update({'crops': crops})
pred_embs = self.model(data)
pred_dets, pred_embs = pred_dets.numpy(), pred_embs.numpy()
pred_embs = self.model(data).numpy()
tracker.predict()
online_targets = tracker.update(pred_dets, pred_embs)
......@@ -361,6 +390,7 @@ class Tracker(object):
save_dir=save_dir,
show_image=show_image,
frame_rate=frame_rate,
seq_name=seq,
scaled=scaled,
det_file=os.path.join(det_results_dir,
'{}.txt'.format(seq)))
......@@ -417,19 +447,19 @@ class Tracker(object):
logger.info("Found {} inference images in total.".format(len(images)))
return images
def mot_predict(self,
video_file,
frame_rate,
image_dir,
output_dir,
data_type='mot',
model_type='JDE',
save_images=False,
save_videos=True,
show_image=False,
scaled=False,
det_results_dir='',
draw_threshold=0.5):
def mot_predict_seq(self,
video_file,
frame_rate,
image_dir,
output_dir,
data_type='mot',
model_type='JDE',
save_images=False,
save_videos=True,
show_image=False,
scaled=False,
det_results_dir='',
draw_threshold=0.5):
assert video_file is not None or image_dir is not None, \
"--video_file or --image_dir should be set."
assert video_file is None or os.path.isfile(video_file), \
......@@ -452,6 +482,8 @@ class Tracker(object):
logger.info('Starting tracking video {}'.format(video_file))
elif image_dir:
seq = image_dir.split('/')[-1].split('.')[0]
if os.path.exists(os.path.join(image_dir, 'img1')):
image_dir = os.path.join(image_dir, 'img1')
images = [
'{}/{}'.format(image_dir, x) for x in os.listdir(image_dir)
]
......@@ -484,6 +516,7 @@ class Tracker(object):
save_dir=save_dir,
show_image=show_image,
frame_rate=frame_rate,
seq_name=seq,
scaled=scaled,
det_file=os.path.join(det_results_dir,
'{}.txt'.format(seq)),
......@@ -491,9 +524,6 @@ class Tracker(object):
else:
raise ValueError(model_type)
write_mot_results(result_filename, results, data_type,
self.cfg.num_classes)
if save_videos:
output_video_path = os.path.join(save_dir, '..',
'{}_vis.mp4'.format(seq))
......@@ -501,3 +531,6 @@ class Tracker(object):
save_dir, output_video_path)
os.system(cmd_str)
logger.info('Save video in {}'.format(output_video_path))
write_mot_results(result_filename, results, data_type,
self.cfg.num_classes)
......@@ -16,8 +16,6 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
from ppdet.modeling.mot.utils import scale_coords
from ppdet.core.workspace import register, create
from .meta_arch import BaseArch
......@@ -73,8 +71,11 @@ class JDE(BaseArch):
emb_feats = det_outs['emb_feats']
loss_confs = det_outs['det_losses']['loss_confs']
loss_boxes = det_outs['det_losses']['loss_boxes']
jde_losses = self.reid(emb_feats, self.inputs, loss_confs,
loss_boxes)
jde_losses = self.reid(
emb_feats,
self.inputs,
loss_confs=loss_confs,
loss_boxes=loss_boxes)
return jde_losses
else:
if self.metric == 'MOTDet':
......@@ -84,32 +85,18 @@ class JDE(BaseArch):
}
return det_results
elif self.metric == 'ReID':
emb_feats = det_outs['emb_feats']
embs_and_gts = self.reid(emb_feats, self.inputs, test_emb=True)
return embs_and_gts
elif self.metric == 'MOT':
emb_feats = det_outs['emb_feats']
emb_outs = self.reid(emb_feats, self.inputs)
bboxes = det_outs['bbox']
boxes_idx = det_outs['boxes_idx']
bbox = det_outs['bbox']
input_shape = self.inputs['image'].shape[2:]
im_shape = self.inputs['im_shape']
scale_factor = self.inputs['scale_factor']
bbox[:, 2:] = scale_coords(bbox[:, 2:], input_shape, im_shape,
scale_factor)
nms_keep_idx = det_outs['nms_keep_idx']
pred_dets = paddle.concat((bbox[:, 2:], bbox[:, 1:2], bbox[:, 0:1]), axis=1)
emb_valid = paddle.gather_nd(emb_outs, boxes_idx)
pred_embs = paddle.gather_nd(emb_valid, nms_keep_idx)
pred_dets, pred_embs = self.reid(
emb_feats,
self.inputs,
bboxes=bboxes,
boxes_idx=boxes_idx,
nms_keep_idx=nms_keep_idx)
return pred_dets, pred_embs
else:
......
......@@ -87,6 +87,7 @@ class Track(object):
self.state = TrackState.Tentative
self.features = []
self.feat = feature
if feature is not None:
self.features.append(feature)
......@@ -125,6 +126,7 @@ class Track(object):
self.covariance,
detection.to_xyah())
self.features.append(detection.feature)
self.feat = detection.feature
self.cls_id = detection.cls_id
self.score = detection.score
......
......@@ -15,9 +15,8 @@
import os
import cv2
import time
import paddle
import numpy as np
from .visualization import plot_tracking_dict
from .visualization import plot_tracking_dict, plot_tracking
__all__ = [
'MOTTimer',
......@@ -157,14 +156,26 @@ def save_vis_results(data,
if show_image or save_dir is not None:
assert 'ori_image' in data
img0 = data['ori_image'].numpy()[0]
online_im = plot_tracking_dict(
img0,
num_classes,
online_tlwhs,
online_ids,
online_scores,
frame_id=frame_id,
fps=1. / average_time)
if online_ids is None:
online_im = img0
else:
if isinstance(online_tlwhs, dict):
online_im = plot_tracking_dict(
img0,
num_classes,
online_tlwhs,
online_ids,
online_scores,
frame_id=frame_id,
fps=1. / average_time)
else:
online_im = plot_tracking(
img0,
online_tlwhs,
online_ids,
online_scores,
frame_id=frame_id,
fps=1. / average_time)
if show_image:
cv2.imshow('online_im', online_im)
if save_dir is not None:
......@@ -186,45 +197,45 @@ def load_det_results(det_file, num_frames):
# [frame_id],[x0],[y0],[w],[h],[score],[class_id]
for l in lables_with_frame:
results['bbox'].append(l[1:5])
results['score'].append(l[5])
results['cls_id'].append(l[6])
results['score'].append(l[5:6])
results['cls_id'].append(l[6:7])
results_list.append(results)
return results_list
def scale_coords(coords, input_shape, im_shape, scale_factor):
im_shape = im_shape.numpy()[0]
ratio = scale_factor[0][0]
# Note: ratio has only one value, scale_factor[0] == scale_factor[1]
#
# This function only used for JDE YOLOv3 or other detectors with
# LetterBoxResize and JDEBBoxPostProcess, coords output from detector had
# not scaled back to the origin image.
ratio = scale_factor[0]
pad_w = (input_shape[1] - int(im_shape[1])) / 2
pad_h = (input_shape[0] - int(im_shape[0])) / 2
coords = paddle.cast(coords, 'float32')
coords[:, 0::2] -= pad_w
coords[:, 1::2] -= pad_h
coords[:, 0:4] /= ratio
coords[:, :4] = paddle.clip(coords[:, :4], min=0, max=coords[:, :4].max())
coords[:, :4] = np.clip(coords[:, :4], a_min=0, a_max=coords[:, :4].max())
return coords.round()
def clip_box(xyxy, input_shape, im_shape, scale_factor):
im_shape = im_shape.numpy()[0]
ratio = scale_factor.numpy()[0][0]
img0_shape = [int(im_shape[0] / ratio), int(im_shape[1] / ratio)]
xyxy[:, 0::2] = paddle.clip(xyxy[:, 0::2], min=0, max=img0_shape[1])
xyxy[:, 1::2] = paddle.clip(xyxy[:, 1::2], min=0, max=img0_shape[0])
def clip_box(xyxy, ori_image_shape):
H, W = ori_image_shape
xyxy[:, 0::2] = np.clip(xyxy[:, 0::2], a_min=0, a_max=W)
xyxy[:, 1::2] = np.clip(xyxy[:, 1::2], a_min=0, a_max=H)
w = xyxy[:, 2:3] - xyxy[:, 0:1]
h = xyxy[:, 3:4] - xyxy[:, 1:2]
mask = paddle.logical_and(h > 0, w > 0)
keep_idx = paddle.nonzero(mask)
xyxy = paddle.gather_nd(xyxy, keep_idx[:, :1])
return xyxy, keep_idx
mask = np.logical_and(h > 0, w > 0)
keep_idx = np.nonzero(mask)
return xyxy[keep_idx[0]], keep_idx
def get_crops(xyxy, ori_img, w, h):
crops = []
xyxy = xyxy.numpy().astype(np.int64)
xyxy = xyxy.astype(np.int64)
ori_img = ori_img.numpy()
ori_img = np.squeeze(ori_img, axis=0).transpose(1, 0, 2)
ori_img = np.squeeze(ori_img, axis=0).transpose(1, 0, 2) # [h,w,3]->[w,h,3]
for i, bbox in enumerate(xyxy):
crop = ori_img[bbox[0]:bbox[2], bbox[1]:bbox[3], :]
crops.append(crop)
......
......@@ -28,7 +28,7 @@ def plot_tracking(image,
scores=None,
frame_id=0,
fps=0.,
ids2=None):
ids2names=[]):
im = np.ascontiguousarray(np.copy(image))
im_h, im_w = im.shape[:2]
......@@ -52,15 +52,17 @@ def plot_tracking(image,
intbox = tuple(map(int, (x1, y1, x1 + w, y1 + h)))
obj_id = int(obj_ids[i])
id_text = '{}'.format(int(obj_id))
if ids2 is not None:
id_text = id_text + ', {}'.format(int(ids2[i]))
if ids2names != []:
assert len(
ids2names) == 1, "plot_tracking only supports a single class."
id_text = '{}_'.format(ids2names[0]) + id_text
_line_thickness = 1 if obj_id <= 0 else line_thickness
color = get_color(abs(obj_id))
cv2.rectangle(
im, intbox[0:2], intbox[2:4], color=color, thickness=line_thickness)
cv2.putText(
im,
id_text, (intbox[0], intbox[1] + 10),
id_text, (intbox[0], intbox[1] - 10),
cv2.FONT_HERSHEY_PLAIN,
text_scale, (0, 0, 255),
thickness=text_thickness)
......@@ -69,7 +71,7 @@ def plot_tracking(image,
text = '{:.2f}'.format(float(scores[i]))
cv2.putText(
im,
text, (intbox[0], intbox[1] - 10),
text, (intbox[0], intbox[1] + 10),
cv2.FONT_HERSHEY_PLAIN,
text_scale, (0, 255, 255),
thickness=text_thickness)
......@@ -83,7 +85,7 @@ def plot_tracking_dict(image,
scores_dict,
frame_id=0,
fps=0.,
ids2=None):
ids2names=[]):
im = np.ascontiguousarray(np.copy(image))
im_h, im_w = im.shape[:2]
......@@ -111,10 +113,12 @@ def plot_tracking_dict(image,
x1, y1, w, h = tlwh
intbox = tuple(map(int, (x1, y1, x1 + w, y1 + h)))
obj_id = int(obj_ids[i])
if num_classes == 1:
id_text = '{}'.format(int(obj_id))
id_text = '{}'.format(int(obj_id))
if ids2names != []:
id_text = '{}_{}'.format(ids2names[cls_id], id_text)
else:
id_text = 'class{}_id{}'.format(cls_id, int(obj_id))
id_text = 'class{}_{}'.format(cls_id, id_text)
_line_thickness = 1 if obj_id <= 0 else line_thickness
color = get_color(abs(obj_id))
......@@ -126,7 +130,7 @@ def plot_tracking_dict(image,
thickness=line_thickness)
cv2.putText(
im,
id_text, (intbox[0], intbox[1] + 10),
id_text, (intbox[0], intbox[1] - 10),
cv2.FONT_HERSHEY_PLAIN,
text_scale, (0, 0, 255),
thickness=text_thickness)
......@@ -135,7 +139,7 @@ def plot_tracking_dict(image,
text = '{:.2f}'.format(float(scores[i]))
cv2.putText(
im,
text, (intbox[0], intbox[1] - 10),
text, (intbox[0], intbox[1] + 10),
cv2.FONT_HERSHEY_PLAIN,
text_scale, (0, 255, 255),
thickness=text_thickness)
......
......@@ -270,6 +270,8 @@ class CenterNetDLAFPN(nn.Layer):
feat = ida_up_feats[-1]
if self.with_sge:
feat = self.sge_attention(feat)
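# when the neck's down_ratio is not 4, upsample the feature back to stride 4, as expected by the CenterNet/FairMOT heads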
if self.down_ratio != 4:
feat = F.interpolate(feat, scale_factor=self.down_ratio // 4, mode="bilinear", align_corners=True)
return feat
@property
......
......@@ -440,13 +440,13 @@ class CenterNetPostProcess(TTFBox):
def __call__(self, hm, wh, reg, im_shape, scale_factor):
heat = self._simple_nms(hm)
scores, inds, topk_clses, ys, xs = self._topk(heat)
scores = paddle.tensor.unsqueeze(scores, [1])
clses = paddle.tensor.unsqueeze(topk_clses, [1])
scores = scores.unsqueeze(1)
clses = topk_clses.unsqueeze(1)
reg_t = paddle.transpose(reg, [0, 2, 3, 1])
# Like TTFBox, batch size is 1.
# TODO: support batch size > 1
reg = paddle.reshape(reg_t, [-1, paddle.shape(reg_t)[-1]])
reg = paddle.reshape(reg_t, [-1, reg_t.shape[-1]])
reg = paddle.gather(reg, inds)
xs = paddle.cast(xs, 'float32')
ys = paddle.cast(ys, 'float32')
......@@ -454,7 +454,7 @@ class CenterNetPostProcess(TTFBox):
ys = ys + reg[:, 1:2]
wh_t = paddle.transpose(wh, [0, 2, 3, 1])
wh = paddle.reshape(wh_t, [-1, paddle.shape(wh_t)[-1]])
wh = paddle.reshape(wh_t, [-1, wh_t.shape[-1]])
wh = paddle.gather(wh, inds)
if self.regress_ltrb:
......@@ -486,8 +486,7 @@ class CenterNetPostProcess(TTFBox):
scale_x = scale_factor[:, 1:2]
scale_expand = paddle.concat(
[scale_x, scale_y, scale_x, scale_y], axis=1)
boxes_shape = paddle.shape(bboxes)
boxes_shape.stop_gradient = True
boxes_shape = bboxes.shape[:]
scale_expand = paddle.expand(scale_expand, shape=boxes_shape)
bboxes = paddle.divide(bboxes, scale_expand)
if self.for_mot:
......
......@@ -59,15 +59,11 @@ class FairMOTEmbeddingHead(nn.Layer):
self.reid_loss = nn.CrossEntropyLoss(ignore_index=-1, reduction='sum')
if num_classes == 1:
nID = self.num_identities_dict[0] # single class
nID = self.num_identities_dict[0] # single class
self.classifier = nn.Linear(
ch_emb,
nID,
weight_attr=param_attr,
bias_attr=bias_attr)
ch_emb, nID, weight_attr=param_attr, bias_attr=bias_attr)
# When num_identities(nID) is 1, emb_scale is set as 1
self.emb_scale = math.sqrt(2) * math.log(
nID - 1) if nID > 1 else 1
self.emb_scale = math.sqrt(2) * math.log(nID - 1) if nID > 1 else 1
else:
self.classifiers = dict()
self.emb_scale_dict = dict()
......@@ -84,7 +80,7 @@ class FairMOTEmbeddingHead(nn.Layer):
input_shape = input_shape[0]
return {'in_channels': input_shape.channels}
def process_by_class(self, det_outs, embedding, bbox_inds, topk_clses):
def process_by_class(self, bboxes, embedding, bbox_inds, topk_clses):
pred_dets, pred_embs = [], []
for cls_id in range(self.num_classes):
inds_masks = topk_clses == cls_id
......@@ -97,8 +93,8 @@ class FairMOTEmbeddingHead(nn.Layer):
cls_inds_mask = inds_masks > 0
bbox_mask = paddle.nonzero(cls_inds_mask)
cls_det_outs = paddle.gather_nd(det_outs, bbox_mask)
pred_dets.append(cls_det_outs)
cls_bboxes = paddle.gather_nd(bboxes, bbox_mask)
pred_dets.append(cls_bboxes)
cls_inds = paddle.masked_select(bbox_inds, cls_inds_mask)
cls_inds = cls_inds.unsqueeze(-1)
......@@ -108,12 +104,12 @@ class FairMOTEmbeddingHead(nn.Layer):
return paddle.concat(pred_dets), paddle.concat(pred_embs)
def forward(self,
feat,
neck_feat,
inputs,
det_outs=None,
bboxes=None,
bbox_inds=None,
topk_clses=None):
reid_feat = self.reid(feat)
reid_feat = self.reid(neck_feat)
if self.training:
if self.num_classes == 1:
loss = self.get_loss(reid_feat, inputs)
......@@ -121,18 +117,18 @@ class FairMOTEmbeddingHead(nn.Layer):
loss = self.get_mc_loss(reid_feat, inputs)
return loss
else:
assert det_outs is not None and bbox_inds is not None
assert bboxes is not None and bbox_inds is not None
reid_feat = F.normalize(reid_feat)
embedding = paddle.transpose(reid_feat, [0, 2, 3, 1])
embedding = paddle.reshape(embedding, [-1, self.ch_emb])
# embedding shape: [bs * h * w, ch_emb]
if self.num_classes == 1:
pred_dets = det_outs
pred_dets = bboxes
pred_embs = paddle.gather(embedding, bbox_inds)
else:
pred_dets, pred_embs = self.process_by_class(
det_outs, embedding, bbox_inds, topk_clses)
bboxes, embedding, bbox_inds, topk_clses)
return pred_dets, pred_embs
def get_loss(self, feat, inputs):
......
......@@ -17,6 +17,7 @@ from __future__ import division
from __future__ import print_function
import math
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
......@@ -115,31 +116,58 @@ class JDEEmbeddingHead(nn.Layer):
def forward(self,
identify_feats,
targets=None,
targets,
loss_confs=None,
loss_boxes=None,
test_emb=False):
bboxes=None,
boxes_idx=None,
nms_keep_idx=None):
assert self.num_classes == 1, 'JDE only supports single class MOT.'
assert len(identify_feats) == self.anchor_levels
ide_outs = []
for feat, ide_head in zip(identify_feats, self.identify_outputs):
ide_outs.append(ide_head(feat))
if self.training:
assert targets != None
assert len(loss_confs) == len(loss_boxes) == self.anchor_levels
loss_ides = self.emb_loss(ide_outs, targets, self.emb_scale,
self.classifier)
return self.jde_loss(loss_confs, loss_boxes, loss_ides,
self.loss_params_cls, self.loss_params_reg,
self.loss_params_ide, targets)
jde_losses = self.jde_loss(
loss_confs, loss_boxes, loss_ides, self.loss_params_cls,
self.loss_params_reg, self.loss_params_ide, targets)
return jde_losses
else:
if test_emb:
assert targets != None
embs_and_gts = self.get_emb_and_gt_outs(ide_outs, targets)
return embs_and_gts
else:
emb_outs = self.get_emb_outs(ide_outs)
return emb_outs
assert bboxes is not None
assert boxes_idx is not None
assert nms_keep_idx is not None
emb_outs = self.get_emb_outs(ide_outs)
emb_valid = paddle.gather_nd(emb_outs, boxes_idx)
pred_embs = paddle.gather_nd(emb_valid, nms_keep_idx)
input_shape = targets['image'].shape[2:]
# input_shape: [h, w], before data transforms, set in model config
im_shape = targets['im_shape'][0].numpy()
# im_shape: [new_h, new_w], after data transforms
scale_factor = targets['scale_factor'][0].numpy()
bboxes[:, 2:] = self.scale_coords(bboxes[:, 2:], input_shape,
im_shape, scale_factor)
# tlwhs, scores, cls_ids
pred_dets = paddle.concat(
(bboxes[:, 2:], bboxes[:, 1:2], bboxes[:, 0:1]), axis=1)
return pred_dets, pred_embs
def scale_coords(self, coords, input_shape, im_shape, scale_factor):
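# undo the LetterBoxResize padding, then rescale the coords back to the original image size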
ratio = scale_factor[0]
pad_w = (input_shape[1] - int(im_shape[1])) / 2
pad_h = (input_shape[0] - int(im_shape[0])) / 2
coords = paddle.cast(coords, 'float32')
coords[:, 0::2] -= pad_w
coords[:, 1::2] -= pad_h
coords[:, 0:4] /= ratio
coords[:, :4] = paddle.clip(
coords[:, :4], min=0, max=coords[:, :4].max())
return coords.round()
def get_emb_and_gt_outs(self, ide_outs, targets):
emb_and_gts = []
......
......@@ -37,7 +37,8 @@ class PCBPyramid(nn.Layer):
input_ch (int): Number of channels of the input feature.
num_stripes (int): Number of sub-parts.
used_levels (tuple): Whether the level is used, 1 means used.
num_classes (int): Number of classes for identities.
num_classes (int): Number of classes for identities, default 751 in
Market-1501 dataset.
last_conv_stride (int): Stride of the last conv.
last_conv_dilation (int): Dilation of the last conv.
num_conv_out_channels (int): Number of channels of conv feature.
......
......@@ -103,7 +103,7 @@ def run(FLAGS, cfg):
tracker.load_weights_jde(cfg.weights)
# inference
tracker.mot_predict(
tracker.mot_predict_seq(
video_file=FLAGS.video_file,
frame_rate=FLAGS.frame_rate,
image_dir=FLAGS.image_dir,
......