PaddleDetection implements three MOT algorithms of these two series.
[FairMOT](https://arxiv.org/abs/2004.01888) is based on the anchor-free detector CenterNet, which overcomes the anchor and feature misalignment problem of anchor-based detection frameworks. The fusion of deep and shallow features enables the detection and ReID tasks to each obtain the features they need, and low-dimensional ReID features are used. FairMOT is a simple baseline composed of two homogeneous branches that predict pixel-level object scores and ReID features; it achieves fairness between the two tasks and obtains a high level of real-time MOT performance.
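As a rough illustration of that two-branch design (this is only a sketch with illustrative channel sizes, not the actual PaddleDetection FairMOT head), the shared backbone feature map feeds a pixel-level detection score branch and a low-dimensional ReID embedding branch:

```python
# Sketch only, with illustrative channel sizes; NOT the actual PaddleDetection FairMOT head.
# The shared (fused deep/shallow) backbone feature map feeds two homogeneous branches:
# a pixel-level detection score (heatmap) branch and a low-dimensional ReID embedding branch.
import paddle
import paddle.nn as nn

class TwoBranchHead(nn.Layer):
    def __init__(self, in_channels=64, num_classes=1, reid_dim=128):
        super().__init__()
        self.det_branch = nn.Sequential(       # per-pixel object score
            nn.Conv2D(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2D(256, num_classes, 1))
        self.reid_branch = nn.Sequential(      # per-pixel low-dimensional ReID embedding
            nn.Conv2D(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2D(256, reid_dim, 1))

    def forward(self, feat):
        return self.det_branch(feat), self.reid_branch(feat)

feat = paddle.randn([1, 64, 152, 272])  # e.g. a stride-4 feature map for a 608x1088 input
heatmap, embedding = TwoBranchHead()(feat)
print(heatmap.shape, embedding.shape)   # [1, 1, 152, 272] and [1, 128, 152, 272]
```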
[PP-Tracking](../../deploy/pptracking/README.md) is the first open-source real-time tracking system based on the PaddlePaddle deep learning framework. Aimed at the difficulties and pain points of real business scenarios, PP-Tracking provides built-in capabilities and industrial applications such as pedestrian and vehicle tracking, cross-camera tracking, multi-class tracking, small-object tracking and traffic counting, as well as a visual development interface. The model integrates multi-object tracking, object detection and lightweight ReID algorithms to further improve the deployment performance of PP-Tracking on the server side. It also supports Python and C++ deployment and adapts to multi-platform environments such as Linux and NVIDIA Jetson.
DeepSORT does not need to be trained on the MOT dataset; it is only used for evaluation. It currently supports two evaluation methods.
- 1. Load a detection result file and the ReID model. Before DeepSORT evaluation, you should first obtain detection results from a detection model and then organize them like this:
```
det_results_dir
|——————MOT16-02.txt
|——————MOT16-04.txt
|——————MOT16-05.txt
|——————MOT16-09.txt
|——————MOT16-10.txt
|——————MOT16-11.txt
|——————MOT16-13.txt
```
For the MOT16 dataset, you can download a matched detection result file, `det_results_dir.zip`, provided by PaddleDetection.
If you use a stronger detection model, you can get better results. Each txt file contains the detection results of all frames extracted from one video, and each line describes a bounding box in the following format (a minimal parsing sketch is given after this list):
```
[frame_id],[x0],[y0],[w],[h],[score],[class_id]
```
- `frame_id` is the frame number of the image.
- `x0,y0` are the X and Y coordinates of the top-left corner of the object box.
- `w,h` are the pixel width and height of the object box.
- `score` is the confidence score of the object box.
- `class_id` is the category of the object box; set it to `0` if there is only one category.
- 2. Load the detection model and the ReID model at the same time. Here the JDE version of YOLOv3 is selected; see `configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml` for the configuration details, and `configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml` for other general detectors.
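As a quick illustration of the detection-result format above, the hypothetical snippet below (the helper name is made up; it is not part of PaddleDetection) groups the lines of one txt file by frame:

```python
# Hypothetical helper (not part of PaddleDetection) that parses one det_results_dir txt file.
# Each line: [frame_id],[x0],[y0],[w],[h],[score],[class_id]
from collections import defaultdict

def load_det_results(txt_path, score_thresh=0.0):
    """Return {frame_id: [(x0, y0, w, h, score, class_id), ...]}."""
    dets_per_frame = defaultdict(list)
    with open(txt_path) as f:
        for line in f:
            if not line.strip():
                continue
            frame_id, x0, y0, w, h, score, class_id = line.strip().split(',')
            score = float(score)
            if score < score_thresh:
                continue  # optionally drop low-confidence boxes
            dets_per_frame[int(frame_id)].append(
                (float(x0), float(y0), float(w), float(h), score, int(class_id)))
    return dets_per_frame

dets = load_det_results('det_results_dir/MOT16-02.txt')
print(len(dets), 'frames,', len(dets.get(1, [])), 'boxes in frame 1')
```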
FairMOT enhanced DLA-34 was trained on 8 GPUs with a mini-batch size of 16 per GPU for 60 epochs. The CrowdHuman dataset is added to the training set during training.
FairMOT HRNetV2-W18 was trained on 8 GPUs with a mini-batch size of 4 per GPU for 30 epochs. Only the ImageNet pre-trained model is used, and the optimizer adopts Momentum. The CrowdHuman dataset is added to the training set during training.
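For readers unfamiliar with it, this is roughly what a Momentum optimizer looks like in PaddlePaddle; the learning rate, momentum and weight decay below are placeholders, not the values from the FairMOT configs:

```python
# Schematic only: a Momentum optimizer in PaddlePaddle. The learning rate, momentum
# and weight decay here are placeholders, not the values used by the FairMOT configs.
import paddle

model = paddle.nn.Conv2D(3, 8, 3)  # stand-in for the real network
optimizer = paddle.optimizer.Momentum(
    learning_rate=0.01,
    momentum=0.9,
    weight_decay=1e-4,
    parameters=model.parameters())

# Typical training-step usage:
# loss.backward(); optimizer.step(); optimizer.clear_grad()
```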
The default evaluation dataset is the MOT-16 Train Set. If you want to change the evaluation dataset, modify `data_root` in `configs/datasets/mot.yml` as shown below:
```
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: MOT17/images/train
    keep_ori_im: False # set True if save visualization images or video
```
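As a small, hypothetical sanity check (not part of PaddleDetection), you can verify that the `dataset_dir`/`data_root` combination above actually points at the image sequences before launching evaluation:

```python
# Hypothetical sanity check, not part of PaddleDetection: make sure the dataset_dir and
# data_root values from configs/datasets/mot.yml point at existing sequence folders.
import os

dataset_dir = 'dataset/mot'
data_root = 'MOT17/images/train'

seq_root = os.path.join(dataset_dir, data_root)
assert os.path.isdir(seq_root), f'{seq_root} does not exist'
print('sequences found:', sorted(os.listdir(seq_root)))
```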
### 3. Inference
Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first; on Linux (Ubuntu) you can install it directly with `apt-get update && apt-get install -y ffmpeg`. The `--frame_rate` option is the frame rate of the video, i.e. the number of frames extracted per second; it can be set manually, and the default value of -1 means the frame rate is read from the video by OpenCV.
Run inference on a video on a single GPU with the following command:
```bash
python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
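As an aside on the `--frame_rate` default mentioned above (a value of -1 means the frame rate is taken from the video itself), this is roughly how OpenCV reports a video's native frame rate; the snippet is only an illustration, not PaddleDetection code:

```python
# Illustration only, not PaddleDetection code: reading a video's native frame rate with
# OpenCV, which is what --frame_rate=-1 falls back to according to the note above.
import cv2

cap = cv2.VideoCapture('your_video.mp4')  # replace with your video file
fps = cap.get(cv2.CAP_PROP_FPS)
print('video frame rate:', fps)
cap.release()
```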
**Notes:**
The tracking model is used to predict videos and does not support prediction on a single image. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result file, or `--save_images` to save the visualization images.
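If you want to post-process the saved txt results, the hypothetical reader below may help; it assumes the common MOTChallenge line layout `frame,id,x,y,w,h,score,...`, so check the files actually produced by `--save_mot_txts` in your version before relying on it:

```python
# Hypothetical post-processing sketch, NOT PaddleDetection code. It assumes the saved
# txt follows the common MOTChallenge layout: frame,id,x,y,w,h,score[,...]; check the
# actual files produced by --save_mot_txts before relying on this.
from collections import defaultdict

def load_mot_txt(path):
    """Return {track_id: [(frame, x, y, w, h, score), ...]} sorted by frame."""
    tracks = defaultdict(list)
    with open(path) as f:
        for line in f:
            fields = line.strip().split(',')
            if len(fields) < 7:
                continue
            frame, tid = int(float(fields[0])), int(float(fields[1]))
            x, y, w, h, score = map(float, fields[2:7])
            tracks[tid].append((frame, x, y, w, h, score))
    for tid in tracks:
        tracks[tid].sort(key=lambda r: r[0])
    return tracks

tracks = load_mot_txt('output/your_video.txt')  # path is illustrative; check where your run writes the txt
print('number of tracks:', len(tracks))
```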
### 6. Using exported MOT and keypoint models for joint Python inference
```bash
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
```
**Notes:**
Keypoint model export tutorial: `configs/keypoint/README.md`.