English | [简体中文](README_cn.md) # DeepSORT (Deep Cosine Metric Learning for Person Re-identification) ## Table of Contents - [Introduction](#Introduction) - [Model Zoo](#Model_Zoo) - [Getting Start](#Getting_Start) - [Citations](#Citations) ## Introduction [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) algorithm, it adds a CNN model to extract features in image of human part bounded by a detector. It integrates appearance information based on a deep appearance descriptor, and assigns and updates the detected targets to the existing corresponding trajectories like ReID task. The detection bboxes result required by DeepSORT can be generated by any detection model, and then the saved detection result file can be loaded for tracking. Here we select the `PCB + Pyramid ResNet101` model provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) as the ReID model. ## Model Zoo ### DeepSORT Results on MOT-16 Training Set | backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model |ReID model| config | | :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: | | ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) | | ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) | ### DeepSORT Results on MOT-16 Test Set | backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | det result/model |ReID model| config | | :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :---: | :---: | :---: | | ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - |[det result](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) | | ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [det model](https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) | **Notes:** DeepSORT does not need to train on MOT dataset, only used for evaluation. Now it supports two evaluation methods. - 1.Load the result file and the ReID model. Before DeepSORT evaluation, you should get detection results by a detection model first, and then prepare them like this: ``` det_results_dir |——————MOT16-02.txt |——————MOT16-04.txt |——————MOT16-05.txt |——————MOT16-09.txt |——————MOT16-10.txt |——————MOT16-11.txt |——————MOT16-13.txt ``` For MOT16 dataset, you can download a detection result after matching called det_results_dir.zip provided by PaddleDetection: ``` wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip ``` If you use a stronger detection model, you can get better results. Each txt is the detection result of all the pictures extracted from each video, and each line describes a bounding box with the following format: ``` [frame_id],[bb_left],[bb_top],[width],[height],[conf] ``` - `frame_id` is the frame number of the image - `bb_left` is the X coordinate of the left bound of the object box - `bb_top` is the Y coordinate of the upper bound of the object box - `width,height` is the pixel width and height - `conf` is the object score with default value `1` (the results had been filtered out according to the detection score threshold) - 2. Load the detection model and the ReID model at the same time. Here, the JDE version of YOLOv3 is selected. For more detail of configuration, see `configs/mot/deepsort/_base_/deepsort_jde_yolov3_darknet53_pcb_pyramid_r101.yml`. Load other general detection model, you can refer to `configs/mot/deepsort/_base_/deepsort_yolov3_darknet53_pcb_pyramid_r101.yml`. ## Getting Start ### 1. Evaluation ```bash # Load the result file and ReID model to get the tracking result CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml --det_results_dir {your detection results} # Load JDE YOLOv3 detector and ReID model to get the tracking results CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid_r101.yml # or Load genernal YOLOv3 detector and ReID model to get the tracking results CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_yolov3_pcb_pyramid_r101.yml --scaled=True ``` **Notes:** JDE YOLOv3 pedestrian detector is trained with the same MOT dataset as JDE and FairMOT. In addition, the biggest difference between this model and general YOLOv3 model is that JDEBBoxPostProcess post-processing, and the output coordinates are not scaled back to the original image. General YOLOv3 pedestrian detector is not trained on MOT dataset, so the performance is lower. But the output coordinates are scaled back to the original image. `--scaled` means whether the coords after detector outputs are scaled back to the original image, False in JDE YOLOv3, True in general detector. ### 2. Inference Inference a vidoe on single GPU with following command: ```bash # load JDE YOLOv3 pedestrian detector and ReID model to get tracking results CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid_r101.yml --video_file={your video name}.mp4 --save_videos # or load general YOLOv3 pedestrian detector and ReID model to get tracking results CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_yolov3_pcb_pyramid_r101.yml --video_file={your video name}.mp4 --scaled=True --save_videos ``` **Notes:** Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first, on Linux(Ubuntu) platform you can directly install it by the following command:`apt-get update && apt-get install -y ffmpeg`. `--scaled` means whether the coords after detector outputs are scaled back to the original image, False in JDE YOLOv3, True in general detector. ### 3. Export model ```bash # 1.export detection model # export JDE YOLOv3 pedestrian detector CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/jde_yolov3_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_yolov3_darknet53_30e_1088x608.pdparams # or export general YOLOv3 pedestrian detector CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/pedestrian/pedestrian_yolov3_darknet.yml -o weights=https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams # 2. export ReID model CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams ``` ### 4. Using exported model for python inference ```bash # using exported JDE YOLOv3 pedestrian detector python deploy/python/mot_sde_infer.py --model_dir=output_inference/jde_yolov3_darknet53_30e_1088x608/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --save_mot_txts # or using exported general YOLOv3 pedestrian detector python deploy/python/mot_sde_infer.py --model_dir=output_inference/pedestrian_yolov3_darknet/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts ``` **Notes:** The tracking model is used to predict the video, and does not support the prediction of a single image. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts`(save a txt for every video) or `--save_mot_txt_per_img`(save a txt for every image) to save the txt result file, or `--save_images` to save the visualization images. `--scaled` means whether the coords after detector outputs are scaled back to the original image, False in JDE YOLOv3, True in general detector. ## Citations ``` @inproceedings{Wojke2017simple, title={Simple Online and Realtime Tracking with a Deep Association Metric}, author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich}, booktitle={2017 IEEE International Conference on Image Processing (ICIP)}, year={2017}, pages={3645--3649}, organization={IEEE}, doi={10.1109/ICIP.2017.8296962} } @inproceedings{Wojke2018deep, title={Deep Cosine Metric Learning for Person Re-identification}, author={Wojke, Nicolai and Bewley, Alex}, booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)}, year={2018}, pages={748--756}, organization={IEEE}, doi={10.1109/WACV.2018.00087} } ```