Unverified commit 00c02011, authored by George Ni, committed by GitHub

[MOT] fix JDE update modelzoo (#3058)

* fix jde negative coords

* update jde model zoo

* describe format for some txt, test=document_fix
Parent e76e1a8a
@@ -28,9 +28,9 @@ PaddleDetection implements three multi-object tracking methods.
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.3 | 1351 | 6591 | 21625 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.2 | 1328 | 6441 | 25187 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.2 | 64.5 | 1308 | 7011 | 32252 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE used 8 GPUs for training, with a mini-batch size of 4 on each GPU, and was trained for 30 epochs.
@@ -117,7 +117,7 @@ In the annotation text, each line describes a bounding box and has the following format:
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height; note that they are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
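As a concrete illustration, a pixel-space corner box can be written out as one such normalized label line (a minimal sketch; the helper name and the example image size are our own, not part of PaddleDetection):

```python
def to_label_line(identity, x1, y1, x2, y2, img_w, img_h):
    """Turn a pixel-space box [x1, y1, x2, y2] into one labels_with_ids
    line: class id x_center y_center width height, with the four
    geometry fields normalized by the image width/height."""
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    # class is always 0: only single-class tracking is supported
    return f"0 {identity} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# e.g. a 200x200 box at (100, 200) in a hypothetical 1920x1080 image:
line = to_label_line(7, 100, 200, 300, 400, 1920, 1080)
```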
### Dataset Directory
...
@@ -29,9 +29,9 @@ PaddleDetection implements three multi-object tracking methods.
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.3 | 1351 | 6591 | 21625 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.2 | 1328 | 6441 | 25187 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.2 | 64.5 | 1308 | 7011 | 32252 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE used 8 GPUs for training, with a batch size of 4 on each GPU, and was trained for 30 epochs.
@@ -111,12 +111,12 @@ MOT17
```
Annotations for all datasets are provided in a unified format. Every image in each dataset has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and `.jpg` with `.txt`. In the annotation text, each line describes a bounding box with the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height; note that they are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
### Dataset Directory
...
@@ -33,10 +33,16 @@ det_results_dir
```
Each txt file holds the detection results for all frames extracted from one video, and each line describes a bounding box with the following format:
```
[frame_id],[identity],[bb_left],[bb_top],[width],[height],[conf],[x],[y],[z]
```
**Notes:**
- `frame_id` is the frame number of the image
- `identity` is the object id, with default value `-1`
- `bb_left` is the x coordinate of the left boundary of the object box
- `bb_top` is the y coordinate of the upper boundary of the object box
- `width,height` are the pixel width and height
- `conf` is the object score, with default value `1` (the results have already been filtered by the detection score threshold)
- `x,y,z` are only used in 3D, and default to `-1` in 2D.
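Such a line can be parsed as follows (a hedged sketch; the function name and the dictionary layout are our own, not an API of PaddleDetection):

```python
def parse_det_line(line):
    """Parse one comma-separated detection-result line:
    frame_id,identity,bb_left,bb_top,width,height,conf,x,y,z"""
    fields = line.strip().split(',')
    return {
        'frame_id': int(fields[0]),
        'identity': int(float(fields[1])),
        'bb_left': float(fields[2]),
        'bb_top': float(fields[3]),
        'width': float(fields[4]),
        'height': float(fields[5]),
        'conf': float(fields[6]),
        'xyz': tuple(float(v) for v in fields[7:10]),  # (-1, -1, -1) in 2D
    }
```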
## Getting Started
@@ -44,10 +50,10 @@ Each txt is the detection result of all the pictures extracted from each video,
```bash
# use weights released in PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
# use saved checkpoint after training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o weights=output/jde_darknet53_30e_1088x608/model_final
```
### 2. Tracking
...
@@ -33,9 +33,16 @@ det_results_dir
```
Each txt file holds the detection results for all frames extracted from one video, and each line describes a bounding box with the following format:
```
[frame_id],[identity],[bb_left],[bb_top],[width],[height],[conf],[x],[y],[z]
```
**Notes:**
- `frame_id` is the frame number of the image
- `identity` is the object id, with default value `-1`
- `bb_left` is the x coordinate of the left boundary of the object box
- `bb_top` is the y coordinate of the upper boundary of the object box
- `width,height` are the pixel width and height
- `conf` is the object score, with default value `1` (the results have already been filtered by the detection score threshold)
- `x,y,z` are only used in 3D, and default to `-1` in 2D.
## Getting Started
@@ -43,10 +50,10 @@ det_results_dir
```bash
# use weights released in PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
# use the checkpoint saved during training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608_track.yml -o weights=output/jde_darknet53_30e_1088x608/model_final
```
### 2. Tracking
...
@@ -21,9 +21,9 @@ English | [简体中文](README_cn.md)
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.3 | 1351 | 6591 | 21625 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.2 | 1328 | 6441 | 25187 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.2 | 64.5 | 1308 | 7011 | 32252 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
...
@@ -21,9 +21,9 @@
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.3 | 1351 | 6591 | 21625 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.2 | 1328 | 6441 | 25187 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.2 | 64.5 | 1308 | 7011 | 32252 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
JDE used 8 GPUs for training, with a batch size of 4 on each GPU, and was trained for 30 epochs.
...
@@ -10,7 +10,7 @@ English | [简体中文](PrepareMOTDataSet_cn.md)
- [Citations](#Citations)
### MOT Dataset
PaddleDetection uses the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please download and prepare all the training data, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. **MOT15 and MOT20** can also be downloaded from the official webpage of the MOT challenge. If you want to use these datasets, please **follow their licenses**.
### Data Format
These relevant datasets share the following structure:
@@ -37,11 +37,11 @@ In the annotation text, each line describes a bounding box and has the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height; note that they are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
### Dataset Directory
@@ -81,7 +81,7 @@ dataset/mot
### Custom Dataset Preparation
In order to standardize training and evaluation, custom data needs to be converted into the same directory layout and format as the MOT-16 dataset:
```
custom_data
|——————images
@@ -124,14 +124,15 @@ imExt=.jpg
Each line in `gt.txt` describes a bounding box, with the following format:
```
[frame_id],[identity],[bb_left],[bb_top],[width],[height],[x],[y],[z]
```
**Notes:**
- `frame_id` is the current frame id.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
- `bb_left` is the x coordinate of the left boundary of the target box.
- `bb_top` is the y coordinate of the upper boundary of the target box.
- `width, height` are the pixel width and height.
- `x,y,z` are only used in 3D, and default to `-1` in 2D.
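Since `gt.txt` stores pixel-space top-left corners while the `labels_with_ids` annotations (next section) store normalized center coordinates, the conversion between the two looks roughly like this (a sketch of the idea only; the repository ships its own conversion script, and the function name here is ours):

```python
def gt_to_label_line(gt_line, img_w, img_h):
    """Convert one gt.txt line (frame_id,identity,bb_left,bb_top,width,height,...)
    into a labels_with_ids line with center coordinates normalized to [0, 1]."""
    fields = gt_line.strip().split(',')
    identity = int(fields[1])
    bb_left, bb_top, w, h = map(float, fields[2:6])
    x_center = (bb_left + w / 2) / img_w   # top-left corner -> normalized center
    y_center = (bb_top + h / 2) / img_h
    return f"0 {identity} {x_center:.6f} {y_center:.6f} {w / img_w:.6f} {h / img_h:.6f}"
```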
#### labels_with_ids
@@ -144,7 +145,7 @@ In the annotation text, each line describes a bounding box and has the following format:
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height; note that they are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
Generate the corresponding `labels_with_ids` with the following command:
```
...
@@ -10,7 +10,7 @@
- [Citations](#引用)
### MOT Dataset
PaddleDetection uses the same datasets as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please download and prepare all the training data first, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The **MOT15 and MOT20** datasets can also be downloaded; if you want to use these datasets, please **follow their licenses**.
### Data Format
These relevant datasets share the following structure:
@@ -33,12 +33,12 @@ MOT17
```
Annotations for all datasets are provided in a unified format. Every image in each dataset has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and `.jpg` with `.txt`. In the annotation text, each line describes a bounding box with the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height; note that they are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
### Dataset Directory
@@ -163,7 +163,7 @@ Google Drive:
### Custom Dataset Preparation
In order to standardize training and evaluation, custom data needs to be converted into the same directory layout and format as the MOT-16 dataset:
```
custom_data
|——————images
@@ -206,7 +206,7 @@ imExt=.jpg
`gt.txt` contains the original annotations of all frames in the current video; each line describes a bounding box with the following format:
```
[frame_id],[identity],[bb_left],[bb_top],[width],[height],[x],[y],[z]
```
**Notes:**
- `frame_id` is the frame number of the current image
@@ -220,12 +220,12 @@ imExt=.jpg
#### labels_with_ids
Annotations for all datasets are provided in a unified format. Every image in each dataset has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and `.jpg` with `.txt`. In the annotation text, each line describes a bounding box with the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height; note that they are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
The corresponding `labels_with_ids` can be generated with the following script:
```
...
@@ -124,8 +124,8 @@ def scale_coords(coords, input_shape, im_shape, scale_factor):
    ratio = scale_factor.numpy()[0][0]
    img0_shape = [int(im_shape[0] / ratio), int(im_shape[1] / ratio)]
    pad_w = (input_shape[1] - round(img0_shape[1] * ratio)) / 2
    pad_h = (input_shape[0] - round(img0_shape[0] * ratio)) / 2
    coords[:, 0::2] -= pad_w
    coords[:, 1::2] -= pad_h
    coords[:, 0:4] /= paddle.to_tensor(ratio)
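To see why rounding the padding matters here, the same inverse-letterbox mapping can be sketched as a standalone NumPy function (the name and the explicit clipping step are our own; the clip reflects this PR's goal of avoiding negative coordinates):

```python
import numpy as np

def scale_coords_np(coords, input_shape, img0_shape, ratio):
    """Map [x1, y1, x2, y2] boxes from the letterboxed network input
    back to the original image. input_shape and img0_shape are (h, w)."""
    # The letterbox step pads with round(...), so the inverse must round
    # too; otherwise a half-pixel mismatch can push coordinates negative.
    pad_w = (input_shape[1] - round(img0_shape[1] * ratio)) / 2
    pad_h = (input_shape[0] - round(img0_shape[0] * ratio)) / 2
    coords = coords.astype(np.float64).copy()
    coords[:, 0::2] -= pad_w   # undo horizontal padding
    coords[:, 1::2] -= pad_h   # undo vertical padding
    coords[:, 0:4] /= ratio    # undo resize
    # Clip so no box ends up outside the original image.
    coords[:, 0::2] = coords[:, 0::2].clip(0, img0_shape[1])
    coords[:, 1::2] = coords[:, 1::2].clip(0, img0_shape[0])
    return coords
```

For example, with a 1920x1080 image letterboxed into a 1088x608 input, a box touching the left pad maps to x1 = 0 rather than a small negative value.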
...