In the annotation text, each line describes a bounding box and has the following format:

```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if the box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height of the box. They are normalized by the image width and height, so they are floating-point numbers ranging from 0 to 1.
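As a quick illustration of the format above, the sketch below decodes one annotation line and undoes the normalization to recover pixel coordinates. The helper name and the 1088x608 image size are assumptions for illustration only.

```python
# Hypothetical helper: parse one labels_with_ids line of the form
#   [class] [identity] [x_center] [y_center] [width] [height]
# and convert the normalized box back to pixel coordinates.

def parse_label_line(line, img_w, img_h):
    cls, identity, xc, yc, w, h = line.split()
    cls, identity = int(cls), int(identity)
    assert cls == 0, "only single-class tracking is supported"
    # Undo the normalization by image width/height.
    box_w, box_h = float(w) * img_w, float(h) * img_h
    x1 = float(xc) * img_w - box_w / 2  # left edge in pixels
    y1 = float(yc) * img_h - box_h / 2  # top edge in pixels
    return cls, identity, (x1, y1, box_w, box_h)

cls, identity, box = parse_label_line("0 12 0.5 0.5 0.25 0.5", 1088, 608)
print(cls, identity, box)  # 0 12 (408.0, 152.0, 272.0, 304.0)
```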
```bash
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
**Notes:**
The tracking model predicts on videos and does not support prediction on a single image. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result file, or `--save_images` to save the visualization images.
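If you save txt results with `--save_mot_txts`, a small parser like the one below can group the boxes per frame. This sketch assumes each line follows the common MOT Challenge result layout `frame,id,x1,y1,w,h,score,...`; verify the exact columns written by the tool before relying on it.

```python
# Sketch: group MOT txt results by frame, assuming the common layout
#   frame,id,x1,y1,w,h,score,...   (an assumption, not the tool's spec)

def load_mot_results(lines):
    """Return {frame: [(track_id, (x1, y1, w, h), score), ...]}."""
    results = {}
    for line in lines:
        fields = line.strip().split(",")
        frame, track_id = int(fields[0]), int(fields[1])
        x1, y1, w, h, score = map(float, fields[2:7])
        results.setdefault(frame, []).append((track_id, (x1, y1, w, h), score))
    return results

demo = ["1,3,100.0,50.0,40.0,80.0,0.97,-1,-1,-1",
        "2,3,104.0,52.0,40.0,80.0,0.95,-1,-1,-1"]
print(load_mot_results(demo)[1])  # [(3, (100.0, 50.0, 40.0, 80.0), 0.97)]
```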
### 6. Using the exported MOT and keypoint models for joint Python inference
...
```bash
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
```
**Notes:**
Keypoint model export tutorial: `configs/keypoint/README.md`.
In the annotation text, each line describes a bounding box and has the following format:

```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if the box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height of the box. They are normalized by the image width and height, so they are floating-point numbers ranging from 0 to 1.
...
Each line in `gt.txt` describes a bounding box, with the format as follows:
- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of object instances in the dataset), or `-1` if the box has no identity annotation.
- `bb_left` is the x coordinate of the left boundary of the target box.
- `bb_top` is the y coordinate of the top boundary of the target box.
- `width, height` are the pixel width and height of the box.
- `x, y, z` are only used in 3D and default to `-1` in 2D.
- `score` acts as a flag for whether the entry is to be considered: a value of `0` means this instance is ignored in the evaluation, while `1` marks it as active. `1` by default.
- `label` is the class of the annotated object. Use `1` as default, because only single-class multi-object tracking is supported now; the other object classes in MOT-16 are treated as ignored.
- `vis_ratio` is the visibility ratio of each bounding box, which can be reduced by occlusion from another static or moving object, or by image border cropping. `1` by default.
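The field descriptions above can be sketched as a small `gt.txt` parser. The field order assumed here (`frame,identity,bb_left,bb_top,width,height,score,label,vis_ratio`) follows the common MOT Challenge ground-truth layout; check your dataset's files before relying on it.

```python
# Sketch: parse one gt.txt line, assuming the MOT Challenge-style order
#   frame,identity,bb_left,bb_top,width,height,score,label,vis_ratio

def parse_gt_line(line):
    frame, identity, bb_left, bb_top, width, height, score, label, vis = \
        line.strip().split(",")
    return {
        "frame": int(frame),
        "identity": int(identity),  # 1..num_identities, or -1 if unlabeled
        "box": (float(bb_left), float(bb_top), float(width), float(height)),
        "active": int(score) == 1,  # score 0 => ignored in evaluation
        "label": int(label),        # 1 for the single supported class
        "vis_ratio": float(vis),
    }

entry = parse_gt_line("1,1,912,484,97,109,0,7,0.25")
print(entry["active"], entry["vis_ratio"])  # False 0.25
```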
#### labels_with_ids
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
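The path rule above is a pair of string substitutions; here it is as a tiny helper. The example path is only an illustrative MOT16-style layout, not a required directory structure.

```python
# The image-path -> label-path rule: swap the `images` directory for
# `labels_with_ids` and the `.jpg` suffix for `.txt`.

def image_to_label_path(image_path):
    return image_path.replace("images", "labels_with_ids").replace(".jpg", ".txt")

print(image_to_label_path("MOT16/images/train/MOT16-02/img1/000001.jpg"))
# MOT16/labels_with_ids/train/MOT16-02/img1/000001.txt
```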
...
In the annotation text, each line describes a bounding box and has the following format:

```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`. Only single-class multi-object tracking is supported now.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if the box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height of the box. They are normalized by the image width and height, so they are floating-point numbers ranging from 0 to 1.
Generate the corresponding `labels_with_ids` with the following command: