UpdateAnnotationPart

2590da33 · LaraStuStu · c77b7e60
10 changed files
--- a/.DS_Store
+++ b/.DS_Store
--- a/DataAnnotation/.DS_Store
+++ b/DataAnnotation/.DS_Store
--- a/DataAnnotation/AnnotationNote/1_[图像分类]任务数据标注.md
+++ b/DataAnnotation/AnnotationNote/1_[图像分类]任务数据标注.md
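 ## 1. Preparing data for the "image classification" task

-### 1.1 Data structure for image classification
+### Data structure for image classification

-An image classification dataset is stored in the following layout: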
 ```
 data/mydataset/
-|-- train_img
-    |-- train_0001.jpg
-    |-- train_0002.jpg
+|-- class 1
+    |-- 0001.jpg
+    |-- 0002.jpg
    |-- ...
-|-- val_img
-    |-- val_0001.jpg
-    |-- val_0002.jpg
+|-- class 2
+    |-- 0001.jpg
+    |-- 0002.jpg
    |-- ...
-|-- test_img
-    |-- test_0001.jpg
-    |-- test_0002.jpg
-    |-- ...
-|-- train_list.txt
-|-- val_list.txt
-|-- test_list.txt
-```
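-The .txt files hold the annotations for the corresponding images. train_img, val_img, and test_img hold the training, validation, and test images; val_img is optional.
-The .txt file names are fixed; all other file names can be chosen to suit your setup.
-
-### 1.2 Building the annotation files
-
-All annotations are stored in .txt files, one image per line. In train_list.txt and val_list.txt, each line consists of the image's relative path (relative to the parent directory of the image folders, e.g. data/mydataset/ in ***1.1***) and the image's class id, separated by a space. In test_list.txt, each line contains only the image's relative path. An example of the annotations in train_list.txt: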
-```
-train_img/train_0001.jpg 0
-train_img/train_0002.jpg 1
-...
 ```
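-[Note] Relative image paths must not contain spaces or Chinese characters; use English letters, underscores, and similar characters. Class ids consist of digits.
+The class 1 and class 2 folders must be named after the classes to be classified. Names are limited to English characters and must not contain spaces, Chinese characters, or special characters.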
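The new class-per-folder layout and its naming rule can be checked before training. The following is a minimal sketch, not part of this commit: the function name `scan_classification_dataset`, the reading of "English characters" as ASCII letters, digits, and underscores, the set of image extensions, and the alphabetical assignment of class ids are all assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path

# Assumption: "English characters" is read as ASCII letters, digits, underscores.
VALID_NAME = re.compile(r"^[A-Za-z0-9_]+$")
# Assumption: the image extensions accepted here are illustrative only.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def scan_classification_dataset(root):
    """Scan a class-per-folder dataset.

    Returns (samples, invalid):
      samples: list of (relative image path, class id) tuples
      invalid: list of folder names that break the naming rule
    """
    root = Path(root)
    samples, invalid = [], []
    class_id = 0
    # Class ids are assigned in alphabetical folder order (hypothetical choice).
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if not VALID_NAME.match(class_dir.name):
            invalid.append(class_dir.name)
            continue
        for img in sorted(class_dir.iterdir()):
            if img.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((img.relative_to(root).as_posix(), class_id))
        class_id += 1
    return samples, invalid


def demo():
    """Build a throwaway dataset with one badly named folder and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for folder in ("cat", "dog", "bad name"):
            (root / folder).mkdir()
            (root / folder / "0001.jpg").touch()
        return scan_classification_dataset(root)
```

Calling `demo()` returns the two valid classes as samples and reports `bad name` as an invalid folder, because its name contains a space.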
--- a/DataAnnotation/AnnotationNote/2_[实例分割]数据标注.md
+++ b/DataAnnotation/AnnotationNote/2_[实例分割]数据标注.md
@@ -2,9 +2,8 @@

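 ### 2.1 Preparation

-* **2.1.1** Split the collected images into training, validation (optional), and test sets, and store each set in its own folder.
-* **2.1.2** Create a folder matching each image folder to store the annotation json files.
-* **2.1.3** Click the "Open Dir" button and open the folder containing the images to annotate; the "File List" dialog then shows the absolute paths of all the images.
+* **2.1.1** Create a folder matching each image folder to store the annotation json files.
+* **2.1.2** Click the "Open Dir" button and open the folder containing the images to annotate; the "File List" dialog then shows the absolute paths of all the images.

 ### 2.2 Object detection annotation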

@@ -35,99 +34,29 @@
 <div align=center><img width="800" height="450" src="./pics/detection4.png"/></div>   


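-## 2.4 Converting LabelMe annotation data
-COCO is a commonly used dataset for object detection. LabelMe produces one .json file per image, whereas COCO uses one annotation file per dataset, so the LabelMe data must be converted to the COCO format.
-
-The COCO data directory structure is as follows:
+## 2.4 Data directory structure for the object detection task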
 ```
 data/mydataset/
+|-- JPEGImages
+    |-- 1.jpg 
+    |-- 2.jpg
 |-- annotations
-    |-- instance_train.json 
-    |-- instance_test.json 
-    |-- instance_val.json 
-|-- train
-|-- test
-|-- val
+    |-- 1.xml
+    |-- 2.xml
 ```
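-Here, `annotations` holds the .json files and `train\test\val` hold the image files.
-
-For each of the training, validation (optional), and test sets, run the following commands on the command line to convert it: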
-```cmd
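-# After entering the Anaconda environment, install the following Python packages
-# Install numpy
-pip install numpy
-# Install PIL
-pip install Pillow
-# Convert
-cd ./DataAnnotation
-python ./labelme2coco.py \
-       --image_input_dir ~/Users/image/ \
-       --json_input_dir ~/Users/json/ \
-# --image_input_dir: path to the image folder created in step one of 3.1.1
-# --json_input_dir: path to the folder holding the LabelMe annotation files for image_input_dir
-# The converted json file is written to the annotations folder under the parent directory of image_input_dir
+Here, the annotations folder holds the annotation files and the JPEGImages folder holds the image files.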

-```
+  
+
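+## 2.5 Data directory structure for the instance segmentation task

-Conversion produces one .json file per dataset (training, validation (optional), or test). The format of the converted .json file is as follows: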
-```python
-info{
-    "year": int,
-    "version": str,
-    "description": str,
-    "contributor": str,
-    "url": str,
-    "date_created": datetime,
-}
-license[
-    {
-        "id": int,
-        "name": str,
-        "url": str,
-    },
-    ...
-]
-// The fields above matter little for object detection, so they are not converted
-image[
-    {
-        "id": int,
-        "width": int,
-        "height": int,
-        "file_name": str
-    },
-    ...
-]
-categories[
-    {
-        "supercategory": str,
-        "id": int,
-        "naeme": str
-    },
-    ...
-]
-annotation[
-    {
-        "id": int,    
-        "image_id": int,
-        "category_id": int,
-        "segmentation": list,
-        "area": float,
-        "bbox": [x,y,width,height],
-        "iscrowd": 0 or 1,
-    },
-    ...
-]    
 ```
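-Meaning of the fields:
-
-|Field|Description|
-|-----|-----|
-|image/width| Image width|
-|image/height| Image height|
-|image/file_name| Name of the image file in its folder|
-|categories/supercategory| Name of the parent category|
-|categories/name| Category name|
-|annotation/segmentation| List of x and y coordinate points|
-|annotation/area| Area of the object's bounding box|
-|annotation/bbox| Top-left corner coordinates plus width and height of the object's bounding rectangle|
-|annotation/iscrowd| Whether the object is a group of objects|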
+data/mydataset/
+|-- JPEGImages
+    |-- 1.jpg 
+    |-- 2.jpg
+|-- annotations.json  
+```
+
+Here, `annotations.json` is the annotation file and the JPEGImages folder holds the image files.
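The object detection layout in section 2.4 shows `1.jpg` next to `1.xml`, which suggests one annotation file per image with the same base name. The following is a minimal sketch of a pairing check under that assumption; the function name `find_unpaired` and its default arguments are hypothetical and are not defined anywhere in this repository.

```python
import tempfile
from pathlib import Path


def find_unpaired(root, image_dir="JPEGImages", anno_dir="annotations",
                  image_suffix=".jpg", anno_suffix=".xml"):
    """Return (images without annotation, annotations without image).

    Assumption: an image and its annotation share the same base name.
    Both results are sorted lists of base names.
    """
    root = Path(root)
    image_stems = {p.stem for p in (root / image_dir).glob("*" + image_suffix)}
    anno_stems = {p.stem for p in (root / anno_dir).glob("*" + anno_suffix)}
    return sorted(image_stems - anno_stems), sorted(anno_stems - image_stems)


def demo():
    """Create a small dataset with one orphan image and one orphan annotation."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "JPEGImages").mkdir()
        (root / "annotations").mkdir()
        for name in ("1.jpg", "2.jpg", "3.jpg"):
            (root / "JPEGImages" / name).touch()
        for name in ("1.xml", "2.xml", "4.xml"):
            (root / "annotations" / name).touch()
        return find_unpaired(root)
```

In the demo, image `3` has no annotation and annotation `4` has no image, so both are reported. The check does not apply to the instance segmentation layout in section 2.5, where a single `annotations.json` covers the whole dataset.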
+
--- a/DataAnnotation/AnnotationNote/3_[语义分割]任务数据标注.md
+++ b/DataAnnotation/AnnotationNote/3_[语义分割]任务数据标注.md
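 ## 3 Annotating "semantic segmentation" task data with LabelMe
 CityScape and COCO are commonly used datasets for semantic segmentation. This section mainly covers annotating CityScape-style data in LabelMe; for COCO, see the Mask RCNN part of section 2.3.

-### 3.1 Preparation
+### 3.1 Preparation     

-* **3.1.1** Split the collected images into training, validation (optional), and test sets, and store each set in its own folder
+* **3.1.1** Create a folder matching each image folder to store the annotation json files

-* **3.1.2** Create a folder matching each image folder to store the annotation json files
-
-* **3.1.3** Click the "Open Dir" button and open the folder containing the images to annotate; the "File List" dialog then shows the absolute paths of all the images
+* **3.1.2** Click the "Open Dir" button and open the folder containing the images to annotate; the "File List" dialog then shows the absolute paths of all the images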

 ### 3.2 标注

@@ -21,68 +19,17 @@



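-## 3.3 Converting LabelMe annotation data
-CityScape and COCO are commonly used datasets for semantic segmentation. This section mainly covers conversion to the CityScape format; for COCO, see section 2.4. Because LabelMe's annotation data does not match the format CityScape requires, the LabelMe data must be converted to the CityScape format.
-
-The CityScape data directory structure is as follows:
+## 3.3 Data directory structure for the semantic segmentation task: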
 ```
 data/mydataset/
-|-- gtFine
-    |-- test
-    |-- train
-    |-- val
-|-- leftImg8bit
-    |-- test
-    |-- train
-    |-- val
-```
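-Here, `gtFine` holds the json files and `leftImg8bit` holds the image files.
-
-For each of the training, validation (optional), and test sets, run the following commands on the command line to convert it:
-```cmd
-# After entering the Anaconda environment, install the following Python packages
-# Install numpy
-pip install numpy
-# Convert
-cd ./DataAnnotation
-python ./labelme2cityscape.py \
-       --json_input_dir ~/Users/json/ \
-       --output_dir ~/Users/cityscape/ \
-# --json_input_dir: path to the folder of LabelMe annotation files
-# --output_dir: path where the converted files are stored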
-
+|-- JPEGImages
+    |-- 1.jpg
+    |-- 2.jpg
+    |-- 3.jpg
+|-- Annotations
+    |-- 1.png
+    |-- 2.png
+    |-- 3.png
+(optional)|-- label.txt
 ```
-
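-Conversion produces one .json file per image. Place the converted training, validation (optional), and test results in the train/val/test folders of gtFine, and place the image files matching the json files in each folder into the train/val/test folders of leftImg8bit. The format of the converted .json file is as follows: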
-```
-{
-    "imgHeight": int,
-    "imgWidth": int,
-    "objects": [
-        {
-            "label": str,
-            "polygon": [
-                [
-                    int,
-                    int
-                 ],
-                 [
-                    int,
-                    int
-                  ],
-                  ...
-            ]
-            ...
-        }
-    ]
-}       
-```
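-Meaning of the fields:
-
-|Field|Description|
-|-----|-----|
-|objects| List of object information|
-|label| Object class|
-|polygon| List of object coordinates, made up of all the polygon's vertex coordinates|
-|imgHeight| Image height|
-|imgWidth| Image width|
+JPEGImages is the image folder and Annotations is the label folder. You may provide a list of all label names in a file named "label.txt" so that class names are displayed directly, with annotation indices such as "1", "2", and "3" shown as the corresponding "air conditioner", "table", and "vase".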
\ No newline at end of file
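The new text says `label.txt` maps annotation indices such as "1", "2", "3" to class names, but it does not specify the file format. The following is a minimal sketch under the assumption that the file holds one name per line and that line 1 corresponds to index 1; the function name `load_label_names` and the English label names are illustrative only.

```python
def load_label_names(lines):
    """Map 1-based annotation indices to label names.

    Assumption: one label name per line, in index order, blank lines ignored.
    `lines` can be any iterable of strings, including an open text file.
    """
    names = [line.strip() for line in lines if line.strip()]
    return {index: name for index, name in enumerate(names, start=1)}


# Illustrative content of a label.txt file (hypothetical names).
label_txt = ["air_conditioner\n", "table\n", "vase\n"]
index_to_name = load_label_names(label_txt)
```

With a real file, the same function could be called as `load_label_names(open("label.txt", encoding="utf-8"))`. Whether index 0 is reserved for background is not stated in the source and is left out here.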
--- a/DataAnnotation/README.md
+++ b/DataAnnotation/README.md
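-### Data annotation
-You can annotate your data with the LabelMe annotation tool. Data-processing scripts are also provided to help users quickly prepare the data needed to train object detection and semantic segmentation tasks.
-
 ### LabelMe
-LabelMe is a widely used data annotation tool and is open source on GitHub.  
-GitHub: https://github.com/wkentaro/labelme
+LabelMe is a widely used data annotation tool. You may also use other annotation tools, provided the annotation file format matches a format supported by PaddleX.  
+LabelMe GitHub: https://github.com/wkentaro/labelme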

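 #### Installing LabelMe

-***Note: To keep the environment consistent, this document describes how to install and use LableMe in an Anaconda environment. You may also use LableMe or other annotation tools according to your own situation and needs.***
+***Note: To keep the environment consistent, this document describes how to install and use LabelMe in an Anaconda environment. You may also use LabelMe or other annotation tools according to your own situation and needs.***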

 Windows: see [[标注工具安装和使用/1_Windows/1_3_LabelMe安装.md]](../DataAnnotation/标注工具安装和使用/1_Windows/1_3_LabelMe安装.md)  
 Ubuntu: see [[标注工具安装和使用/2_Ubuntu/2_3_LabelMe安装.md]](../DataAnnotation/标注工具安装和使用/2_Ubuntu/2_3_LabelMe安装.md)  
 MacOS: see [[标注工具安装和使用/3_MacOS/3_3_LabelMe安装.md]](../DataAnnotation/标注工具安装和使用/3_MacOS/3_3_LabelMe安装.md)

-#### Using LabelMe
+#### Annotating your dataset with LabelMe
+
 See [[AnnotationNote]](../DataAnnotation/AnnotationNote)

--- a/DataAnnotation/labelme2cityscape.py
+++ b/DataAnnotation/labelme2cityscape.py
-#!/usr/bin/env python
-# coding: utf-8
-# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-import argparse
-import glob
-import json
-import os
-import os.path as osp
-import numpy as np
-
-
-class MyEncoder(json.JSONEncoder):
-    def default(self, obj):
-        if isinstance(obj, np.integer):
-            return int(obj)
-        elif isinstance(obj, np.floating):
-            return float(obj)
-        elif isinstance(obj, np.ndarray):
-            return obj.tolist()
-        else:
-            return super(MyEncoder, self).default(obj)
-
-
-def deal_json(json_file):
-    data_cs = {}
-    objects = []
-    num = -1
-    num = num + 1
-    if not json_file.endswith('.json'):
-        print('Cannot generating dataset from:', json_file)
-        return None
-    with open(json_file) as f:
-        print('Generating dataset from:', json_file)
-        data = json.load(f)
-        data_cs['imgHeight'] = data['imageHeight']
-        data_cs['imgWidth'] = data['imageWidth']
-        for shapes in data['shapes']:
-            obj = {}
-            label = shapes['label']
-            obj['label'] = label
-            points = shapes['points']
-            p_type = shapes['shape_type']
-            if p_type == 'polygon':
-                obj['polygon'] = points
-            objects.append(obj)
-        data_cs['objects'] = objects
-    return data_cs
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        formatter_class=argparse.ArgumentDefaultsHelpFormatter, )
-    parser.add_argument('--json_input_dir', help='input annotated directory')
-    parser.add_argument(
-        '--output_dir',
-        help='output dataset directory', )
-
-    args = parser.parse_args()
-    try:
-        assert os.path.exists(args.json_input_dir)
-    except AssertionError as e:
-        print('The json folder does not exist!')
-        os._exit(0)
-
-    # Deal with the json files.
-    total_num = len(glob.glob(osp.join(args.json_input_dir, '*.json')))
-    for json_name in os.listdir(args.json_input_dir):
-        data_cs = deal_json(osp.join(args.json_input_dir, json_name))
-        if data_cs is None:
-            continue
-        json.dump(
-            data_cs,
-            open(osp.join(args.output_dir, json_name), 'w'),
-            indent=4,
-            cls=MyEncoder, )
-
-
-if __name__ == '__main__':
-    main()
--- a/DataAnnotation/labelme2coco.py
+++ b/DataAnnotation/labelme2coco.py
-#!/usr/bin/env python
-# coding: utf-8
-import argparse
-import glob
-import json
-import os
-import os.path as osp
-import sys
-
-import numpy as np
-import PIL.ImageDraw
-
-
-class MyEncoder(json.JSONEncoder):
-    def default(self, obj):
-        if isinstance(obj, np.integer):
-            return int(obj)
-        elif isinstance(obj, np.floating):
-            return float(obj)
-        elif isinstance(obj, np.ndarray):
-            return obj.tolist()
-        else:
-            return super(MyEncoder, self).default(obj)
-
-
-
-def images(data, num):
-    image = {}
-    image['height'] = data['imageHeight']
-    image['width'] = data['imageWidth']
-    image['id'] = num + 1
-    image['file_name'] = data['imagePath'].split('/')[-1]
-    return image
-
-
-def categories(label, labels_list):
-    category = {}
-    category['supercategory'] = 'component'
-    category['id'] = len(labels_list) + 1
-    category['name'] = label
-    return category
-
-
-def annotations_rectangle(iscrowd, points, label, num, label_to_num, count):
-    annotation = {}
-    seg_points = np.asarray(points).copy()
-    seg_points[1, :] = np.asarray(points)[2, :]
-    seg_points[2, :] = np.asarray(points)[1, :]
-    annotation['segmentation'] = [list(seg_points.flatten())]
-    annotation['iscrowd'] = iscrowd
-    annotation['image_id'] = num + 1
-    annotation['bbox'] = list(
-        map(
-            float,
-            [
-                points[0][0],
-                points[0][1],
-                points[1][0] - points[0][0],
-                points[1][1] - points[0][1],
-            ], ), )
-    annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
-    annotation['category_id'] = label_to_num[label]
-    annotation['id'] = count
-    return annotation
-
-
-def annotations_polygon(annotation, iscrowd, height, width, points, label, num,
-                        label_to_num, count):
-    
-    if len(annotation) == 0:
-        annotation['segmentation'] = [list(np.asarray(points).flatten())]
-        annotation['iscrowd'] = iscrowd
-        annotation['image_id'] = num + 1
-        annotation['bbox'] = list(map(float, get_bbox(height, width, points)))
-        annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
-        annotation['category_id'] = label_to_num[label]
-        annotation['id'] = count
-    else:
-        annotation['segmentation'].append(list(np.asarray(points).flatten()))
-        box1 = annotation['bbox']
-        box2 = list(map(float, get_bbox(height, width, points)))
-        x11, y11, x12, y12 = box1[0], box1[1], box1[0] + box1[2], box1[1] + box1[3]
-        x21, y21, x22, y22 = box2[0], box2[1], box2[0] + box2[2], box2[1] + box2[3]
-        x1 = x21 if x11 > x21 else x11
-        y1 = y21 if y11 > y21 else y11
-        x2 = x22 if x12 < x22 else x12
-        y2 = y22 if y12 < y22 else y12
-        annotation['bbox'] = [x1, y1, x2 - x1, y2 - y1]
-
-
-
-def get_bbox(height, width, points):
-    polygons = points
-    mask = np.zeros([height, width], dtype=np.uint8)
-    mask = PIL.Image.fromarray(mask)
-    xy = list(map(tuple, polygons))
-    PIL.ImageDraw.Draw(mask).polygon(xy=xy, outline=1, fill=1)
-    mask = np.array(mask, dtype=bool)
-    index = np.argwhere(mask == 1)
-    rows = index[:, 0]
-    clos = index[:, 1]
-    left_top_r = np.min(rows)
-    left_top_c = np.min(clos)
-    right_bottom_r = np.max(rows)
-    right_bottom_c = np.max(clos)
-    return [
-        left_top_c,
-        left_top_r,
-        right_bottom_c - left_top_c,
-        right_bottom_r - left_top_r,
-    ]
-
-
-def deal_json(img_path, json_path):
-    data_coco = {}
-    label_to_num = {}
-    images_list = []
-    categories_list = []
-    annotations_list = []
-    labels_list = []
-    num = -1
-    for img_file in os.listdir(img_path):
-        img_label = img_file.split('.')[0]
-        if img_label == '':
-            continue
-        label_file = osp.join(json_path, img_label + '.json')
-        assert os.path.exists(label_file), \
-            'The .json file of {} is not exists!'.format(img_file)
-        print('Generating dataset from:', label_file)
-        num = num + 1
-        with open(label_file) as f:
-            data = json.load(f)
-            images_list.append(images(data, num))
-            count = 0
-            lmid_count = {}
-            for shapes in data['shapes']:
-                count += 1
-                label = shapes['label']
-                part = label.split('_')
-                iscrowd = int(part[-1][0])
-                label = label.split('_' + part[-1])[0]
-                if label not in labels_list:
-                    categories_list.append(categories(label, labels_list))
-                    labels_list.append(label)
-                    label_to_num[label] = len(labels_list)
-                points = shapes['points']
-                p_type = shapes['shape_type']
-                if p_type == 'polygon':
-                    if len(part[-1]) > 1:
-                        lmid = part[-1][1:]
-                        if lmid in lmid_count:
-                            real_count = lmid_count[lmid]
-                            real_anno = None
-                            for anno in annotations_list:
-                                if anno['id'] == real_count:
-                                    real_anno = anno
-                                    break
-                            annotations_polygon(anno, iscrowd, data['imageHeight'], data[
-                                'imageWidth'], points, label, num, label_to_num,
-                                                real_count)
-                            count -= 1
-                        else:
-                            lmid_count[lmid] = count
-                            anno = {}
-                            annotations_polygon(anno, iscrowd, data['imageHeight'], data[
-                                'imageWidth'], points, label, num, label_to_num,
-                                                count)
-                            annotations_list.append(anno)
-                    else:
-                        anno = {}
-                        annotations_polygon(anno, iscrowd, data['imageHeight'], data[
-                            'imageWidth'], points, label, num, label_to_num,
-                                            count)
-                        annotations_list.append(anno)
-                if p_type == 'rectangle':
-                    points.append([points[0][0], points[1][1]])
-                    points.append([points[1][0], points[0][1]])
-                    annotations_list.append(
-                        annotations_rectangle(iscrowd, points, label, num,
-                                              label_to_num, count))
-    data_coco['images'] = images_list
-    data_coco['categories'] = categories_list
-    data_coco['annotations'] = annotations_list
-    return data_coco
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        formatter_class=argparse.ArgumentDefaultsHelpFormatter, )
-    parser.add_argument('--json_input_dir', help='input annotated directory')
-    parser.add_argument('--image_input_dir', help='image directory')
-    args = parser.parse_args()
-    try:
-        assert os.path.exists(args.json_input_dir)
-    except AssertionError as e:
-        print('The json folder does not exist!')
-        os._exit(0)
-    try:
-        assert os.path.exists(args.image_input_dir)
-    except AssertionError as e:
-        print('The image folder does not exist!')
-        os._exit(0)
-
-    # Allocate the dataset.
-    total_num = len(glob.glob(osp.join(args.json_input_dir, '*.json')))
-
-    # Deal with the json files.
-    res_dir = os.path.abspath(os.path.join(args.image_input_dir, '..'))
-    if not os.path.exists(res_dir + '/annotations'):
-        os.makedirs(res_dir + '/annotations')
-    train_data_coco = deal_json(args.image_input_dir, args.json_input_dir)
-    train_json_path = osp.join(
-        res_dir + '/annotations',
-        'instance_{}.json'.format(
-            os.path.basename(os.path.abspath(args.image_input_dir))))
-    json.dump(
-        train_data_coco,
-        open(
-            train_json_path,
-            'w'),
-        indent=4,
-        cls=MyEncoder)
-
-
-if __name__ == '__main__':
-    main()
--- a/DataAnnotation/标注工具安装和使用/.DS_Store
+++ b/DataAnnotation/标注工具安装和使用/.DS_Store
--- a/images/.DS_Store
+++ b/images/.DS_Store