Commit f29b3bca, authored by G gaotingquan, committed by Tingquan Gao

docs: rename

Parent 7609beeb
## Features of PaddleClas
PaddleClas is an image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios. Specifically, it contains the following core features.
- Practical image recognition system: Integrate detection, feature learning, and retrieval modules to be applicable to all types of image recognition tasks. Four sample solutions are provided, including product recognition, vehicle recognition, logo recognition, and animation character recognition.
- Rich library of pre-trained models: Provide a total of 175 ImageNet pre-trained models of 36 series, among which 7 selected series of models support fast structural modification.
- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched at will through configuration files.
- SSLD knowledge distillation: The 14 classification pre-trained models generally improve in accuracy by more than 3%; among them, the ResNet50_vd model achieves a Top-1 accuracy of 84.0% on the ImageNet-1k dataset, and the Res2Net200_vd pre-trained model achieves a Top-1 accuracy of 85.1%.
- Data augmentation: Provide 8 data augmentation algorithms, such as AutoAugment, Cutout, and CutMix, with detailed introductions, code reproduction, and evaluation of their effectiveness in a unified experimental environment.
[![img](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/recognition.gif)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/recognition.gif)
For more information about the quick start of image recognition, algorithm details, model training and evaluation, and prediction and deployment methods, please refer to the [README Tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/README_ch.md) on the home page.
...@@ -4,19 +4,19 @@ This document elaborates on the dataset format adopted by PaddleClas for image c ...@@ -4,19 +4,19 @@ This document elaborates on the dataset format adopted by PaddleClas for image c
------ ------
## Catalogue ## Contents
- [1.Dataset Format](#1) - [Dataset Format](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#数据集格式说明)
- [2.Common Datasets for Image Classification](#2) - [Common Datasets for Image Classification](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#图像分类任务常见数据集介绍)
- [2.1 ImageNet1k](#2.1) - [2.1 ImageNet1k](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#ImageNet1k)
- [2.2 Flowers102](#2.2) - [2.2 Flowers102](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#Flowers102)
- [2.3 CIFAR10 / CIFAR100](#2.3) - [2.3 CIFAR10 / CIFAR100](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#CIFAR10/CIFAR100)
- [2.4 MNIST](#2.4) - [2.4 MNIST](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#MNIST)
- [2.5 NUS-WIDE](#2.5) - [2.5 NUS-WIDE](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#NUS-WIDE)
<a name="1"></a>
## 1.Dataset Format ## 1 Dataset Format
PaddleClas adopts `txt` files to assign the training and test sets. Taking the `ImageNet1k` dataset as an example, where `train_list.txt` and `val_list.txt` have the following formats: PaddleClas adopts `txt` files to assign the training and test sets. Taking the `ImageNet1k` dataset as an example, where `train_list.txt` and `val_list.txt` have the following formats:
...@@ -33,12 +33,11 @@ val/ILSVRC2012_val_00000001.JPEG 65 ...@@ -33,12 +33,11 @@ val/ILSVRC2012_val_00000001.JPEG 65
``` ```
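The list-file format described above can be read with a few lines of Python. This is a minimal sketch, not PaddleClas's own loader: the function names `parse_label_line` and `parse_label_list` are hypothetical, and it assumes each line holds an image path and an integer label separated by whitespace, as in the `val/ILSVRC2012_val_00000001.JPEG 65` sample.

```python
from typing import List, Tuple


def parse_label_line(line: str) -> Tuple[str, int]:
    """Split one 'path label' line at its last whitespace run.

    Assumption: the label is the final field and is an integer class id.
    """
    path, label = line.strip().rsplit(None, 1)
    return path, int(label)


def parse_label_list(text: str) -> List[Tuple[str, int]]:
    """Parse the full contents of a list file, skipping blank lines."""
    return [parse_label_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "val/ILSVRC2012_val_00000001.JPEG 65\n"
    print(parse_label_list(sample))
```

Splitting from the right keeps a path intact even if it contains spaces, since only the last field is treated as the label.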
<a name="2"></a>
## 2.Common Datasets for Image Classification ## 2 Common Datasets for Image Classification
Here we present a compilation of commonly used image classification datasets. The list is continuously updated, and contributions are welcome. Here we present a compilation of commonly used image classification datasets. The list is continuously updated, and contributions are welcome.
<a name="2.1"></a>
### 2.1 ImageNet1k ### 2.1 ImageNet1k
[ImageNet](https://image-net.org/) is a large visual database for visual object recognition research with over 14 million manually labeled images. ImageNet-1k is a subset of the ImageNet dataset, which contains 1,000 categories with 1,281,167 images for the training set and 50,000 for the validation set. Starting in 2010, ImageNet held an annual image classification competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), with ImageNet-1k as its specified dataset. To date, ImageNet-1k has become one of the most significant contributors to the development of computer vision, and the initial models for numerous downstream computer vision tasks are trained on it. [ImageNet](https://image-net.org/) is a large visual database for visual object recognition research with over 14 million manually labeled images. ImageNet-1k is a subset of the ImageNet dataset, which contains 1,000 categories with 1,281,167 images for the training set and 50,000 for the validation set. Starting in 2010, ImageNet held an annual image classification competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), with ImageNet-1k as its specified dataset. To date, ImageNet-1k has become one of the most significant contributors to the development of computer vision, and the initial models for numerous downstream computer vision tasks are trained on it.
...@@ -69,7 +68,7 @@ PaddleClas/dataset/ILSVRC2012/ ...@@ -69,7 +68,7 @@ PaddleClas/dataset/ILSVRC2012/
``` ```
<a name="2.2"></a>
### 2.2 Flowers102 ### 2.2 Flowers102
| Dataset | Size of Training Set | Size of Test Set | Number of Category | Note | | Dataset | Size of Training Set | Size of Test Set | Number of Category | Note |
...@@ -106,7 +105,7 @@ PaddleClas/dataset/flowers102/ ...@@ -106,7 +105,7 @@ PaddleClas/dataset/flowers102/
``` ```
<a name="2.3"></a>
### 2.3 CIFAR10 / CIFAR100 ### 2.3 CIFAR10 / CIFAR100
The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 image resolution, each with 6,000 images including 5,000 images in the training set and 1,000 images in the validation set. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The CIFAR-100 dataset is an extension of CIFAR-10 and consists of 60,000 color images of 100 classes with 32x32 image resolution, each with 600 images including 500 images in the training set and 100 images in the validation set. The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 image resolution, each with 6,000 images including 5,000 images in the training set and 1,000 images in the validation set. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The CIFAR-100 dataset is an extension of CIFAR-10 and consists of 60,000 color images of 100 classes with 32x32 image resolution, each with 600 images including 500 images in the training set and 100 images in the validation set.
...@@ -114,7 +113,7 @@ The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 imag ...@@ -114,7 +113,7 @@ The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 imag
Website: http://www.cs.toronto.edu/~kriz/cifar.html Website: http://www.cs.toronto.edu/~kriz/cifar.html
<a name="2.4"></a>
### 2.4 MNIST ### 2.4 MNIST
MNIST is a renowned dataset for handwritten digit recognition and is used as an introductory sample for deep learning in many sources. It contains 70,000 images, 60,000 for the training set and 10,000 for the validation set, each with a size of 28 * 28. MNIST is a renowned dataset for handwritten digit recognition and is used as an introductory sample for deep learning in many sources. It contains 70,000 images, 60,000 for the training set and 10,000 for the validation set, each with a size of 28 * 28.
...@@ -122,7 +121,7 @@ MMNIST is a renowned dataset for handwritten digit recognition and is used as an ...@@ -122,7 +121,7 @@ MMNIST is a renowned dataset for handwritten digit recognition and is used as an
Website: http://yann.lecun.com/exdb/mnist/ Website: http://yann.lecun.com/exdb/mnist/
<a name="2.5"></a>
### 2.5 NUS-WIDE ### 2.5 NUS-WIDE
NUS-WIDE is a multi-category dataset. It contains 269,648 images and 81 categories with each image being labeled as one or more of the 81 categories. NUS-WIDE is a multi-category dataset. It contains 269,648 images and 81 categories with each image being labeled as one or more of the 81 categories.
......
...@@ -4,19 +4,19 @@ This document elaborates on the dataset format adopted by PaddleClas for image c ...@@ -4,19 +4,19 @@ This document elaborates on the dataset format adopted by PaddleClas for image c
------ ------
## Contents ## Catalogue
- [Dataset Format](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#数据集格式说明) - [1.Dataset Format](#1)
- [Common Datasets for Image Classification](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#图像分类任务常见数据集介绍) - [2.Common Datasets for Image Classification](#2)
- [2.1 ImageNet1k](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#ImageNet1k) - [2.1 ImageNet1k](#2.1)
- [2.2 Flowers102](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#Flowers102) - [2.2 Flowers102](#2.2)
- [2.3 CIFAR10 / CIFAR100](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#CIFAR10/CIFAR100) - [2.3 CIFAR10 / CIFAR100](#2.3)
- [2.4 MNIST](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#MNIST) - [2.4 MNIST](#2.4)
- [2.5 NUS-WIDE](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#NUS-WIDE) - [2.5 NUS-WIDE](#2.5)
<a name="1"></a>
## 1 Dataset Format ## 1.Dataset Format
PaddleClas adopts `txt` files to assign the training and test sets. Taking the `ImageNet1k` dataset as an example, where `train_list.txt` and `val_list.txt` have the following formats: PaddleClas adopts `txt` files to assign the training and test sets. Taking the `ImageNet1k` dataset as an example, where `train_list.txt` and `val_list.txt` have the following formats:
...@@ -33,11 +33,12 @@ val/ILSVRC2012_val_00000001.JPEG 65 ...@@ -33,11 +33,12 @@ val/ILSVRC2012_val_00000001.JPEG 65
``` ```
<a name="2"></a>
## 2 Common Datasets for Image Classification ## 2.Common Datasets for Image Classification
Here we present a compilation of commonly used image classification datasets. The list is continuously updated, and contributions are welcome. Here we present a compilation of commonly used image classification datasets. The list is continuously updated, and contributions are welcome.
<a name="2.1"></a>
### 2.1 ImageNet1k ### 2.1 ImageNet1k
[ImageNet](https://image-net.org/) is a large visual database for visual object recognition research with over 14 million manually labeled images. ImageNet-1k is a subset of the ImageNet dataset, which contains 1,000 categories with 1,281,167 images for the training set and 50,000 for the validation set. Starting in 2010, ImageNet held an annual image classification competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), with ImageNet-1k as its specified dataset. To date, ImageNet-1k has become one of the most significant contributors to the development of computer vision, and the initial models for numerous downstream computer vision tasks are trained on it. [ImageNet](https://image-net.org/) is a large visual database for visual object recognition research with over 14 million manually labeled images. ImageNet-1k is a subset of the ImageNet dataset, which contains 1,000 categories with 1,281,167 images for the training set and 50,000 for the validation set. Starting in 2010, ImageNet held an annual image classification competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), with ImageNet-1k as its specified dataset. To date, ImageNet-1k has become one of the most significant contributors to the development of computer vision, and the initial models for numerous downstream computer vision tasks are trained on it.
...@@ -68,7 +69,7 @@ PaddleClas/dataset/ILSVRC2012/ ...@@ -68,7 +69,7 @@ PaddleClas/dataset/ILSVRC2012/
``` ```
<a name="2.2"></a>
### 2.2 Flowers102 ### 2.2 Flowers102
| Dataset | Size of Training Set | Size of Test Set | Number of Category | Note | | Dataset | Size of Training Set | Size of Test Set | Number of Category | Note |
...@@ -105,7 +106,7 @@ PaddleClas/dataset/flowers102/ ...@@ -105,7 +106,7 @@ PaddleClas/dataset/flowers102/
``` ```
<a name="2.3"></a>
### 2.3 CIFAR10 / CIFAR100 ### 2.3 CIFAR10 / CIFAR100
The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 image resolution, each with 6,000 images including 5,000 images in the training set and 1,000 images in the validation set. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The CIFAR-100 dataset is an extension of CIFAR-10 and consists of 60,000 color images of 100 classes with 32x32 image resolution, each with 600 images including 500 images in the training set and 100 images in the validation set. The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 image resolution, each with 6,000 images including 5,000 images in the training set and 1,000 images in the validation set. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The CIFAR-100 dataset is an extension of CIFAR-10 and consists of 60,000 color images of 100 classes with 32x32 image resolution, each with 600 images including 500 images in the training set and 100 images in the validation set.
...@@ -113,7 +114,7 @@ The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 imag ...@@ -113,7 +114,7 @@ The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 imag
Website: http://www.cs.toronto.edu/~kriz/cifar.html Website: http://www.cs.toronto.edu/~kriz/cifar.html
<a name="2.4"></a>
### 2.4 MNIST ### 2.4 MNIST
MNIST is a renowned dataset for handwritten digit recognition and is used as an introductory sample for deep learning in many sources. It contains 70,000 images, 60,000 for the training set and 10,000 for the validation set, each with a size of 28 * 28. MNIST is a renowned dataset for handwritten digit recognition and is used as an introductory sample for deep learning in many sources. It contains 70,000 images, 60,000 for the training set and 10,000 for the validation set, each with a size of 28 * 28.
...@@ -121,7 +122,7 @@ MMNIST is a renowned dataset for handwritten digit recognition and is used as an ...@@ -121,7 +122,7 @@ MMNIST is a renowned dataset for handwritten digit recognition and is used as an
Website: http://yann.lecun.com/exdb/mnist/ Website: http://yann.lecun.com/exdb/mnist/
<a name="2.5"></a>
### 2.5 NUS-WIDE ### 2.5 NUS-WIDE
NUS-WIDE is a multi-category dataset. It contains 269,648 images and 81 categories with each image being labeled as one or more of the 81 categories. NUS-WIDE is a multi-category dataset. It contains 269,648 images and 81 categories with each image being labeled as one or more of the 81 categories.
......
...@@ -4,20 +4,20 @@ This document elaborates on the dataset format adopted by PaddleClas for image r ...@@ -4,20 +4,20 @@ This document elaborates on the dataset format adopted by PaddleClas for image r
------ ------
## Catalogue ## Contents
- [1.Dataset Format](#1) - [Dataset Format](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#数据集格式说明)
- [2.Common Datasets for Image Recognition](#2) - [Common Datasets for Image Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#图像识别任务常见数据集介绍)
- [2.1 General Datasets](#2.1) - [2.1 General Datasets](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#通用图像识别数据集)
- [2.2 Vertical Class Datasets](#2.2) - [2.2 Vertical Datasets](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#垂类图像识别数据集)
- [2.2.1 Animation Character Recognition](#2.2.1) - [2.2.1 Animation Character Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#动漫人物识别)
- [2.2.2 Product Recognition](#2.2.2) - [2.2.2 Product Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#商品识别)
- [2.2.3 Logo Recognition](#2.2.3) - [2.2.3 Logo Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#Logo识别)
- [2.2.4 Vehicle Recognition](#2.2.4) - [2.2.4 Vehicle Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#车辆识别)
<a name="1"></a>
## 1.Dataset Format ## 1 Dataset Format
The dataset for the vector search, unlike those for classification tasks, is divided into the following three parts: The dataset for the vector search, unlike those for classification tasks, is divided into the following three parts:
...@@ -56,12 +56,11 @@ Each row of data is separated by "space", and the three columns of data stand fo ...@@ -56,12 +56,11 @@ Each row of data is separated by "space", and the three columns of data stand fo
2. When the gallery dataset and query dataset are different, there is no need to add a unique id. Both `query_list.txt` and `gallery_list.txt` contain two columns, which are the path and label information of the training data. The dataset in the yaml configuration file is `ImageNetDataset`. 2. When the gallery dataset and query dataset are different, there is no need to add a unique id. Both `query_list.txt` and `gallery_list.txt` contain two columns, which are the path and label information of the training data. The dataset in the yaml configuration file is `ImageNetDataset`.
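The two list layouts described above (two columns, or three columns when a unique id is added) can be handled by one small parser. This is a hedged sketch rather than PaddleClas code: the `Record` type and `parse_record` function are hypothetical, the sample paths are made up, and it assumes the columns are path, integer label, and an optional integer unique id, separated by spaces.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    path: str
    label: int
    unique_id: Optional[int]  # None when the list has only two columns


def parse_record(line: str) -> Record:
    """Parse one space-separated row of a query or gallery list file.

    Assumption: paths contain no spaces, so a plain split is enough.
    """
    fields = line.split()
    if len(fields) == 2:
        return Record(fields[0], int(fields[1]), None)
    if len(fields) == 3:
        return Record(fields[0], int(fields[1]), int(fields[2]))
    raise ValueError(f"expected 2 or 3 columns, got {len(fields)}: {line!r}")


if __name__ == "__main__":
    # Hypothetical rows: one with a unique id, one without.
    print(parse_record("gallery/0001.jpg 3 17"))
    print(parse_record("query/0002.jpg 3"))
```

Rejecting rows with any other column count makes a malformed list fail early instead of silently producing wrong labels.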
<a name="2"></a>
## 2.Common Datasets for Image Recognition ## 2. Common Datasets for Image Recognition
Here we present a compilation of commonly used image recognition datasets. The list is continuously updated, and contributions are welcome. Here we present a compilation of commonly used image recognition datasets. The list is continuously updated, and contributions are welcome.
<a name="2.1"></a>
### 2.1 General Datasets ### 2.1 General Datasets
- SOP: The SOP dataset is a common product dataset in general recognition research and metric learning research, which contains 120,053 images of 22,634 products downloaded from eBay.com. There are 59,551 images of 11,318 categories in the training set and 60,502 images of 11,316 categories in the validation set. - SOP: The SOP dataset is a common product dataset in general recognition research and metric learning research, which contains 120,053 images of 22,634 products downloaded from eBay.com. There are 59,551 images of 11,318 categories in the training set and 60,502 images of 11,316 categories in the validation set.
...@@ -79,11 +78,11 @@ Here we present a compilation of commonly used image recognition datasets, which ...@@ -79,11 +78,11 @@ Here we present a compilation of commonly used image recognition datasets, which
Website: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html Website: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
<a name="2.2"></a>
### 2.2 Vertical Class Datasets ### 2.2 Vertical Datasets
<a name="2.2.1"></a>
#### 2.2.1 Animation Character Recognition #### 2.2.1 Animation Character Recognition
- iCartoonFace: iCartoonFace, developed by iQiyi (an online video platform), is the world's largest manually labeled detection and recognition dataset for cartoon characters, which contains more than 5,013 cartoon characters and 389,678 high-quality live images. Compared with other datasets, it features large scale, high quality, rich diversity, and challenging difficulty, making it one of the most commonly used datasets to study cartoon character recognition. - iCartoonFace: iCartoonFace, developed by iQiyi (an online video platform), is the world's largest manually labeled detection and recognition dataset for cartoon characters, which contains more than 5,013 cartoon characters and 389,678 high-quality live images. Compared with other datasets, it features large scale, high quality, rich diversity, and challenging difficulty, making it one of the most commonly used datasets to study cartoon character recognition.
...@@ -99,7 +98,7 @@ Here we present a compilation of commonly used image recognition datasets, which ...@@ -99,7 +98,7 @@ Here we present a compilation of commonly used image recognition datasets, which
Website: http://cvit.iiit.ac.in/research/projects/cvit-projects/cartoonfaces Website: http://cvit.iiit.ac.in/research/projects/cvit-projects/cartoonfaces
<a name="2.2.2"></a>
#### 2.2.2 Product Recognition #### 2.2.2 Product Recognition
- AliProduct: The AliProduct dataset is the largest open source product dataset. As an SKU-level image classification dataset, it contains 50,000 categories and 3 million images, ranking the first in both aspects in the industry. This dataset covers a large number of household goods, food, etc. Due to its lack of manual annotation, the data is messy and unevenly distributed with many similar product images. - AliProduct: The AliProduct dataset is the largest open source product dataset. As an SKU-level image classification dataset, it contains 50,000 categories and 3 million images, ranking the first in both aspects in the industry. This dataset covers a large number of household goods, food, etc. Due to its lack of manual annotation, the data is messy and unevenly distributed with many similar product images.
...@@ -113,7 +112,7 @@ Here we present a compilation of commonly used image recognition datasets, which ...@@ -113,7 +112,7 @@ Here we present a compilation of commonly used image recognition datasets, which
- DeepFashion-Inshop: The same as the common datasets In-shop Clothes. - DeepFashion-Inshop: The same as the common datasets In-shop Clothes.
<a name="2.2.3"></a>
### 2.2.3 Logo Recognition ### 2.2.3 Logo Recognition
- Logo-2K+: Logo-2K+ is a dataset exclusively for logo image recognition, which contains 10 major categories, 2341 minor categories, and 167,140 images. - Logo-2K+: Logo-2K+ is a dataset exclusively for logo image recognition, which contains 10 major categories, 2341 minor categories, and 167,140 images.
...@@ -124,8 +123,6 @@ Here we present a compilation of commonly used image recognition datasets, which ...@@ -124,8 +123,6 @@ Here we present a compilation of commonly used image recognition datasets, which
Website: https://cg.cs.tsinghua.edu.cn/traffic-sign/ Website: https://cg.cs.tsinghua.edu.cn/traffic-sign/
<a name="2.2.4"></a>
### 2.2.4 Vehicle Recognition ### 2.2.4 Vehicle Recognition
- CompCars: The images, 136,726 images of the whole car and 27,618 partial ones, are mainly from network and surveillance data. The network data contains 163 vehicle manufacturers and 1,716 vehicle models and includes the bounding box, viewing angle, and 5 attributes (maximum speed, displacement, number of doors, number of seats, and vehicle type). And the surveillance data comprises 50,000 front view images. - CompCars: The images, 136,726 images of the whole car and 27,618 partial ones, are mainly from network and surveillance data. The network data contains 163 vehicle manufacturers and 1,716 vehicle models and includes the bounding box, viewing angle, and 5 attributes (maximum speed, displacement, number of doors, number of seats, and vehicle type). And the surveillance data comprises 50,000 front view images.
......
...@@ -4,20 +4,20 @@ This document elaborates on the dataset format adopted by PaddleClas for image r ...@@ -4,20 +4,20 @@ This document elaborates on the dataset format adopted by PaddleClas for image r
------ ------
## Contents ## Catalogue
- [Dataset Format](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#数据集格式说明) - [1.Dataset Format](#1)
- [Common Datasets for Image Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#图像识别任务常见数据集介绍) - [2.Common Datasets for Image Recognition](#2)
- [2.1 General Datasets](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#通用图像识别数据集) - [2.1 General Datasets](#2.1)
- [2.2 Vertical Datasets](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#垂类图像识别数据集) - [2.2 Vertical Class Datasets](#2.2)
- [2.2.1 Animation Character Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#动漫人物识别) - [2.2.1 Animation Character Recognition](#2.2.1)
- [2.2.2 Product Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#商品识别) - [2.2.2 Product Recognition](#2.2.2)
- [2.2.3 Logo Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#Logo识别) - [2.2.3 Logo Recognition](#2.2.3)
- [2.2.4 Vehicle Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#车辆识别) - [2.2.4 Vehicle Recognition](#2.2.4)
<a name="1"></a>
## 1 Dataset Format ## 1.Dataset Format
The dataset for the vector search, unlike those for classification tasks, is divided into the following three parts: The dataset for the vector search, unlike those for classification tasks, is divided into the following three parts:
...@@ -56,11 +56,12 @@ Each row of data is separated by "space", and the three columns of data stand fo ...@@ -56,11 +56,12 @@ Each row of data is separated by "space", and the three columns of data stand fo
2. When the gallery dataset and query dataset are different, there is no need to add a unique id. Both `query_list.txt` and `gallery_list.txt` contain two columns, which are the path and label information of the training data. The dataset in the yaml configuration file is `ImageNetDataset`. 2. When the gallery dataset and query dataset are different, there is no need to add a unique id. Both `query_list.txt` and `gallery_list.txt` contain two columns, which are the path and label information of the training data. The dataset in the yaml configuration file is `ImageNetDataset`.
<a name="2"></a>
## 2. Common Datasets for Image Recognition ## 2.Common Datasets for Image Recognition
Here we present a compilation of commonly used image recognition datasets. The list is continuously updated, and contributions are welcome. Here we present a compilation of commonly used image recognition datasets. The list is continuously updated, and contributions are welcome.
<a name="2.1"></a>
### 2.1 General Datasets ### 2.1 General Datasets
- SOP: The SOP dataset is a common product dataset in general recognition research and metric learning research, which contains 120,053 images of 22,634 products downloaded from eBay.com. There are 59,551 images of 11,318 categories in the training set and 60,502 images of 11,316 categories in the validation set. - SOP: The SOP dataset is a common product dataset in general recognition research and metric learning research, which contains 120,053 images of 22,634 products downloaded from eBay.com. There are 59,551 images of 11,318 categories in the training set and 60,502 images of 11,316 categories in the validation set.
...@@ -78,11 +79,11 @@ Here we present a compilation of commonly used image recognition datasets, which ...@@ -78,11 +79,11 @@ Here we present a compilation of commonly used image recognition datasets, which
Website: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html Website: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
<a name="2.2"></a>
### 2.2 Vertical Datasets ### 2.2 Vertical Class Datasets
<a name="2.2.1"></a>
#### 2.2.1 Animation Character Recognition #### 2.2.1 Animation Character Recognition
- iCartoonFace: iCartoonFace, developed by iQiyi (an online video platform), is the world's largest manually labeled detection and recognition dataset for cartoon characters, which contains more than 5,013 cartoon characters and 389,678 high-quality live images. Compared with other datasets, it features large scale, high quality, rich diversity, and challenging difficulty, making it one of the most commonly used datasets to study cartoon character recognition. - iCartoonFace: iCartoonFace, developed by iQiyi (an online video platform), is the world's largest manually labeled detection and recognition dataset for cartoon characters, which contains more than 5,013 cartoon characters and 389,678 high-quality live images. Compared with other datasets, it features large scale, high quality, rich diversity, and challenging difficulty, making it one of the most commonly used datasets to study cartoon character recognition.
...@@ -98,7 +99,7 @@ Here we present a compilation of commonly used image recognition datasets, which ...@@ -98,7 +99,7 @@ Here we present a compilation of commonly used image recognition datasets, which
Website: http://cvit.iiit.ac.in/research/projects/cvit-projects/cartoonfaces Website: http://cvit.iiit.ac.in/research/projects/cvit-projects/cartoonfaces
<a name="2.2.2"></a>
#### 2.2.2 Product Recognition #### 2.2.2 Product Recognition
- AliProduct: The AliProduct dataset is the largest open source product dataset. As an SKU-level image classification dataset, it contains 50,000 categories and 3 million images, ranking the first in both aspects in the industry. This dataset covers a large number of household goods, food, etc. Due to its lack of manual annotation, the data is messy and unevenly distributed with many similar product images. - AliProduct: The AliProduct dataset is the largest open source product dataset. As an SKU-level image classification dataset, it contains 50,000 categories and 3 million images, ranking the first in both aspects in the industry. This dataset covers a large number of household goods, food, etc. Due to its lack of manual annotation, the data is messy and unevenly distributed with many similar product images.
...@@ -112,7 +113,7 @@ Here we present a compilation of commonly used image recognition datasets, which ...@@ -112,7 +113,7 @@ Here we present a compilation of commonly used image recognition datasets, which
- DeepFashion-Inshop: The same as the common datasets In-shop Clothes. - DeepFashion-Inshop: The same as the common datasets In-shop Clothes.
<a name="2.2.3"></a>
### 2.2.3 Logo Recognition ### 2.2.3 Logo Recognition
- Logo-2K+: Logo-2K+ is a dataset exclusively for logo image recognition, which contains 10 major categories, 2341 minor categories, and 167,140 images. - Logo-2K+: Logo-2K+ is a dataset exclusively for logo image recognition, which contains 10 major categories, 2341 minor categories, and 167,140 images.
...@@ -123,6 +124,8 @@ Here we present a compilation of commonly used image recognition datasets, which
Website: https://cg.cs.tsinghua.edu.cn/traffic-sign/
<a name="2.2.4"></a>
#### 2.2.4 Vehicle Recognition
- CompCars: The images, 136,726 of whole cars and 27,618 of car parts, come mainly from web and surveillance sources. The web data covers 163 vehicle manufacturers and 1,716 vehicle models, and is annotated with the bounding box, viewing angle, and 5 attributes (maximum speed, displacement, number of doors, number of seats, and vehicle type). The surveillance data comprises 50,000 front-view images.
......
## Features of PaddleClas
PaddleClas is an image recognition toolset for industry and academia that helps users train better computer vision models and apply them in real scenarios. It offers the following core features.
- Practical image recognition system: Integrates detection, feature learning, and retrieval modules so that it is applicable to all types of image recognition tasks. Four sample solutions are provided: product recognition, vehicle recognition, logo recognition, and animation character recognition.
- Rich library of pre-trained models: Provides a total of 175 ImageNet pre-trained models in 36 series, of which 7 selected series support fast structural modification.
- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched freely through configuration files.
- SSLD knowledge distillation: 14 classification pre-trained models generally gain more than 3% in accuracy; among them, the ResNet50_vd model reaches a Top-1 accuracy of 84.0% on the ImageNet-1k dataset, and the Res2Net200_vd pre-trained model reaches a Top-1 accuracy of 85.1%.
- Data augmentation: Provides 8 data augmentation algorithms, such as AutoAugment, Cutout, and CutMix, each with a detailed introduction, a code reproduction, and an evaluation of its effectiveness in a unified experimental environment.
![img](../../images/recognition.gif)
For more information about the quick start of image recognition, algorithm details, model training and evaluation, and prediction and deployment methods, please refer to the [README Tutorial](../../../README_en.md) on the home page.
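
The data augmentation feature above names Cutout among its algorithms. As a rough illustration of the idea only, the sketch below masks one randomly placed square patch of an image with a constant value. It is not PaddleClas's implementation: the function `cutout` and its parameters `patch_size`, `fill`, and `rng` are hypothetical names chosen for this example, and the image is represented as a plain list of pixel rows to keep the code dependency-free.

```python
import random


def cutout(image, patch_size, fill=0, rng=None):
    """Return a copy of `image` with one square patch overwritten by `fill`.

    `image` is a list of rows (height x width). The patch is centered on a
    randomly chosen pixel and clipped at the image borders, so near an edge
    it can be smaller than patch_size x patch_size.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])

    # Pick the patch center uniformly over all pixel positions.
    center_y = rng.randrange(height)
    center_x = rng.randrange(width)

    # Clip the patch to the image bounds.
    half = patch_size // 2
    top, bottom = max(0, center_y - half), min(height, center_y + half)
    left, right = max(0, center_x - half), min(width, center_x + half)

    # Copy the rows so the input image is left untouched.
    result = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            result[y][x] = fill
    return result


# Hypothetical usage: an 8x8 image of ones with a patch of up to 4x4 zeroed.
image = [[1] * 8 for _ in range(8)]
augmented = cutout(image, patch_size=4, rng=random.Random(0))
masked = sum(row.count(0) for row in augmented)
```

Passing a seeded `random.Random` makes the patch position reproducible, which is convenient when comparing augmentation settings; in training, the patch would normally be redrawn for every sample.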