diff --git a/docs/en/algorithm_introduction/.gitkeep b/docs/en/algorithm_introduction/.gitkeep
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/docs/en/data_preparation/classification_dataset_en.md b/docs/en/data_preparation/classification_dataset_en.md
index 45aaa1cf944ceddf4745bed710341310cc674208..3903a002b34613570b5f681d4b68f1670139e4de 100644
--- a/docs/en/data_preparation/classification_dataset_en.md
+++ b/docs/en/data_preparation/classification_dataset_en.md
@@ -6,21 +6,21 @@ This document elaborates on the dataset format adopted by PaddleClas for image c
 ## Catalogue
-- [1. Dataset Format](#1)
-- [2. Common Datasets for Image Classification](#2)
+- [1.Dataset Format](#1)
+- [2.Common Datasets for Image Classification](#2)
   - [2.1 ImageNet1k](#2.1)
   - [2.2 Flowers102](#2.2)
   - [2.3 CIFAR10 / CIFAR100](#2.3)
   - [2.4 MNIST](#2.4)
   - [2.5 NUS-WIDE](#2.5)
-
-## 1. Dataset Format
+
+## 1.Dataset Format
 PaddleClas adopts `txt` files to assign the training and test sets. Taking the `ImageNet1k` dataset as an example, where `train_list.txt` and `val_list.txt` have the following formats:
-```shell
+```
 # Separate the image path and annotation with "space" for each line
 # train_list.txt has the following format
@@ -32,14 +32,13 @@ val/ILSVRC2012_val_00000001.JPEG 65
 ...
 ```
-
-## 2. Common Datasets for Image Classification
+
+## 2.Common Datasets for Image Classification
 Here we present a compilation of commonly used image classification datasets, which is continuously updated and expects your supplement.
-
 ### 2.1 ImageNet1k
 [ImageNet](https://image-net.org/) is a large visual database for visual target recognition research with over 14 million manually labeled images. ImageNet-1k is a subset of the ImageNet dataset, which contains 1000 categories with 1281167 images for the training set and 50000 for the validation set.
Since 2010, ImageNet began to hold an annual image classification competition, namely, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with ImageNet-1k as its specified dataset. To date, ImageNet-1k has become one of the most significant contributors to the development of computer vision, based on which numerous initial models of downstream computer vision tasks are trained.
@@ -69,8 +68,8 @@ PaddleClas/dataset/ILSVRC2012/
 |_ val_list.txt
 ```
-
+
 ### 2.2 Flowers102
 | Dataset | Size of Training Set | Size of Test Set | Number of Category | Note |
@@ -106,24 +105,24 @@ PaddleClas/dataset/flowers102/
 |_ val_list.txt
 ```
-
+
 ### 2.3 CIFAR10 / CIFAR100
 The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 image resolution, each with 6,000 images including 5,000 images in the training set and 1,000 images in the validation set. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The CIFAR-100 dataset is an extension of CIFAR-10 and consists of 60,000 color images of 100 classes with 32x32 image resolution, each with 600 images including 500 images in the training set and 100 images in the validation set.
 Website:http://www.cs.toronto.edu/~kriz/cifar.html
-
+
 ### 2.4 MNIST
 MMNIST is a renowned dataset for handwritten digit recognition and is used as an introductory sample for deep learning in many sources. It contains 60,000 images, 50,000 for the training set and 10,000 for the validation set, with a size of 28 * 28.
 Website:http://yann.lecun.com/exdb/mnist/
-
+
 ### 2.5 NUS-WIDE
 NUS-WIDE is a multi-category dataset. It contains 269,648 images and 81 categories with each image being labeled as one or more of the 81 categories.
diff --git a/docs/en/data_preparation/classification_dataset_en.md~6d5e2b2e3619279438ccbf6dcb63165dcc3b63ea b/docs/en/data_preparation/classification_dataset_en.md~6d5e2b2e3619279438ccbf6dcb63165dcc3b63ea
deleted file mode 100644
index 3903a002b34613570b5f681d4b68f1670139e4de..0000000000000000000000000000000000000000
--- a/docs/en/data_preparation/classification_dataset_en.md~6d5e2b2e3619279438ccbf6dcb63165dcc3b63ea
+++ /dev/null
@@ -1,130 +0,0 @@
-# Image Classification Datasets
-
-This document elaborates on the dataset format adopted by PaddleClas for image classification tasks, as well as other common datasets in this field.
-
------
-
-## Catalogue
-
-- [1.Dataset Format](#1)
-- [2.Common Datasets for Image Classification](#2)
-  - [2.1 ImageNet1k](#2.1)
-  - [2.2 Flowers102](#2.2)
-  - [2.3 CIFAR10 / CIFAR100](#2.3)
-  - [2.4 MNIST](#2.4)
-  - [2.5 NUS-WIDE](#2.5)
-
-
-
-## 1.Dataset Format
-
-PaddleClas adopts `txt` files to assign the training and test sets. Taking the `ImageNet1k` dataset as an example, where `train_list.txt` and `val_list.txt` have the following formats:
-
-```
-# Separate the image path and annotation with "space" for each line
-
-# train_list.txt has the following format
-train/n01440764/n01440764_10026.JPEG 0
-...
-
-# val_list.txt has the following format
-val/ILSVRC2012_val_00000001.JPEG 65
-...
-```
-
-
-
-## 2.Common Datasets for Image Classification
-
-Here we present a compilation of commonly used image classification datasets, which is continuously updated and expects your supplement.
-
-
-### 2.1 ImageNet1k
-
-[ImageNet](https://image-net.org/) is a large visual database for visual target recognition research with over 14 million manually labeled images. ImageNet-1k is a subset of the ImageNet dataset, which contains 1000 categories with 1281167 images for the training set and 50000 for the validation set.
Since 2010, ImageNet began to hold an annual image classification competition, namely, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with ImageNet-1k as its specified dataset. To date, ImageNet-1k has become one of the most significant contributors to the development of computer vision, based on which numerous initial models of downstream computer vision tasks are trained.
-
-| Dataset | Size of Training Set | Size of Test Set | Number of Category | Note |
-| ------------------------------------------------------------ | -------------------- | ---------------- | ------------------ | ---- |
-| [ImageNet1k](http://www.image-net.org/challenges/LSVRC/2012/) | 1.2M | 50k | 1000 | |
-
-After downloading the data from official sources, organize it in the following format to train with the ImageNet1k dataset in PaddleClas.
-
-```
-PaddleClas/dataset/ILSVRC2012/
-|_ train/
-|  |_ n01440764
-|  |  |_ n01440764_10026.JPEG
-|  |  |_ ...
-|  |_ ...
-|  |
-|  |_ n15075141
-|     |_ ...
-|     |_ n15075141_9993.JPEG
-|_ val/
-|  |_ ILSVRC2012_val_00000001.JPEG
-|  |_ ...
-|  |_ ILSVRC2012_val_00050000.JPEG
-|_ train_list.txt
-|_ val_list.txt
-```
-
-
-
-### 2.2 Flowers102
-
-| Dataset | Size of Training Set | Size of Test Set | Number of Category | Note |
-| ------------------------------------------------------------ | -------------------- | ---------------- | ------------------ | ---- |
-| [flowers102](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | 1k | 6k | 102 | |
-
-Unzip the downloaded data to see the following directory.
-
-```
-jpg/
-setid.mat
-imagelabels.mat
-```
-
-Place the files above under `PaddleClas/dataset/flowers102/` .
-
-Run `generate_flowers102_list.py` to generate `train_list.txt` and `val_list.txt`:
-
-```
-python generate_flowers102_list.py jpg train > train_list.txt
-python generate_flowers102_list.py jpg valid > val_list.txt
-```
-
-Structure the data as follows:
-
-```
-PaddleClas/dataset/flowers102/
-|_ jpg/
-|  |_ image_03601.jpg
-|  |_ ...
-|  |_ image_02355.jpg
-|_ train_list.txt
-|_ val_list.txt
-```
-
-
-
-### 2.3 CIFAR10 / CIFAR100
-
-The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 image resolution, each with 6,000 images including 5,000 images in the training set and 1,000 images in the validation set. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The CIFAR-100 dataset is an extension of CIFAR-10 and consists of 60,000 color images of 100 classes with 32x32 image resolution, each with 600 images including 500 images in the training set and 100 images in the validation set.
-
-Website:http://www.cs.toronto.edu/~kriz/cifar.html
-
-
-
-### 2.4 MNIST
-
-MMNIST is a renowned dataset for handwritten digit recognition and is used as an introductory sample for deep learning in many sources. It contains 60,000 images, 50,000 for the training set and 10,000 for the validation set, with a size of 28 * 28.
-
-Website:http://yann.lecun.com/exdb/mnist/
-
-
-
-### 2.5 NUS-WIDE
-
-NUS-WIDE is a multi-category dataset. It contains 269,648 images and 81 categories with each image being labeled as one or more of the 81 categories.
-
-Website:https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html
diff --git a/docs/en/data_preparation/recognition_dataset_en.md b/docs/en/data_preparation/recognition_dataset_en.md
index 3a2f6a524e4060643506ff3f59f6529d0cde2c39..32a2798463d32af8c1d83232667dcdb7a4393d7a 100644
--- a/docs/en/data_preparation/recognition_dataset_en.md
+++ b/docs/en/data_preparation/recognition_dataset_en.md
@@ -6,18 +6,18 @@ This document elaborates on the dataset format adopted by PaddleClas for image r
 ## Catalogue
-- [1. Dataset Format](#1)
-- [2.
Common Datasets for Image Recognition](#2)
+- [1.Dataset Format](#1)
+- [2.Common Datasets for Image Recognition](#2)
   - [2.1 General Datasets](#2.1)
-  - [2.2 Vertical Datasets](#2.2)
+  - [2.2 Vertical Class Datasets](#2.2)
     - [2.2.1 Animation Character Recognition](#2.2.1)
     - [2.2.2 Product Recognition](#2.2.2)
     - [2.2.3 Logo Recognition](#2.2.3)
     - [2.2.4 Vehicle Recognition](#2.2.4)
-
-## 1. Dataset Format
+
+## 1.Dataset Format
 The dataset for the vector search, unlike those for classification tasks, is divided into the following three parts:
@@ -27,7 +27,7 @@ The dataset for the vector search, unlike those for classification tasks, is div
 The above three datasets all adopt `txt` files for assignment. Taking the `CUB_200_2011` dataset as an example, the `train_list.txt` of the train dataset has the following format:
-```shell
+```
 # Use "space" as the separator
 ...
 train/99/Ovenbird_0136_92859.jpg 99 2
@@ -38,7 +38,7 @@ train/99/Ovenbird_0128_93366.jpg 99 6
 The `test_list.txt` of the query dataset (both gallery dataset and query dataset in`CUB_200_2011`) has the following format:
-```shell
+```
 # Use "space" as the separator
 ...
 test/200/Common_Yellowthroat_0126_190407.jpg 200 1
@@ -55,14 +55,13 @@ Each row of data is separated by "space", and the three columns of data stand fo
 2. When the gallery dataset and query dataset are different, there is no need to add a unique id. Both `query_list.txt` and `gallery_list.txt` contain two columns, which are the path and label information of the training data. The dataset of yaml configuration file is ` ImageNetDataset`.
-
-## 2. Common Datasets for Image Recognition
+
+## 2.Common Datasets for Image Recognition
 Here we present a compilation of commonly used image recognition datasets, which is continuously updated and expects your supplement.
-
 ### 2.1 General Datasets
 - SOP: The SOP dataset is a common product dataset in general recognition research and MetricLearning technology research, which contains 120,053 images of 22,634 products downloaded from eBay.com. There are 59,551 images of 11,318 in the training set and 60,502 images of 11,316 categories in the validation set.
@@ -79,12 +78,12 @@ Here we present a compilation of commonly used image recognition datasets, which
 Website: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
+
+### 2.2 Vertical Class Datasets
-### 2.2 Vertical Datasets
-
 #### 2.2.1 Animation Character Recognition
 - iCartoonFace: iCartoonFace, developed by iQiyi (an online video platform), is the world's largest manual labeled detection and recognition dataset for cartoon characters, which contains more than 5013 cartoon characters and 389,678 high-quality live images. Compared with other datasets, it boasts features of large scale, high quality, rich diversity, and challenging difficulty, making it one of the most commonly used datasets to study cartoon character recognition.
@@ -99,8 +98,8 @@ Here we present a compilation of commonly used image recognition datasets, which
 Website: http://cvit.iiit.ac.in/research/projects/cvit-projects/cartoonfaces
-
+
 #### 2.2.2 Product Recognition
 - AliProduct: The AliProduct dataset is the largest open source product dataset. As an SKU-level image classification dataset, it contains 50,000 categories and 3 million images, ranking the first in both aspects in the industry. This dataset covers a large number of household goods, food, etc. Due to its lack of manual annotation, the data is messy and unevenly distributed with many similar product images.
@@ -113,8 +112,8 @@ Here we present a compilation of commonly used image recognition datasets, which
 - DeepFashion-Inshop: The same as the common datasets In-shop Clothes.
-
+
 ### 2.2.3 Logo Recognition
 - Logo-2K+: Logo-2K+ is a dataset exclusively for logo image recognition, which contains 10 major categories, 2341 minor categories, and 167,140 images.
@@ -125,8 +124,8 @@ Here we present a compilation of commonly used image recognition datasets, which
 Website: https://cg.cs.tsinghua.edu.cn/traffic-sign/
-
+
 ### 2.2.4 Vehicle Recognition
 - CompCars: The images, 136,726 images of the whole car and 27,618 partial ones, are mainly from network and surveillance data. The network data contains 163 vehicle manufacturers and 1,716 vehicle models and includes the bounding box, viewing angle, and 5 attributes (maximum speed, displacement, number of doors, number of seats, and vehicle type). And the surveillance data comprises 50,000 front view images.
diff --git a/docs/en/data_preparation/recognition_dataset_en.md~6d5e2b2e3619279438ccbf6dcb63165dcc3b63ea b/docs/en/data_preparation/recognition_dataset_en.md~6d5e2b2e3619279438ccbf6dcb63165dcc3b63ea
deleted file mode 100644
index 32a2798463d32af8c1d83232667dcdb7a4393d7a..0000000000000000000000000000000000000000
--- a/docs/en/data_preparation/recognition_dataset_en.md~6d5e2b2e3619279438ccbf6dcb63165dcc3b63ea
+++ /dev/null
@@ -1,141 +0,0 @@
-# Image Recognition Datasets
-
-This document elaborates on the dataset format adopted by PaddleClas for image recognition tasks, as well as other common datasets in this field.
-
------
-
-## Catalogue
-
-- [1.Dataset Format](#1)
-- [2.Common Datasets for Image Recognition](#2)
-  - [2.1 General Datasets](#2.1)
-  - [2.2 Vertical Class Datasets](#2.2)
-    - [2.2.1 Animation Character Recognition](#2.2.1)
-    - [2.2.2 Product Recognition](#2.2.2)
-    - [2.2.3 Logo Recognition](#2.2.3)
-    - [2.2.4 Vehicle Recognition](#2.2.4)
-
-
-
-## 1.Dataset Format
-
-The dataset for the vector search, unlike those for classification tasks, is divided into the following three parts:
-
-- Train dataset: Used to train the model to learn the image features involved.
-- Gallery dataset: Used to provide the gallery data in the vector search task. It can either be the same as the train or query datasets or different, and when it is the same as the train dataset, the category system of the query dataset and train dataset should be the same.
-- Query dataset: Used to test the performance of the model. It usually extracts features from each query image of the dataset, followed by distance matching with those in the gallery dataset to get the recognition results, based on which the metrics of the whole query dataset are calculated.
-
-The above three datasets all adopt `txt` files for assignment. Taking the `CUB_200_2011` dataset as an example, the `train_list.txt` of the train dataset has the following format:
-
-```
-# Use "space" as the separator
-...
-train/99/Ovenbird_0136_92859.jpg 99 2
-...
-train/99/Ovenbird_0128_93366.jpg 99 6
-...
-```
-
-The `test_list.txt` of the query dataset (both gallery dataset and query dataset in`CUB_200_2011`) has the following format:
-
-```
-# Use "space" as the separator
-...
-test/200/Common_Yellowthroat_0126_190407.jpg 200 1
-...
-test/200/Common_Yellowthroat_0114_190501.jpg 200 6
-...
-```
-
-Each row of data is separated by "space", and the three columns of data stand for the path, label information, and unique id of training data.
-
-**Note**:
-
-1. When the gallery dataset and query dataset are the same, to remove the first retrieved data (the images themselves require no evaluation), each data should have its unique id (ensuring that each image has a different id, which can be represented by the row number) for subsequent evaluation of mAP, recall@1, and other metrics. The dataset of yaml configuration file is `VeriWild`.
-
-2. When the gallery dataset and query dataset are different, there is no need to add a unique id. Both `query_list.txt` and `gallery_list.txt` contain two columns, which are the path and label information of the training data.
The dataset of yaml configuration file is ` ImageNetDataset`.
-
-
-
-## 2.Common Datasets for Image Recognition
-
-Here we present a compilation of commonly used image recognition datasets, which is continuously updated and expects your supplement.
-
-
-### 2.1 General Datasets
-
-- SOP: The SOP dataset is a common product dataset in general recognition research and MetricLearning technology research, which contains 120,053 images of 22,634 products downloaded from eBay.com. There are 59,551 images of 11,318 in the training set and 60,502 images of 11,316 categories in the validation set.
-
-  Website: https://cvgl.stanford.edu/projects/lifted_struct/
-
-- Cars196: The Cars dataset contains 16,185 images of 196 categories of cars. The data is classified into 8144 training images and 8041 query images, with each category split roughly in a 50-50 ratio. The classification is normally based on the manufacturing, model and year of the car, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe.
-
-  Website: https://ai.stanford.edu/~jkrause/cars/car_dataset.html
-
-- CUB_200_2011: The CUB_200_2011 dataset is a fine-grained dataset proposed by the California Institute of Technology (Caltech) in 2010 and is currently the benchmark image dataset for fine-grained classification recognition research. There are 11788 bird images in this dataset with 200 subclasses, including 5994 images in the train dataset and 5794 images in the query dataset. Each image provides label information, the bounding box of the bird, the key part information of the bird, and the attribute of the bird. The dataset is shown in the figure below.
-
-- In-shop Clothes: In-shop Clothes is one of the 4 subsets of the DeepFashion dataset. It is a seller show image dataset with multi-angle images of each product id being collected in the same folder. The dataset contains 7982 items with 52712 images, each with 463 attributes, Bbox, landmarks, and store descriptions.
-
-  Website: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
-
-
-
-### 2.2 Vertical Class Datasets
-
-
-
-#### 2.2.1 Animation Character Recognition
-
-- iCartoonFace: iCartoonFace, developed by iQiyi (an online video platform), is the world's largest manual labeled detection and recognition dataset for cartoon characters, which contains more than 5013 cartoon characters and 389,678 high-quality live images. Compared with other datasets, it boasts features of large scale, high quality, rich diversity, and challenging difficulty, making it one of the most commonly used datasets to study cartoon character recognition.
-
-- Website: http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d
-
-- Manga109: Manga109 is a dataset released in May 2020 for the study of cartoon character detection and recognition, which contains 21142 images and is officially banned from commercial use. Manga109-s, a subset of this dataset, is available for industrial use, mainly for tasks such as text detection, sketch line-based search, and character image generation.
-
-  Website:http://www.manga109.org/en/
-
-- IIT-CFW: The IIF-CFW dataset contains a total of 8928 labeled cartoon portraits of celebrity characters, covering 100 characters with varying numbers of portraits for each. In addition, it also provides 1000 real face photos (10 real portraits for 100 public figures). This dataset can be employed to study both animation character recognition and cross-modal search tasks.
-
-  Website: http://cvit.iiit.ac.in/research/projects/cvit-projects/cartoonfaces
-
-
-
-#### 2.2.2 Product Recognition
-
-- AliProduct: The AliProduct dataset is the largest open source product dataset. As an SKU-level image classification dataset, it contains 50,000 categories and 3 million images, ranking the first in both aspects in the industry. This dataset covers a large number of household goods, food, etc.
Due to its lack of manual annotation, the data is messy and unevenly distributed with many similar product images.
-
-  Website: https://retailvisionworkshop.github.io/recognition_challenge_2020/
-
-- Product-10k: Products-10k dataset has all its images from Jingdong Mall, covering 10,000 frequently purchased SKUs that are organized into a hierarchy. In total, there are nearly 190,000 images. In the real application scenario, the distribution of image volume is uneven. All images are manually checked/labeled by a team of production experts.
-
-  Website:https://www.kaggle.com/c/products-10k/data?select=train.csv
-
-- DeepFashion-Inshop: The same as the common datasets In-shop Clothes.
-
-
-
-### 2.2.3 Logo Recognition
-
-- Logo-2K+: Logo-2K+ is a dataset exclusively for logo image recognition, which contains 10 major categories, 2341 minor categories, and 167,140 images.
-
-  Website: https://github.com/msn199959/Logo-2k-plus-Dataset
-
-- Tsinghua-Tencent 100K: This dataset is a large traffic sign benchmark dataset based on 100,000 Tencent Street View panoramas. 30,000 traffic sign instances included, it provides 100,000 images covering a wide range of illumination, and weather conditions. Each traffic sign in the benchmark test is labeled with the category, bounding box and pixel mask. A total of 222 categories (0 background + 221 traffic signs) are incorporated.
-
-  Website: https://cg.cs.tsinghua.edu.cn/traffic-sign/
-
-
-
-### 2.2.4 Vehicle Recognition
-
-- CompCars: The images, 136,726 images of the whole car and 27,618 partial ones, are mainly from network and surveillance data. The network data contains 163 vehicle manufacturers and 1,716 vehicle models and includes the bounding box, viewing angle, and 5 attributes (maximum speed, displacement, number of doors, number of seats, and vehicle type). And the surveillance data comprises 50,000 front view images.
-
-  Website: http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/
-
-- BoxCars: The dataset contains a total of 21,250 vehicles, 63,750 images, 27 vehicle manufacturers, and 148 subcategories. All of them are derived from surveillance data.
-
-  Website: https://github.com/JakubSochor/BoxCars
-
-- PKU-VD Dataset: The dataset contains two large vehicle datasets (VD1 and VD2) that capture images from real-world unrestricted scenes in two cities. VD1 is obtained from high-resolution traffic cameras, while images in VD2 are acquired from surveillance videos. The authors have performed vehicle detection on the raw data to ensure that each image contains only one vehicle. Due to privacy constraints, all the license numbers have been obscured with black overlays. All images are captured from the front view, and diverse attribute annotations are provided for each image in the dataset, including identification numbers, accurate vehicle models, and colors. VD1 originally contained 1097649 images, 1232 vehicle models, and 11 vehicle colors, and remains 846358 images and 141756 vehicles after removing images with multiple vehicles inside and those taken from the rear of the vehicle. VD2 contains 807260 images, 79763 vehicles, 1112 vehicle models, and 11 vehicle colors.
-
-  Website: https://pkuml.org/resources/pku-vds.html
diff --git a/docs/en/faq_en.md b/docs/en/faq_en.md
deleted file mode 100644
index 1aa60bdf1b259a91811be21b8c729e576b67fc99..0000000000000000000000000000000000000000
--- a/docs/en/faq_en.md
+++ /dev/null
@@ -1,48 +0,0 @@
-# FAQ
-
->>
-* Why are the metrics different for different cards?
-* A: Fleet is the default option for the use of PaddleClas. Each GPU card is taken as a single trainer and deals with different images, which cause the final small difference. Single card evalution is suggested to get the accurate results if you use `tools/eval.py`.
You can also use `tools/eval_multi_platform.py` to evalute the models on multiple GPU cards, which is also supported on Windows and CPU.
-
-
->>
-* Q: Why `Mixup` or `Cutmix` is not used even if I have already add the data operation in the configuration file?
-* A: When using `Mixup` or `Cutmix`, you also need to add `use_mix: True` in the configuration file to make it work properly.
-
-
->>
-* Q: During evaluation and inference, pretrained model address is assgined, but the weights can not be imported. Why?
-* A: Prefix of the pretrained model is needed. For example, if the pretained weights are located in `output/ResNet50_vd/19`, with the filename `output/ResNet50_vd/19/ppcls.pdparams`, then `pretrained_model` in the configuration file needs to be `output/ResNet50_vd/19/ppcls`.
-
->>
-* Q: Why are the metrics 0.3% lower than that shown in the model zoo for `EfficientNet` series of models?
-* A: Resize method is set as `Cubic` for `EfficientNet`(interpolation is set as 2 in OpenCV), while other models are set as `Bilinear`(interpolation is set as None in OpenCV). Therefore, you need to modify the interpolation explicitly in `ResizeImage`. Specifically, the following configuration is a demo for EfficientNet.
-
-```
-VALID:
-    batch_size: 16
-    num_workers: 4
-    file_list: "./dataset/ILSVRC2012/val_list.txt"
-    data_dir: "./dataset/ILSVRC2012/"
-    shuffle_seed: 0
-    transforms:
-        - DecodeImage:
-            to_rgb: True
-            to_np: False
-            channel_first: False
-        - ResizeImage:
-            resize_short: 256
-            interpolation: 2
-        - CropImage:
-            size: 224
-        - NormalizeImage:
-            scale: 1.0/255.0
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
-        - ToCHWImage:
-```
-
->>
-* Q: The error occured when using visualdl under python2, shows that: `TypeError: __init__() missing 1 required positional argument: 'sync_cycle'`.
-* A: `Visualdl` is only supported on python3 as now, whose version needs also be higher than `2.0`.
If your visualdl version is lower than 2.0, you can also install visualdl 2.0 by `pip3 install visualdl==2.0.0b8 -i https://mirror.baidu.com/pypi/simple`.
diff --git a/docs/en/faq_series/faq_2020_s1_en.md b/docs/en/faq_series/faq_2020_s1_en.md
index 8499668d55ab0694b903bf404174e161f3c08572..de5a50c87233296f2a2543600ba1c7d8fb16671c 100644
--- a/docs/en/faq_series/faq_2020_s1_en.md
+++ b/docs/en/faq_series/faq_2020_s1_en.md
@@ -79,9 +79,9 @@ Not really, increasing all the convolutional kernels in the network may not lead
 **A**:The process is as follows:
-- First, create a new model structure file under the folder ppcls/arch/backbone/model_zoo/, i.e. your own backbone. You can refer to resnet.py for model construction;
-- Then add your own backbone class in ppcls/arch/backbone/\__init__.py;
-- Next, configure the yaml file for training, here you can refer to ppcls/configs/ImageNet/ResNet/ResNet50.yaml;
+- First, create a new model structure file under the folder `ppcls/arch/backbone/model_zoo/`, i.e. your own backbone. You can refer to resnet.py for model construction;
+- Then add your own backbone class in `ppcls/arch/backbone/__init__.py`;
+- Next, configure the yaml file for training, here you can refer to `ppcls/configs/ImageNet/ResNet/ResNet50.yaml`;
 - Now you can start the training.
 ### Q2.2: How to transfer the existing models and weights to your own classification tasks?
@@ -96,7 +96,7 @@ Not really, increasing all the convolutional kernels in the network may not lead
 **A**:
-The default parameter of the configuration file under ppcls/configs/ImageNet/ in PaddleClas is the training parameter of ImageNet-1k, which is not suitable for all datasets, and the specific datasets need to be further debugged on this basis.
+The default parameter of the configuration file under `ppcls/configs/ImageNet/` in PaddleClas is the training parameter of ImageNet-1k, which is not suitable for all datasets, and the specific datasets need to be further debugged on this basis.
 ### Q2.4 The resolution varies for different models in PaddleClas, so what is the standard?
diff --git a/docs/en/faq_series/faq_2021_s1_en.md b/docs/en/faq_series/faq_2021_s1_en.md
index d9ffc2074004a56e222a89c3a43c9ae762d05350..c730f25f6e04c65d3d77729b64fda541ad0cf86d 100644
--- a/docs/en/faq_series/faq_2021_s1_en.md
+++ b/docs/en/faq_series/faq_2021_s1_en.md
@@ -41,7 +41,7 @@ This may be caused by the small shared memory in docker. When creating docker, t
 **A**:
-Based on ResNet50_vd, Baidu open-sourced its own large-scale classification pre-training model with 100,000 categories and 43 million images. The former is available for download at [download address](https://paddle-imagenet-models-name.bj.bcebos.com/ ResNet50_vd_10w_pretrained.tar), where it should be noted that the pre-training model does not provide the final FC layer parameters and thus cannot be used directly for inference; however, it can be used as a pre-training model to fine-tune it on your own dataset. It is verified that this pre-training model has a more significant accuracy gain of up to 30% on different datasets than the ResNet50_vd pre-training model based on the ImageNet1k dataset.
+Based on ResNet50_vd, Baidu open-sourced its own large-scale classification pre-training model with 100,000 categories and 43 million images. The former is available for download at [download address](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_10w_pretrained.tar), where it should be noted that the pre-training model does not provide the final FC layer parameters and thus cannot be used directly for inference; however, it can be used as a pre-training model to fine-tune it on your own dataset. It is verified that this pre-training model has a more significant accuracy gain of up to 30% on different datasets than the ResNet50_vd pre-training model based on the ImageNet1k dataset.
 ### Q1.5 How to accelerate when using C++ for inference deployment?
diff --git a/docs/en/faq_series/faq_2021_s2_en.md b/docs/en/faq_series/faq_2021_s2_en.md
index f7ed5548edcc85477c3ecb2b03152ea62bba2816..c18da5b094c13cbe4fb07c28753ab375d79d0bb4 100644
--- a/docs/en/faq_series/faq_2021_s2_en.md
+++ b/docs/en/faq_series/faq_2021_s2_en.md
@@ -53,7 +53,7 @@ w_t+1 = w_t - v_t+1
 Here `m` is the `momentum`, which is the weighted value of the cumulative momentum, generally taken as `0.9`. And when the value is less than `1`, the earlier the gradient is, the smaller the impact on the current. For example, when the momentum parameter `m` takes `0.9`, the weighted value of the gradient of `t-5` is `0.9 ^ 5 = 0.59049` at time `t`, while the value at time `t-2` is `0.9 ^ 2 = 0.81`. Therefore, it is intuitive that gradient information that is too "far away" is of little significance for the current reference, while "recent" historical gradient information matters more.
-[![img](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/faq/momentum.jpeg)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/faq/momentum.jpeg)
+[](../../images/faq/momentum.jpeg)
 By introducing the concept of momentum, the effect of historical updates is taken into account in parameter updates, thus speeding up the convergence and improving the loss (cost, loss) oscillation caused by the `SGD` optimizer.
@@ -93,7 +93,7 @@ Among them, RandAngment provides a variety of random combinations of data augmen
 **A**:
-The training data is a randomly selected subset of publicly available datasets such as COCO, Object365, RPC, and LogoDet. We are currently introducing an ultra-lightweight mainbody detection model in version 2.3, which can be found in [Mainbody Detection](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/image_recognition_ pipeline/mainbody_detection.md#2-Model Selection).
+The training data is a randomly selected subset of publicly available datasets such as COCO, Object365, RPC, and LogoDet.
We are currently introducing an ultra-lightweight mainbody detection model in version 2.3, which can be found in [Mainbody Detection](../../en/image_recognition_pipeline/mainbody_detection_en.md#2-model-selection). #### Q1.4.3: Is there any false detections in some scenarios with the current mainbody detection model? @@ -109,7 +109,7 @@ The training data is a randomly selected subset of publicly available datasets s `circle loss` is a unified form of sample pair learning and classification learning, and `triplet loss` can be added if it is a classification learning. -#### Q1.5.2 如果不是识别开源的四个方向的图片,该使用哪个识别模型?Which recognition model is better if not to recognize open source images in all four directions? +#### Q1.5.2 Which recognition model is better if not to recognize open source images in all four directions? **A**: @@ -196,8 +196,8 @@ PaddleClas saves/updates the following three types of models during training. **A**: -- For `Mixup`, please refer to [Mixup](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/configs/ImageNet/DataAugment/ResNet50_ Mixup.yaml#L63-L65); and`Cuxmix`, please refer to [Cuxmix](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/configs/ImageNet/ DataAugment/ResNet50_Cutmix.yaml#L63-L65). -- The training accuracy (Acc) metric cannot be calculated when using `Mixup` or `Cutmix` for training, so you need to remove the `Metric.Train.TopkAcc` field in the configuration file, please refer to [Metric.Train.TopkAcc](https://github.com/ PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L125-L128). +- For `Mixup`, please refer to [Mixup](../../../ppcls/configs/ImageNet/DataAugment/ResNet50_ Mixup.yaml#L63-L65); and`Cuxmix`, please refer to [Cuxmix](../../../ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L63-L65). 
+- The training accuracy (Acc) metric cannot be calculated when using `Mixup` or `Cutmix` for training, so you need to remove the `Metric.Train.TopkAcc` field in the configuration file, please refer to [Metric.Train.TopkAcc](../../../ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L125-L128).

 #### Q2.1.9: What are the fields `Global.pretrain_model` and `Global.checkpoints` used for in the training configuration file yaml?

@@ -244,11 +244,11 @@ PaddleClas saves/updates the following three types of models during training.

 #### Q2.4.1: Why is `Illegal instruction` reported during the recognition inference?

-**A**:If you are using the release/2.2 branch, it is recommended to update it to the release/2.3 branch, where we replaced the Möbius search model with the faiss search module, as described in [Vector Search Tutorial](https://github.com/PaddlePaddle/ PaddleClas/blob/release/2.3/deploy/vector_search/README.md). If you still have problems, you can contact us in the WeChat group or raise an issue on GitHub.
+**A**:If you are using the release/2.2 branch, it is recommended to update it to the release/2.3 branch, where we replaced the Möbius search model with the faiss search module, as described in [Vector Search Tutorial](../image_recognition_pipeline/vector_search_en.md). If you still have problems, you can contact us in the WeChat group or raise an issue on GitHub.

 #### Q2.4.2: How can recognition models be fine-tuned to train on the basis of pre-trained models?

-**A**:The fine-tuning training of the recognition model is similar to that of the classification model. The recognition model can be loaded with a pre-trained model of the product, and the training process can be found in [recognition model training](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/ models_training/recognition.md), and we will continue to refine the documentation.
+**A**:The fine-tuning training of the recognition model is similar to that of the classification model. The recognition model can be loaded with a pre-trained model of the product, and the training process can be found in [recognition model training](../../models_training/recognition_en.md), and we will continue to refine the documentation.

 #### Q2.4.3: Why does it fail to run all mini-batches in each epoch when training metric learning?

@@ -268,13 +268,13 @@ PaddleClas saves/updates the following three types of models during training.

 #### Q2.5.2: Do I need to rebuild the index to add new base data?

-**A**:Starting from release/2.3 branch, we have replaced the Möbius search model with the faiss search module, which already supports the addition of base data without building the base library, as described in [Vector Search Tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/ release/2.3/deploy/vector_search/README.md).
+**A**:Starting from release/2.3 branch, we have replaced the Möbius search model with the faiss search module, which already supports the addition of base data without building the base library, as described in [Vector Search Tutorial](../image_recognition_pipeline/vector_search_en.md).

 #### Q2.5.3: How to deal with the reported error clang: error: unsupported option '-fopenmp' when recompiling index.so in Mac?

 **A**:

-If you are using the release/2.2 branch, it is recommended to update it to the release/2.3 branch, where we replaced the Möbius search model with the faiss search module, as described in [Vector Search Tutorial](https://github.com/PaddlePaddle/ PaddleClas/blob/release/2.3/deploy/vector_search/README.md). If you still have problems, you can contact us in the user WeChat group or raise an issue on GitHub.
+If you are using the release/2.2 branch, it is recommended to update it to the release/2.3 branch, where we replaced the Möbius search model with the faiss search module, as described in [Vector Search Tutorial](../image_recognition_pipeline/vector_search_en.md). If you still have problems, you can contact us in the user WeChat group or raise an issue on GitHub.

 #### Q2.5.4: How to set the parameter `pq_size` when build searches the base library?

@@ -288,7 +288,7 @@ If you are using the release/2.2 branch, it is recommended to update it to the r

 #### Q2.6.1: How to add the parameter of a module that is enabled by hub serving?

-**A**:See [hub serving parameters](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/deploy/hubserving/clas/params.py) for more details.
+**A**:See [hub serving parameters](../../../deploy/hubserving/clas/params.py) for more details.

 #### Q2.6.2: Why is the result not accurate enough when exporting the inference model for inference deployment?

@@ -327,13 +327,13 @@ pip install paddle2onnx
 - `params_filename`: this parameter is used to specify the path of the `.pdiparams` file under the parameter `model_dir`.
 - `save_file`: this parameter is used to specify the path to the directory where the converted model is saved.

- For the conversion of a non-`combined` format inference model exported from a static diagram (usually containing the file `__model__` and multiple parameter files), and more parameter descriptions, please refer to the official documentation of [paddle2onnx](https://github.com/ PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md#Parameter options).
+ For the conversion of a non-`combined` format inference model exported from a static diagram (usually containing the file `__model__` and multiple parameter files), and more parameter descriptions, please refer to the official documentation of [paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README.md#parameters).
 - Exporting ONNX format models directly from the model networking code.

 Take the model networking code of dynamic graphs as an example, the model class is a subclass that inherits from `paddle.nn.Layer` and the code is shown below:

- ```
+ ```python
 import paddle
 from paddle.static import InputSpec

diff --git a/docs/en/faq_series/faq_selected_30_en.md b/docs/en/faq_series/faq_selected_30_en.md
index 0121ddeb4c3873aac040d6863409586c56a9165e..54f6ca18182b6bf40dc37932f0451c657e02b258 100644
--- a/docs/en/faq_series/faq_selected_30_en.md
+++ b/docs/en/faq_series/faq_selected_30_en.md
@@ -237,7 +237,7 @@
 >
 >
 - Q: Why `TypeError: __init__() missing 1 required positional argument: 'sync_cycle'` is reported when using visualdl under python2?
-- A: Currently visualdl only supports running under python3 with a required version of 2.0 or higher. If visualdl is not the right version, you can install it as follows: `pip3 install visualdl -i https://mirror.baidu.com/pypi/ simple`
+- A: Currently visualdl only supports running under python3 with a required version of 2.0 or higher. If visualdl is not the right version, you can install it as follows: `pip3 install visualdl -i https://mirror.baidu.com/pypi/simple`

 >
 >
@@ -252,7 +252,7 @@
 >
 >
 - Q: How to train the model on windows or cpu?
-- A: You can refer to [Getting Started Tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/models_training/classification.md) for detailed tutorials on model training, evaluation and inference in Linux , Windows, CPU, and other environments.
+- A: You can refer to [Getting Started Tutorial](../models_training/classification_en.md) for detailed tutorials on model training, evaluation and inference in Linux , Windows, CPU, and other environments.

 >
 >
@@ -275,12 +275,12 @@ Loss:
 >
 >
 - Q: Why is `Error: Pass tensorrt_subgraph_pass has not been registered` reported When using `deploy/python/predict_cls.py` for model prediction?
-- A: If you want to use TensorRT for model prediction and inference, you need to install or compile PaddlePaddle with TensorRT by yourself. For Linux, Windows, macOS users, you can refer to [download inference library](https://paddleinference. paddlepaddle.org.cn/user_guides/download_lib.html). If there is no required version, you need to compile and install it locally, which is detailed in [source code compilation](https://paddleinference.paddlepaddle.org .cn/user_guides/source_compile.html).
+- A: If you want to use TensorRT for model prediction and inference, you need to install or compile PaddlePaddle with TensorRT by yourself. For Linux, Windows, macOS users, you can refer to [download inference library](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html). If there is no required version, you need to compile and install it locally, which is detailed in [source code compilation](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html).

 >
 >
 - Q: How to train with Automatic Mixed Precision (AMP) during training?
-- A: You can refer to [ResNet50_fp16.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/configs/ImageNet/ResNet/ResNet50_fp16. yaml). Specifically, if you want your configuration file to support automatic mixed precision during model training, you can add the following information to the file.
+- A: You can refer to [ResNet50_fp16.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml). Specifically, if you want your configuration file to support automatic mixed precision during model training, you can add the following information to the file.
 ```
 # mixed precision training
diff --git a/docs/en/image_recognition_pipeline/.gitkeep b/docs/en/image_recognition_pipeline/.gitkeep
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/docs/en/introduction/function_intro_en.md~6d5e2b2e3619279438ccbf6dcb63165dcc3b63ea b/docs/en/introduction/function_intro_en.md~6d5e2b2e3619279438ccbf6dcb63165dcc3b63ea
deleted file mode 100644
index 013441b8846d2e28387c6e7095d91dbf0ab49f2c..0000000000000000000000000000000000000000
--- a/docs/en/introduction/function_intro_en.md~6d5e2b2e3619279438ccbf6dcb63165dcc3b63ea
+++ /dev/null
@@ -1,23 +0,0 @@
-## Features of PaddleClas
-
-PaddleClas is an image recognition toolset for industry and academia,
-helping users train better computer vision models and apply them in real scenarios.
-Specifically, it contains the following core features.
-
-- Practical image recognition system: Integrate detection, feature learning,
-and retrieval modules to be applicable to all types of image recognition tasks. Four sample solutions are provided,
-including product recognition, vehicle recognition, logo recognition, and animation character recognition.
-- Rich library of pre-trained models: Provide a total of 175 ImageNet pre-trained models of 36 series,
-among which 7 selected series of models support fast structural modification.
-- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be
-combined and switched at will through configuration files.
-- SSLD knowledge distillation: The 14 classification pre-training models generally improved their accuracy by
-more than 3%; among them, the ResNet50_vd model achieved a Top-1 accuracy of 84.0% on the Image-Net-1k dataset
-and the Res2Net200_vd pre-training model achieved a Top-1 accuracy of 85.1%.
-- Data augmentation: Provide 8 data augmentation algorithms such as AutoAugment, Cutout, Cutmix, etc.
-with the detailed introduction, code replication, and evaluation of effectiveness in a unified experimental environment.
-
-![img](../../images/recognition.gif)
-
-For more information about the quick start of image recognition, algorithm details, model training and evaluation,
-and prediction and deployment methods, please refer to the [README Tutorial](../../../README_en.md) on home page.
diff --git a/docs/en/models_training/.gitkeep b/docs/en/models_training/.gitkeep
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/docs/en/others/train_on_xpu_en.md b/docs/en/others/train_on_xpu_en.md
index a6f9f72b25fc946217d53d14f30637bd076cd727..5fb44b1b278ae5d29e55f85f75ed1c653671b1cb 100644
--- a/docs/en/others/train_on_xpu_en.md
+++ b/docs/en/others/train_on_xpu_en.md
@@ -75,12 +75,12 @@ python3.7 ppcls/static/train.py \

 ```
 python3.7 ppcls/static/train.py \
- -c ppcls/configs/quick_start/VGG16_finetune_kunlun.yaml \
+ -c ppcls/configs/quick_start/kunlun/VGG16_finetune_kunlun.yaml \
  -o use_gpu=False \
  -o use_xpu=True \
  -o is_distributed=False
 python3.7 ppcls/static/train.py \
- -c ppcls/configs/quick_start/VGG19_finetune_kunlun.yaml \
+ -c ppcls/configs/quick_start/kunlun/VGG19_finetune_kunlun.yaml \
  -o use_gpu=False \
  -o use_xpu=True \
  -o is_distributed=False
diff --git a/docs/en/quick_start/.gitkeep b/docs/en/quick_start/.gitkeep
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/docs/zh_CN/others/train_on_xpu.md b/docs/zh_CN/others/train_on_xpu.md
index e62ec1a5d3ccc72058c13039420b289a6de2ce9e..429119b588a53ecbaa29a7d71485a7d47308c871 100644
--- a/docs/zh_CN/others/train_on_xpu.md
+++ b/docs/zh_CN/others/train_on_xpu.md
@@ -67,14 +67,14 @@ python3.7 ppcls/static/train.py \

 ```shell
 python3.7 ppcls/static/train.py \
- -c ppcls/configs/quick_start/VGG16_finetune_kunlun.yaml \
+ -c ppcls/configs/quick_start/kunlun/VGG16_finetune_kunlun.yaml \
  -o use_gpu=False \
  -o use_xpu=True \
  -o is_distributed=False
 ```
 ```shell
 python3.7 ppcls/static/train.py \
- -c ppcls/configs/quick_start/VGG19_finetune_kunlun.yaml \
+ -c ppcls/configs/quick_start/kunlun/VGG19_finetune_kunlun.yaml \
  -o use_gpu=False \
  -o use_xpu=True \
  -o is_distributed=False