Welcome to PaddleHub! PaddleHub is an application tool for pre-trained models based on the PaddlePaddle framework. It aims to lower the barrier to using AI models and to promote the development of the AI community. Whether you are a senior developer in the AI field or an interested newcomer who knows little about it, you can benefit from PaddleHub.
Training a new task from scratch is a time-consuming process that may not produce the desired results. Instead, you can fine-tune a pre-trained model provided by PaddleHub on a specific task: pre-process your custom data accordingly, feed it to the pre-trained model, and obtain the corresponding results. Refer to the following sections for how to structure your datasets.
## I. Image Classification Dataset
When migrating a classification task using custom data with PaddleHub, you need to split the dataset into a training set, a validation set, and a test set.
### Data Preparation
Three text files are needed to record the image paths and labels of the training, validation, and test sets, plus a label file that records the names of the labels.
```
├─data:
├─train_list.txt:
├─test_list.txt:
├─validate_list.txt:
├─label_list.txt:
└─...
```
The format of the data list file for the training/validation/test set is as follows:
```
Path-1 label-1
Path-2 label-2
...
```
The format of label\_list.txt is:
```
Classification 1
Classification 2
...
```
Example: taking the [Flower Dataset](../reference/dataset.md) as an example, train\_list.txt/test\_list.txt/validate\_list.txt read as follows:
```
roses/8050213579_48e1e7109f.jpg 0
sunflowers/45045003_30bbd0a142_m.jpg 3
daisy/3415180846_d7b5cced14_m.jpg 2
```
label\_list.txt reads as follows:
```
roses
tulips
daisy
sunflowers
dandelion
```
### Dataset Loading
For the dataset preparation code, see [flowers.py](../../paddlehub/datasets/flowers.py). `hub.datasets.Flowers()` automatically downloads the dataset from the network and unzips it into the `$HOME/.paddlehub/dataset` directory. Specific usage:
```python
from paddlehub.datasets import Flowers

flowers = Flowers(transforms)
flowers_validate = Flowers(transforms, mode='val')
```
* `transforms`: Data pre-processing mode.
* `mode`: Select the data mode. Options are `train`, `test`, and `val`. The default is `train`.
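For reference, the list-file format described above can be consumed with a few lines of plain Python. This sketch (the `load_samples` helper is hypothetical and independent of PaddleHub) resolves each line of a data list file against `label_list.txt` to yield `(image_path, label_name)` pairs:

```python
import os

def load_samples(data_dir, list_name='train_list.txt'):
    # Read label names; the line number is the label index.
    with open(os.path.join(data_dir, 'label_list.txt')) as f:
        labels = [line.strip() for line in f if line.strip()]
    samples = []
    with open(os.path.join(data_dir, list_name)) as f:
        for line in f:
            if not line.strip():
                continue
            # Each line is "relative/path label-index".
            path, idx = line.rsplit(' ', 1)
            samples.append((os.path.join(data_dir, path), labels[int(idx)]))
    return samples
```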
## II. Image Coloring Dataset
When migrating a coloring task using custom data with PaddleHub, you need to split the dataset into a training set and a test set.
### Data Preparation
You need to divide the color images used for coloring training and testing into a training set and a test set.
```
├─data:
├─train:
|-folder1
|-folder2
|-...
|-pic1
|-pic2
|-...
├─test:
|-folder1
|-folder2
|-...
|-pic1
|-pic2
|-...
└─...
```
Example: PaddleHub provides a coloring dataset for users, the `Canvas` dataset. It consists of 1193 images in Monet style and 400 images in Van Gogh style. Taking the [Canvas Dataset](../reference/datasets.md) as an example, the contents of the train folder are as follows:
```
├─train:
|-monet
|-pic1
|-pic2
|-...
|-vango
|-pic1
|-pic2
|-...
```
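If your images are not yet divided, a split like the one above can be produced with a short script. The following sketch (the `split_dataset` helper is hypothetical, not a PaddleHub API) copies each style folder's images into `train` and `test` subdirectories at a given ratio:

```python
import os
import random
import shutil

def split_dataset(src_dir, dst_dir, test_ratio=0.2, seed=0):
    # src_dir contains one subfolder per style (e.g. monet, vango).
    # Copy each style's images into dst_dir/train/<style> and dst_dir/test/<style>.
    rng = random.Random(seed)  # seeded for a reproducible split
    for style in sorted(os.listdir(src_dir)):
        pics = sorted(os.listdir(os.path.join(src_dir, style)))
        rng.shuffle(pics)
        n_test = int(len(pics) * test_ratio)
        for split, names in (('test', pics[:n_test]), ('train', pics[n_test:])):
            out = os.path.join(dst_dir, split, style)
            os.makedirs(out, exist_ok=True)
            for name in names:
                shutil.copy(os.path.join(src_dir, style, name), out)
```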
### Dataset Loading
For the dataset preparation code, refer to [canvas.py](../../paddlehub/datasets/canvas.py). `hub.datasets.Canvas()` automatically downloads the dataset from the network and unzips it into the `$HOME/.paddlehub/dataset` directory. Specific usage:
```python
from paddlehub.datasets import Canvas
color_set = Canvas(transforms, mode='train')
```
* `transforms`: Data pre-processing mode.
* `mode`: Select the data mode. Options are `train` and `test`. The default is `train`.
## III. Style Transfer Dataset
When using custom data for style transfer tasks with PaddleHub, you need to split the dataset into a training set and a test set.
### Data Preparation
You need to split the color images used for style transfer into training set data and test set data.
```
├─data:
├─train:
|-folder1
|-folder2
|-...
|-pic1
|-pic2
|-...
├─test:
|-folder1
|-folder2
|-...
|-pic1
|-pic2
|-...
|-21styles
|-pic1
|-pic2
└─...
```
Example: PaddleHub provides a style transfer dataset for users, the `MiniCOCO` dataset. The training set and test set data come from COCO2014, with 2001 images in the training set and 200 images in the test set. The `21styles` folder contains 21 images in different styles; users can replace these style images as needed. Taking the [MiniCOCO Dataset](../reference/datasets.md) as an example, the contents of the train folder are as follows:
```
├─train:
|-train
|-pic1
|-pic2
|-...
|-test
|-pic1
|-pic2
|-...
|-21styles
|-pic1
|-pic2
|-...
```
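For intuition, a style-transfer loader over this layout typically pairs each content image with a style image. The sketch below (the `sample_pairs` helper is hypothetical, not the actual MiniCOCO implementation) draws a random style from `21styles` for every image in a split:

```python
import os
import random

def sample_pairs(root, split='train', seed=0):
    # Pair each content image under root/<split> with a randomly chosen
    # style image from root/21styles, as a style-transfer loader might.
    rng = random.Random(seed)  # seeded for reproducibility
    styles = sorted(os.listdir(os.path.join(root, '21styles')))
    pairs = []
    for name in sorted(os.listdir(os.path.join(root, split))):
        style = rng.choice(styles)
        pairs.append((os.path.join(root, split, name),
                      os.path.join(root, '21styles', style)))
    return pairs
```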
### Dataset Loading
For the dataset preparation code, refer to [minicoco.py](../../paddlehub/datasets/minicoco.py). `hub.datasets.MiniCOCO()` automatically downloads the dataset from the network and unzips it into the `$HOME/.paddlehub/dataset` directory. Specific usage:
```python
from paddlehub.datasets import MiniCOCO
color_set = MiniCOCO(transforms, mode='train')
```
* `transforms`: Data pre-processing mode.
* `mode`: Select the data mode. Options are `train` and `test`. The default is `train`.