# Data

---

## Introduction
This document describes how to prepare the ImageNet1k and flowers102 datasets.

## Dataset

Dataset | Training set size | Validation set size | Number of categories |
:------:|:---------------:|:---------------------:|:--------:|
[flowers102](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/)|1k | 6k | 102 |
[ImageNet1k](http://www.image-net.org/challenges/LSVRC/2012/)|1.2M| 50k | 1000 |

* Data format

Organize the data as described below. Each dataset directory must include `train_list.txt` and `val_list.txt`, in which every line holds an image path and its integer label, separated by a space:

```shell
# delimiter: "space"

ILSVRC2012_val_00000001.JPEG 65
...

```
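A list file in this format can be read with a few lines of Python. The following is a minimal sketch, not part of PaddleClas; the helper names `parse_list_line` and `read_list_file` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def parse_list_line(line: str) -> Tuple[str, int]:
    """Split one list-file line into (image path, integer label)."""
    # Split on the last space only, so a path that itself contains
    # spaces would still be separated correctly from its label.
    path, label = line.strip().rsplit(" ", 1)
    return path, int(label)


def read_list_file(list_path: str) -> List[Tuple[str, int]]:
    """Read every non-empty line of a list file such as val_list.txt."""
    entries = []
    with open(list_path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                entries.append(parse_list_line(line))
    return entries
```

For example, `parse_list_line("ILSVRC2012_val_00000001.JPEG 65")` returns `("ILSVRC2012_val_00000001.JPEG", 65)`.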
### ImageNet1k
After downloading the data, organize the dataset directory as follows:

```bash
PaddleClas/dataset/imagenet/
|_ train/
|  |_ n01440764
|  |  |_ n01440764_10026.JPEG
|  |  |_ ...
|  |_ ...
|  |
|  |_ n15075141
|     |_ ...
|     |_ n15075141_9993.JPEG
|_ val/
|  |_ ILSVRC2012_val_00000001.JPEG
|  |_ ...
|  |_ ILSVRC2012_val_00050000.JPEG
|_ train_list.txt
|_ val_list.txt
```
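Because the training images are grouped into one folder per class, a `train_list.txt` could in principle be derived from the directory layout. The sketch below is an illustration under stated assumptions, not the official tooling: it assumes labels are assigned by sorting the class folder names, and that paths are written relative to the dataset root with a `train/` prefix. Both choices should be checked against the label mapping your models expect. `build_train_list` is a hypothetical helper name. The same approach does not work for `val/`, which is a flat folder and carries no label information.

```python
import os
from typing import List


def build_train_list(train_dir: str, prefix: str = "train") -> List[str]:
    """Return '<prefix>/<class>/<image> <label>' lines for a class-per-folder layout."""
    # Assumption: the label index is the position of the class folder
    # in sorted order (n01440764 -> 0, and so on).
    classes = sorted(
        name for name in os.listdir(train_dir)
        if os.path.isdir(os.path.join(train_dir, name))
    )
    lines = []
    for label, cls in enumerate(classes):
        for image_name in sorted(os.listdir(os.path.join(train_dir, cls))):
            lines.append(f"{prefix}/{cls}/{image_name} {label}")
    return lines


def write_list(lines: List[str], out_path: str) -> None:
    """Write one entry per line, using a space as the delimiter."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

A possible call would be `write_list(build_train_list("PaddleClas/dataset/imagenet/train"), "PaddleClas/dataset/imagenet/train_list.txt")`.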
### Flowers102 Dataset

Download the [data](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) and decompress it to obtain:

```shell
jpg/
setid.mat
imagelabels.mat
```

Put all of these files under ```PaddleClas/dataset/flowers102```.

Run generate_flowers102_list.py to generate train_list.txt and val_list.txt:

```bash
python generate_flowers102_list.py jpg train > train_list.txt
python generate_flowers102_list.py jpg valid > val_list.txt

```

The dataset directory should then be organized as follows:

```bash
PaddleClas/dataset/flowers102/
|_ jpg/
|  |_ image_03601.jpg
|  |_ ...
|  |_ image_02355.jpg
|_ train_list.txt
|_ val_list.txt
```
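Before training, it can be useful to confirm that every entry in a list file points to an image that actually exists. The following is a minimal sketch of such a check. It assumes the paths in the list file are relative to the dataset directory, which may differ in your setup; `find_missing_images` is a hypothetical helper name.

```python
import os
from typing import List


def find_missing_images(data_dir: str, list_name: str) -> List[str]:
    """Return the paths listed in data_dir/list_name that are not files on disk."""
    missing = []
    with open(os.path.join(data_dir, list_name), "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # Each line is '<path> <label>'; the label is not needed here.
            rel_path, _label = line.rsplit(" ", 1)
            # Assumption: rel_path is relative to data_dir.
            if not os.path.isfile(os.path.join(data_dir, rel_path)):
                missing.append(rel_path)
    return missing
```

An empty result from `find_missing_images("PaddleClas/dataset/flowers102", "train_list.txt")` would indicate that all listed images were found.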