fix: fix the invalid link (#5347)

1251ab86 · Tingquan Gao · GitHub · 888b3caf
3 changed files
--- a/PaddleCV/image_classification/README.md
+++ b/PaddleCV/image_classification/README.md
@@ -50,19 +50,38 @@ pip install numpy
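 ### Data preparation
-The following is an example for the ImageNet classification task;
+**Note**: The official ImageNet site no longer provides a public download link, so please download the ImageNet-2012 image data yourself and place the training and validation sets in the `train` and `val` directories respectively. The ImageNet data is also larger than 140GB and takes a long time to download; users who already have ImageNet can place the organized data directly under `data/ILSVRC2012`.
-on Linux, the data can be prepared as follows:
+We provide the label files for the training and validation sets. Linux users can download them with the following commands: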
 ```
-cd data/ILSVRC2012/
+wget https://paddle-imagenet-models-name.bj.bcebos.com/data/ImageNet1k/ImageNet1k_train_list.txt -O data/ILSVRC2012/train_list.txt
-sh download_imagenet2012.sh
+wget https://paddle-imagenet-models-name.bj.bcebos.com/data/ImageNet1k/ImageNet1k_val_list.txt -O data/ILSVRC2012/val_list.txt
 ```
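-In the ```download_imagenet2012.sh``` script, the data is prepared in the following three steps:
-**Step 1:** First register on the ```image-net.org``` website to obtain a ```Username``` and ```AccessKey``` pair.
+Windows users can download the files manually: [train_list.txt](https://paddle-imagenet-models-name.bj.bcebos.com/data/ImageNet1k/ImageNet1k_train_list.txt) and [val_list.txt](https://paddle-imagenet-models-name.bj.bcebos.com/data/ImageNet1k/ImageNet1k_val_list.txt), rename them to `train_list.txt` and `val_list.txt` respectively, and place them in the `data/ILSVRC2012` directory.
+The prepared dataset directory should be organized as follows:
-**Step 2:** Download the ImageNet-2012 image data from the official ImageNet website. The training and validation sets are downloaded into the "train" and "val" directories respectively. Note that the ImageNet data is larger than 140GB and takes a long time to download; users who have already downloaded ImageNet can place the organized data directly in ```data/ILSVRC2012```.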
+```
+ data/ILSVRC2012
+ ├── train             # training images
+ │   ├── n01440764
+ │   ...
+ │   ├── n13054560
+ │   ├── n13133613
+ │   └── n15075141
+ ├── val               # validation images
+ │   ├── ILSVRC2012_val_00000001.JPEG
+ │   ...
+ │   ├── ILSVRC2012_val_00049998.JPEG
+ │   ├── ILSVRC2012_val_00049999.JPEG
+ │   └── ILSVRC2012_val_00050000.JPEG
+ ├── train_list.txt    # training data label list file
+ └── val_list.txt      # validation data label list file
+```
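-**Step 3:** Download the label files for the training and validation sets. The following two files contain the image labels for the training and validation sets respectively:
+The `train_list.txt` and `val_list.txt` files contain the image labels for the training and validation sets respectively:
 * train_list.txt: label file for the ImageNet-2012 training set; each line separates the image path and label with a space, for example: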
 ```
@@ -74,8 +93,6 @@ val/ILSVRC2012_val_00000001.jpeg 65
 ```
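 Note: you may need to adjust the paths in reader.py for your local environment so that the data is read correctly.
-**On Windows, please download the ImageNet data yourself; [label download link](http://paddle-imagenet-models.bj.bcebos.com/ImageNet_label.tgz)**
 ### Model training
 Once the data is prepared, training can be started as follows: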

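Since this change drops the download script and asks users to arrange the data by hand, a quick layout check may help catch a misplaced directory before training. The following is a minimal sketch, not part of the commit: `missing_entries` is a hypothetical helper, and the expected names are taken only from the directory tree added to the README above.

```python
from pathlib import Path

# Entries the README's directory tree lists under data/ILSVRC2012.
EXPECTED_DIRS = ("train", "val")
EXPECTED_FILES = ("train_list.txt", "val_list.txt")


def missing_entries(root):
    """Return the expected sub-paths that are absent under `root`."""
    root = Path(root)
    # Directories first, then label files, in the order listed above.
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    absent = missing_entries("data/ILSVRC2012")
    if absent:
        print("missing: " + ", ".join(absent))
    else:
        print("layout looks complete")
```

The check only confirms that the four top-level entries exist; it does not verify image counts or class sub-directories.
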
--- a/PaddleCV/image_classification/README_en.md
+++ b/PaddleCV/image_classification/README_en.md
@@ -42,20 +42,39 @@ Running samples in this directory requires Python 2.7 and later, CUDA 8.0 and la
 ### Data preparation
 An example for ImageNet classification is as follows.
-For Linux system, preparation of imagenet data can be done as:
-```bash
+**Note**: The ImageNet dataset is no longer publicly accessible from the official ImageNet site. You need to obtain the image data yourself and place the training and validation data into `train` and `val` respectively. The full dataset is more than 140GB, so downloading takes a long time. If you have already downloaded ImageNet, you only need to organize it and place it under `data/ILSVRC2012`.
-cd data/ILSVRC2012/
-sh download_imagenet2012.sh
+We provide the label list files for the training and validation data. Linux users can download them with the following commands:
+```
+wget https://paddle-imagenet-models-name.bj.bcebos.com/data/ImageNet1k/ImageNet1k_train_list.txt -O data/ILSVRC2012/train_list.txt
+wget https://paddle-imagenet-models-name.bj.bcebos.com/data/ImageNet1k/ImageNet1k_val_list.txt -O data/ILSVRC2012/val_list.txt
 ```
-In the shell script ```download_imagenet2012.sh```,  there are three steps to prepare data:
+Windows users can download these files manually, rename them to `train_list.txt` and `val_list.txt`, and place them in `data/ILSVRC2012`: [train_list.txt](https://paddle-imagenet-models-name.bj.bcebos.com/data/ImageNet1k/ImageNet1k_train_list.txt) and [val_list.txt](https://paddle-imagenet-models-name.bj.bcebos.com/data/ImageNet1k/ImageNet1k_val_list.txt).
-**step-1:** Register at ```image-net.org``` first in order to get a pair of ```Username``` and ```AccessKey```, which are used to download ImageNet data.
+The dataset directory tree is as follows:
-**step-2:** Download ImageNet-2012 dataset from website. The training and validation data will be downloaded into folder "train" and "val" respectively. Please note that the size of data is more than 40 GB, it will take much time to download. Users who have downloaded the ImageNet data can organize it into ```data/ILSVRC2012``` directly.
+```
+ data/ILSVRC2012
+ ├── train             # training images
+ │   ├── n01440764
+ │   ...
+ │   ├── n13054560
+ │   ├── n13133613
+ │   └── n15075141
+ ├── val               # validation images
+ │   ├── ILSVRC2012_val_00000001.JPEG
+ │   ...
+ │   ├── ILSVRC2012_val_00049998.JPEG
+ │   ├── ILSVRC2012_val_00049999.JPEG
+ │   └── ILSVRC2012_val_00050000.JPEG
+ ├── train_list.txt    # training data label list file
+ └── val_list.txt      # validation data label list file
+```
-**step-3:** Download training and validation label files. There are two label files which contain train and validation image labels respectively:
+There are two label files, which contain the training and validation image labels respectively:
 * train_list.txt: label file of the ImageNet-2012 training set; each line separates the image path and label with ```SPACE```, like:
 ```
@@ -67,7 +86,6 @@ val/ILSVRC2012_val_00000001.jpeg 65
 ```
 Note: You may need to modify the data path in reader.py to load data correctly.
-**For windows system, Users should download ImageNet data by themselves. and the label list can be downloaded in [Here](http://paddle-imagenet-models.bj.bcebos.com/ImageNet_label.tgz)**
 ### Training

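The README describes each label-file line as an image path and a label separated by a space, as in `val/ILSVRC2012_val_00000001.jpeg 65`. A minimal sketch of reading that format follows; the function names are hypothetical and this is not the code in `reader.py`, only an illustration of the documented line layout.

```python
def parse_label_line(line):
    """Split one label-list line into (image_path, integer_label)."""
    # Split on the last space so the label is always the final field.
    path, label = line.strip().rsplit(" ", 1)
    return path, int(label)


def read_label_list(list_file):
    """Read a whole label file, skipping blank lines."""
    with open(list_file, encoding="utf-8") as f:
        return [parse_label_line(line) for line in f if line.strip()]
```

For example, `read_label_list("data/ILSVRC2012/val_list.txt")` would return a list of `(path, label)` tuples, assuming the file has been downloaded to that location.
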
--- a/PaddleCV/image_classification/data/ILSVRC2012/download_imagenet2012.sh
+++ b/PaddleCV/image_classification/data/ILSVRC2012/download_imagenet2012.sh
-set -e
-if [ "x${IMAGENET_USERNAME}" == x -o "x${IMAGENET_ACCESS_KEY}" == x ];then
-  echo "Please create an account on image-net.org."
-  echo "It will provide you a pair of username and accesskey to download imagenet data."
-  read -p "Username: " IMAGENET_USERNAME
-  read -p "Accesskey: " IMAGENET_ACCESS_KEY
-fi
-root_url=http://www.image-net.org/challenges/LSVRC/2012/nnoupb
-valid_tar=ILSVRC2012_img_val.tar
-train_tar=ILSVRC2012_img_train.tar
-train_folder=train/
-valid_folder=val/
-echo "Download imagenet training data..."
-mkdir -p ${train_folder}
-wget -nd -c ${root_url}/${train_tar}
-tar xf ${train_tar} -C ${train_folder}
-cd ${train_folder}
-for x in `ls *.tar`
-do
-  filename=`basename $x .tar`
-  mkdir -p $filename
-  tar -xf $x -C $filename
-  rm -rf $x
-done
-cd -
-echo "Download imagenet validation data..."
-mkdir -p ${valid_folder}
-wget -nd -c ${root_url}/${valid_tar}
-tar xf ${valid_tar} -C ${valid_folder}
-echo "Download imagenet label file: val_list.txt & train_list.txt"
-label_file=ImageNet_label.tgz
-label_url=http://paddle-imagenet-models.bj.bcebos.com/${label_file}
-wget -nd -c ${label_url}
-tar zxf ${label_file}
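
With the script removed, the label files are fetched by `wget` on Linux and by hand on Windows. For platforms without `wget`, the same two downloads could be sketched with the Python standard library as below. This is an illustrative assumption, not something the commit adds: `download_labels` and its `fetch` parameter are hypothetical, while the URLs and target file names are the ones given in the updated READMEs.

```python
import os
import urllib.request

BASE_URL = "https://paddle-imagenet-models-name.bj.bcebos.com/data/ImageNet1k/"
# Remote file name -> local name expected by the README.
LABEL_FILES = {
    "ImageNet1k_train_list.txt": "train_list.txt",
    "ImageNet1k_val_list.txt": "val_list.txt",
}


def download_labels(dest_dir="data/ILSVRC2012", fetch=urllib.request.urlretrieve):
    """Fetch both label files into dest_dir and return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for remote_name, local_name in LABEL_FILES.items():
        target = os.path.join(dest_dir, local_name)
        # fetch(url, filename) mirrors `wget <url> -O <filename>`.
        fetch(BASE_URL + remote_name, target)
        saved.append(target)
    return saved
```

Calling `download_labels()` from the `PaddleCV/image_classification` directory would place the files where the README expects them. The `fetch` argument exists only so the function can be exercised without network access.
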