Unverified commit 2f1be920, authored by Kaipeng Deng, committed by GitHub

[cherry-pick] refine PaddleDetection/yolov3/rcnn/ssd dataset download (#3647)

* refine download sh to py

* update QUICK_STARTED

* refine yolov3/rcnn/ssd dataset download

* refine docs
Parent d75d2464
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
# Extract the data.
echo "Extracting..."
unzip train2014.zip
unzip val2014.zip
unzip train2017.zip
unzip val2017.zip
unzip annotations_trainval2014.zip
unzip annotations_trainval2017.zip
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os.path as osp
import logging
from ppdet.utils.download import download_dataset
logging.basicConfig(level=logging.INFO)
download_path = osp.split(osp.realpath(sys.argv[0]))[0]
download_dataset(download_path, 'coco')
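For reference, `download_dataset` can also be driven directly from Python rather than through this wrapper script; a minimal sketch (the destination directory `/data/coco` is a hypothetical example), assuming `ppdet` is importable:

```python
# Programmatic equivalent of running this script: fetch every (url, md5sum)
# pair registered for 'coco' in ppdet.utils.download.DATASETS and extract
# the archives under the given directory (hypothetical path).
import logging

from ppdet.utils.download import download_dataset

logging.basicConfig(level=logging.INFO)
download_dataset('/data/coco', 'coco')
```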
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar
# Extract the data.
echo "Extracting..."
tar xvf fruit-detection.tar
rm -rf fruit-detection.tar
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os.path as osp
import logging
from ppdet.utils.download import download_dataset
logging.basicConfig(level=logging.INFO)
download_path = osp.split(osp.realpath(sys.argv[0]))[0]
download_dataset(download_path, 'fruit')
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
# Extract the data.
echo "Extracting..."
tar -xf VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar
echo "Creating data lists..."
python -c 'from ppdet.utils.voc_utils import merge_and_create_list; merge_and_create_list("VOCdevkit", ["2007", "2012"], "VOCdevkit/VOC_all")'
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os.path as osp
import logging
from ppdet.utils.download import download_dataset
logging.basicConfig(level=logging.INFO)
download_path = osp.split(osp.realpath(sys.argv[0]))[0]
download_dataset(download_path, 'voc')
@@ -110,15 +110,15 @@ On the other hand, to download the datasets, run the following commands:
 - COCO
   ```
-  cd dataset/coco
-  ./download.sh
+  export PYTHONPATH=$PYTHONPATH:.
+  python dataset/coco/download_coco.py
   ```
 - Pascal VOC
   ```
-  cd dataset/voc
-  ./download.sh
+  export PYTHONPATH=$PYTHONPATH:.
+  python dataset/voc/download_voc.py
   ```

 **Download datasets automatically:**
......
@@ -109,15 +109,15 @@ ln -sf <path/to/voc> <path/to/paddle_detection>/dataset/voc
 - COCO
   ```
-  cd dataset/coco
-  ./download.sh
+  export PYTHONPATH=$PYTHONPATH:.
+  python dataset/coco/download_coco.py
   ```
 - Pascal VOC
   ```
-  cd dataset/voc
-  ./download.sh
+  export PYTHONPATH=$PYTHONPATH:.
+  python dataset/voc/download_voc.py
   ```

 **Download datasets automatically:**
......
@@ -6,11 +6,11 @@ This tutorial fine-tunes a tiny dataset by pretrained detection model for users
 ## Data Preparation

-Dataset refers to [Kaggle](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection), which contains 240 images in train dataset and 60 images in test dataset. Data categories are apple, orange and banana. Download [here](https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar) and uncompress the dataset after download, script for data preparation is located at [download.sh](../dataset/fruit/download.sh). Command is as follows:
+The dataset is taken from [Kaggle](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection) and contains 240 training images and 60 test images in three categories: apple, orange, and banana. Download it [here](https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar) and uncompress it after download; the data preparation script is located at [download_fruit.py](../dataset/fruit/download_fruit.py). The command is as follows:

 ```bash
-cd dataset/fruit
-sh download.sh
+export PYTHONPATH=$PYTHONPATH:.
+python dataset/fruit/download_fruit.py
 ```

 - **Note: before starting, run the following command and specify the GPU**
......
@@ -6,11 +6,11 @@
 ## Data Preparation

-The dataset is based on a [Kaggle dataset](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection) with 240 training images and 60 test images in three categories: apple, orange, and banana ([download link](https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar)). Simply extract the data after downloading; the data preparation script is located at [download.sh](../dataset/fruit/download.sh). Download the data as follows:
+The dataset is based on a [Kaggle dataset](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection) with 240 training images and 60 test images in three categories: apple, orange, and banana ([download link](https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar)). Simply extract the data after downloading; the data preparation script is located at [download_fruit.py](../dataset/fruit/download_fruit.py). Download the data as follows:

 ```bash
-cd dataset/fruit
-sh download.sh
+export PYTHONPATH=$PYTHONPATH:.
+python dataset/fruit/download_fruit.py
 ```

 - **Note: before starting, run the following command and specify the GPU**
......
@@ -35,7 +35,7 @@ __all__ = ['get_weights_path', 'get_dataset_path']
 WEIGHTS_HOME = osp.expanduser("~/.cache/paddle/weights")
 DATASET_HOME = osp.expanduser("~/.cache/paddle/dataset")

-# dict of {dataset_name: (downalod_info, sub_dirs)}
+# dict of {dataset_name: (download_info, sub_dirs)}
 # download info: (url, md5sum)
 DATASETS = {
     'coco': ([
@@ -60,6 +60,11 @@ DATASETS = {
         ('http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar',
          'b6e924de25625d8de591ea690078ad9f', ),
     ], ["VOCdevkit/VOC_all"]),
+    'fruit': ([
+        (
+            'https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar',
+            '374554a7633b1b68d6a5fbb7c061b8ba', ),
+    ], ["fruit-detection"]),
 }

 DOWNLOAD_RETRY_LIMIT = 3
@@ -103,25 +108,7 @@ def get_dataset_path(path, annotation, image_dir):
             # voc should merge dir and create list after download
             if name == 'voc':
-                logger.info("Download voc dataset successed, merge "
-                            "VOC2007 and VOC2012 to VOC_all...")
-                output_dir = osp.join(data_dir, dataset[1][0])
-                devkit_dir = "/".join(output_dir.split('/')[:-1])
-                years = ['2007', '2012']
-                # merge dir in output_tmp_dir at first, move to
-                # output_dir after merge sucessed.
-                output_tmp_dir = osp.join(data_dir, 'tmp')
-                if osp.isdir(output_tmp_dir):
-                    shutil.rmtree(output_tmp_dir)
-                # NOTE(dengkaipeng): since using auto download VOC
-                # dataset, VOC default label list should be used,
-                # do not generate label_list.txt here. For default
-                # label, see ../data/source/voc_loader.py
-                merge_and_create_list(devkit_dir, years, output_tmp_dir)
-                shutil.move(output_tmp_dir, output_dir)
-                # remove source directory VOC2007 and VOC2012
-                shutil.rmtree(osp.join(devkit_dir, "VOC2007"))
-                shutil.rmtree(osp.join(devkit_dir, "VOC2012"))
+                _merge_voc_dir(data_dir, dataset[1][0])
             return data_dir

     # not match any dataset in DATASETS
@@ -130,6 +117,28 @@ def get_dataset_path(path, annotation, image_dir):
         "'voc' and 'coco' currently".format(path, osp.split(path)[-1]))


+def _merge_voc_dir(data_dir, output_subdir):
+    logger.info("Download voc dataset succeeded, merge "
+                "VOC2007 and VOC2012 to VOC_all...")
+    output_dir = osp.join(data_dir, output_subdir)
+    devkit_dir = "/".join(output_dir.split('/')[:-1])
+    years = ['2007', '2012']
+    # merge dirs into output_tmp_dir first, move to
+    # output_dir after the merge succeeds.
+    output_tmp_dir = osp.join(data_dir, 'tmp')
+    if osp.isdir(output_tmp_dir):
+        shutil.rmtree(output_tmp_dir)
+    # NOTE: since the auto-downloaded VOC dataset is used, the default
+    # VOC label list should be used; do not generate label_list.txt
+    # here. For the default labels, see ../data/source/voc_loader.py
+    merge_and_create_list(devkit_dir, years, output_tmp_dir)
+    shutil.move(output_tmp_dir, output_dir)
+    # remove source directories VOC2007 and VOC2012
+    shutil.rmtree(osp.join(devkit_dir, "VOC2007"))
+    shutil.rmtree(osp.join(devkit_dir, "VOC2012"))
+
+
 def map_path(url, root_dir):
     # parse path after download to decompress under root_dir
     fname = url.split('/')[-1]
@@ -173,6 +182,19 @@ def get_path(url, root_dir, md5sum=None):
     return fullpath


+def download_dataset(path, dataset=None):
+    if dataset not in DATASETS.keys():
+        logger.error("Unknown dataset {}, it should be "
+                     "{}".format(dataset, DATASETS.keys()))
+        return
+    dataset_info = DATASETS[dataset][0]
+    for info in dataset_info:
+        get_path(info[0], path, info[1])
+    if dataset == 'voc':
+        _merge_voc_dir(path, DATASETS[dataset][1][0])
+    logger.info("Download dataset {} finished.".format(dataset))
+
+
 def _dataset_exists(path, annotation, image_dir):
     """
     Check if a user-defined dataset exists
......
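As the diff shows, `DATASETS` is the single registration point: each entry maps a dataset name to a list of `(url, md5sum)` pairs plus the sub-directories expected after extraction, and `download_dataset` simply walks those pairs through `get_path`. A hedged sketch of registering an extra dataset (the name, URL, and checksum below are placeholders, not real artifacts):

```python
# Hypothetical registration of a custom dataset; the URL and md5 are
# placeholders, not real artifacts.
from ppdet.utils import download as ppdet_download

ppdet_download.DATASETS['roadsign'] = ([
    ('https://example.com/roadsign.tar',
     '0123456789abcdef0123456789abcdef', ),
], ["roadsign"])

# download_dataset() now accepts the new name and fetches the listed archives.
ppdet_download.download_dataset('/data/roadsign', 'roadsign')
```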
@@ -38,8 +38,9 @@ Mask RCNN is a two stage model as well. At the first stage, it generates proposa
 Train the model on the [MS-COCO dataset](http://cocodataset.org/#download); download the dataset as below:

-    cd dataset/coco
-    ./download.sh
+```bash
+python dataset/coco/download.py
+```

 The data catalog structure is as follows:
@@ -67,6 +68,8 @@ The data catalog structure is as follows:
     sh ./pretrained/download.sh

+**NOTE:** Windows users can download weights from the links in `./pretrained/download.sh`.
+
 Set `pretrained_model` to load the pre-trained model. In addition, this parameter is also used to load the trained model when finetuning.
 Please make sure that `pretrained_model` is downloaded and loaded correctly; otherwise, the loss may be NAN during training.
......
@@ -38,8 +38,9 @@ Mask RCNN is likewise a two-stage framework: the first stage scans the image and generates candidate boxes;
 Train on the [MS-COCO dataset](http://cocodataset.org/#download); download the dataset as follows.

-    cd dataset/coco
-    ./download.sh
+```bash
+python dataset/coco/download.py
+```

 The data directory structure is as follows:
@@ -68,6 +69,8 @@ data/coco/
     sh ./pretrained/download.sh

+**NOTE:** Windows users can download and extract the weights directly via the links in `./pretrained/download.sh`.
+
 Load the pre-trained model by setting `pretrained_model`; the same setting is also used to load a previously trained model when fine-tuning.
 Please make sure the pre-trained model is downloaded and loaded correctly before training; otherwise, the loss may become NAN during training.
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import os.path as osp
import sys
import zipfile
import logging
from paddle.dataset.common import download
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
DATASETS = {
'coco': [
# coco2017
('http://images.cocodataset.org/zips/train2017.zip',
'cced6f7f71b7629ddf16f17bbcfab6b2', ),
('http://images.cocodataset.org/zips/val2017.zip',
'442b8da7639aecaf257c1dceb8ba8c80', ),
('http://images.cocodataset.org/annotations/annotations_trainval2017.zip',
'f4bbac642086de4f52a3fdda2de5fa2c', ),
# coco2014
('http://images.cocodataset.org/zips/train2014.zip',
'0da8c0bd3d6becc4dcb32757491aca88', ),
('http://images.cocodataset.org/zips/val2014.zip',
'a3d79f5ed8d289b7a7554ce06a5782b3', ),
('http://images.cocodataset.org/annotations/annotations_trainval2014.zip',
'0a379cfc70b0e71301e0f377548639bd', ),
],
}
def download_decompress_file(data_dir, url, md5):
logger.info("Downloading from {}".format(url))
zip_file = download(url, data_dir, md5)
logger.info("Decompressing {}".format(zip_file))
with zipfile.ZipFile(zip_file) as zf:
zf.extractall(path=data_dir)
os.remove(zip_file)
if __name__ == "__main__":
data_dir = osp.split(osp.realpath(sys.argv[0]))[0]
for name, infos in DATASETS.items():
for info in infos:
download_decompress_file(data_dir, info[0], info[1])
logger.info("Download dataset {} finished.".format(name))
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
# Extract the data.
echo "Extracting..."
unzip train2014.zip
unzip val2014.zip
unzip train2017.zip
unzip val2017.zip
unzip annotations_trainval2014.zip
unzip annotations_trainval2017.zip
@@ -26,10 +26,10 @@ Please download [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) at
 ```
 cd data/pascalvoc
-./download.sh
+python download.py
 ```

-The command `download.sh` also will create training and testing file lists.
+The script `download.py` will also create the training and testing file lists.

 ### Train
@@ -37,9 +37,11 @@ The command `download.sh` also will create training and testing file lists.
 We provide two pre-trained models. The first is a MobileNet-v1 SSD trained on the COCO dataset with the convolutional predictors for COCO removed; it can be used to initialize models when training on other datasets, such as PASCAL VOC. The other is a MobileNet-v1 trained on the ImageNet 2012 dataset with the last weights and bias of the fully-connected layer removed. Download MobileNet-v1 SSD:

-```
-./pretrained/download_coco.sh
+```bash
+sh ./pretrained/download_coco.sh
 ```

+**NOTE:** Windows users can download weights from the link in `./pretrained/download_coco.sh`.
+
 Declaration: the MobileNet-v1 SSD model is converted from the [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md).
......
@@ -27,10 +27,10 @@ SSD can be conveniently plugged into any standard convolutional network, such as VGG or Res
 ```
 cd data/pascalvoc
-./download.sh
+python download.py
 ```

-The `download.sh` command automatically creates the training and testing file lists.
+The `download.py` script automatically creates the training and testing file lists.

 ### Model Training
@@ -39,9 +39,11 @@ cd data/pascalvoc
 We provide two pre-trained models. The first is a MobileNet-v1 SSD pre-trained on the COCO dataset, with its prediction heads removed so it can be trained on datasets other than COCO. The second is a MobileNet-v1 pre-trained on the ImageNet 2012 dataset, with the final fully-connected layer removed for object detection training. Download MobileNet-v1 SSD:

-```
-./pretrained/download_coco.sh
+```bash
+sh ./pretrained/download_coco.sh
 ```

+**NOTE:** Windows users can download and extract the weights directly via the links in `./pretrained/download_coco.sh`.
+
 Declaration: the MobileNet-v1 SSD model is converted from the [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md). The MobileNet-v1 model is converted from [Caffe](https://github.com/shicai/MobileNet-Caffe).
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import os.path as osp
import sys
import zipfile
import logging
from paddle.dataset.common import download
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
DATASETS = {
'coco': [
# coco2017
('http://images.cocodataset.org/zips/train2017.zip',
'cced6f7f71b7629ddf16f17bbcfab6b2', ),
('http://images.cocodataset.org/zips/val2017.zip',
'442b8da7639aecaf257c1dceb8ba8c80', ),
('http://images.cocodataset.org/annotations/annotations_trainval2017.zip',
'f4bbac642086de4f52a3fdda2de5fa2c', ),
# coco2014
('http://images.cocodataset.org/zips/train2014.zip',
'0da8c0bd3d6becc4dcb32757491aca88', ),
('http://images.cocodataset.org/zips/val2014.zip',
'a3d79f5ed8d289b7a7554ce06a5782b3', ),
('http://images.cocodataset.org/annotations/annotations_trainval2014.zip',
'0a379cfc70b0e71301e0f377548639bd', ),
],
}
def download_decompress_file(data_dir, url, md5):
logger.info("Downloading from {}".format(url))
zip_file = download(url, data_dir, md5)
logger.info("Decompressing {}".format(zip_file))
with zipfile.ZipFile(zip_file) as zf:
zf.extractall(path=data_dir)
os.remove(zip_file)
if __name__ == "__main__":
data_dir = osp.split(osp.realpath(sys.argv[0]))[0]
for name, infos in DATASETS.items():
for info in infos:
download_decompress_file(data_dir, info[0], info[1])
logger.info("Download dataset {} finished.".format(name))
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
# Extract the data.
echo "Extracting..."
unzip train2014.zip
unzip val2014.zip
unzip train2017.zip
unzip val2017.zip
unzip annotations_trainval2014.zip
unzip annotations_trainval2017.zip
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -11,10 +11,31 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.

 import os
 import os.path as osp
+import sys
 import re
 import random
+import tarfile
+import logging
+
+from paddle.dataset.common import download
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+DATASETS = {
+    'pascalvoc': [
+        ('http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar',
+         '6cd6e144f989b92b3379bac3b3de84fd', ),
+        ('http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar',
+         'c52e279531787c972589f7e41ab4ae64', ),
+        ('http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar',
+         'b6e924de25625d8de591ea690078ad9f', ),
+    ],
+}

 devkit_dir = './VOCdevkit'
 years = ['2007', '2012']
@@ -73,5 +94,22 @@ def prepare_filelist(devkit_dir, years, output_dir):
             ftest.write(item[0] + ' ' + item[1] + '\n')

-if __name__ == '__main__':
-    prepare_filelist(devkit_dir, years, '.')
+
+def download_decompress_file(data_dir, url, md5):
+    logger.info("Downloading from {}".format(url))
+    tar_file = download(url, data_dir, md5)
+    logger.info("Decompressing {}".format(tar_file))
+    with tarfile.open(tar_file) as tf:
+        tf.extractall(path=data_dir)
+    os.remove(tar_file)
+
+
+if __name__ == "__main__":
+    data_dir = osp.split(osp.realpath(sys.argv[0]))[0]
+    for name, infos in DATASETS.items():
+        for info in infos:
+            download_decompress_file(data_dir, info[0], info[1])
+        if name == 'pascalvoc':
+            logger.info("create list for pascalvoc dataset.")
+            prepare_filelist(devkit_dir, years, data_dir)
+        logger.info("Download dataset {} finished.".format(name))
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
# Extract the data.
echo "Extracting..."
tar -xf VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar
echo "Creating data lists..."
python create_list.py
@@ -7,5 +7,6 @@ checkpoints/
 weights/
 !weights/*.sh
 dataset/coco/
+!dataset/coco/*.py
 log*
 output*
@@ -50,8 +50,9 @@
 Train on the [MS-COCO dataset](http://cocodataset.org/#download); download the dataset as follows.

-    cd dataset/coco
-    ./download.sh
+```bash
+python dataset/coco/download.py
+```

 The data directory structure is as follows:
@@ -84,6 +85,8 @@ dataset/coco/
     sh ./weights/download.sh

+**NOTE:** Windows users can download and extract the weights directly via the links in `./weights/download.sh`.
+
 Set `--pretrain` to load the pre-trained model; the same setting is also used to load a previously trained model when fine-tuning.
 Please make sure the pre-trained model is downloaded and loaded correctly before training; otherwise, the loss may become NAN during training.
......
@@ -50,8 +50,9 @@ To train the model, COCO-API is needed. Installation is as follows:
 Train the model on the [MS-COCO dataset](http://cocodataset.org/#download); we also provide a download script as follows:

-    cd dataset/coco
-    ./download.sh
+```bash
+python dataset/coco/download.py
+```

 The data catalog structure is as follows:
@@ -84,6 +85,8 @@ You can define datasets by yourself; we recommend using annotations in COCO for
     sh ./weights/download.sh

+**NOTE:** Windows users can download weights from the links in `./weights/download.sh`.
+
 Set `--pretrain` to load the pre-trained model. In addition, this parameter is also used to load the trained model when finetuning.
 Please make sure that the pre-trained model is downloaded and loaded correctly; otherwise, the loss may be NAN during training.
......
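For Windows users, the shell script's work amounts to a few lines of Python; a hedged sketch (the URL is a placeholder; take the real links from `./weights/download.sh`):

```python
# Hypothetical weight download for Windows; replace the URL with a real link
# from ./weights/download.sh before running.
import tarfile
import urllib.request

url = 'https://example.com/yolov3_weights.tar.gz'  # placeholder URL
fname, _ = urllib.request.urlretrieve(url, 'weights.tar.gz')
with tarfile.open(fname) as tf:
    tf.extractall(path='./weights')
```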
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import os.path as osp
import sys
import zipfile
import logging
from paddle.dataset.common import download
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
DATASETS = {
'coco': [
# coco2017
('http://images.cocodataset.org/zips/train2017.zip',
'cced6f7f71b7629ddf16f17bbcfab6b2', ),
('http://images.cocodataset.org/zips/val2017.zip',
'442b8da7639aecaf257c1dceb8ba8c80', ),
('http://images.cocodataset.org/annotations/annotations_trainval2017.zip',
'f4bbac642086de4f52a3fdda2de5fa2c', ),
# coco2014
('http://images.cocodataset.org/zips/train2014.zip',
'0da8c0bd3d6becc4dcb32757491aca88', ),
('http://images.cocodataset.org/zips/val2014.zip',
'a3d79f5ed8d289b7a7554ce06a5782b3', ),
('http://images.cocodataset.org/annotations/annotations_trainval2014.zip',
'0a379cfc70b0e71301e0f377548639bd', ),
],
}
def download_decompress_file(data_dir, url, md5):
logger.info("Downloading from {}".format(url))
zip_file = download(url, data_dir, md5)
logger.info("Decompressing {}".format(zip_file))
with zipfile.ZipFile(zip_file) as zf:
zf.extractall(path=data_dir)
os.remove(zip_file)
if __name__ == "__main__":
data_dir = osp.split(osp.realpath(sys.argv[0]))[0]
for name, infos in DATASETS.items():
for info in infos:
download_decompress_file(data_dir, info[0], info[1])
logger.info("Download dataset {} finished.".format(name))
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
# Extract the data.
echo "Extracting..."
unzip train2014.zip
unzip val2014.zip
unzip train2017.zip
unzip val2017.zip
unzip annotations_trainval2014.zip
unzip annotations_trainval2017.zip