s920243400 / PaddleDetection (forked from PaddlePaddle / PaddleDetection)
Commit 27545a84
Authored Oct 28, 2019 by Kaipeng Deng; committed via GitHub, Oct 28, 2019
Parent: c4f3dfaf

Change voc loader (#3781)

* change voc_loader
Showing 18 changed files with 325 additions and 178 deletions (+325 −178)
configs/ssd/ssd_mobilenet_v1_voc.yml   +2 −4
configs/ssd/ssd_vgg16_300_voc.yml      +2 −4
configs/ssd/ssd_vgg16_512_voc.yml      +2 −4
configs/yolov3_darknet_voc.yml         +2 −4
configs/yolov3_mobilenet_v1_fruit.yml  +2 −5
configs/yolov3_mobilenet_v1_voc.yml    +2 −4
configs/yolov3_r34_voc.yml             +2 −4
dataset/voc/create_list.py             +25 −0
dataset/voc/label_list.txt             +20 −0
docs/DATA.md                           +33 −24
docs/DATA_cn.md                        +35 −28
docs/INSTALL.md                        +63 −0
docs/INSTALL_cn.md                     +62 −0
ppdet/data/data_feed.py                +8 −9
ppdet/data/source/roidb_source.py      +1 −1
ppdet/data/source/voc_loader.py        +13 −25
ppdet/utils/download.py                +34 −32
ppdet/utils/voc_utils.py               +17 −30
configs/ssd/ssd_mobilenet_v1_voc.yml

```diff
@@ -61,8 +61,7 @@ SSDTrainFeed:
   use_process: true
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/train.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: trainval.txt
     use_default_label: true
 SSDEvalFeed:
@@ -70,8 +69,7 @@ SSDEvalFeed:
   use_process: true
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/val.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: test.txt
     use_default_label: true
   drop_last: false
```
configs/ssd/ssd_vgg16_300_voc.yml

```diff
@@ -64,8 +64,7 @@ SSDTrainFeed:
   batch_size: 8
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/train.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: trainval.txt
     use_default_label: true
   image_shape: [3, 300, 300]
   sample_transforms:
@@ -109,8 +108,7 @@ SSDEvalFeed:
   batch_size: 32
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/val.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: test.txt
     use_default_label: true
   drop_last: false
   image_shape: [3, 300, 300]
```
configs/ssd/ssd_vgg16_512_voc.yml

```diff
@@ -68,8 +68,7 @@ SSDTrainFeed:
   batch_size: 8
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/train.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: trainval.txt
     use_default_label: true
   image_shape: [3, 512, 512]
   sample_transforms:
@@ -113,8 +112,7 @@ SSDEvalFeed:
   batch_size: 32
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/val.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: test.txt
     use_default_label: true
   drop_last: false
   image_shape: [3, 512, 512]
```
configs/yolov3_darknet_voc.yml

```diff
@@ -62,8 +62,7 @@ YoloTrainFeed:
   batch_size: 8
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/train.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: trainval.txt
     use_default_label: true
   num_workers: 8
   bufsize: 128
@@ -75,8 +74,7 @@ YoloEvalFeed:
   image_shape: [3, 608, 608]
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/val.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: test.txt
     use_default_label: true
 YoloTestFeed:
```
configs/yolov3_mobilenet_v1_fruit.yml

```diff
@@ -64,8 +64,7 @@ YoloTrainFeed:
   batch_size: 1
   dataset:
     dataset_dir: dataset/fruit/fruit-detection
-    annotation: ./ImageSets/Main/train.txt
-    image_dir: ./JPEGImages
+    annotation: train.txt
     use_default_label: false
   num_workers: 16
   bufsize: 128
@@ -111,8 +110,7 @@ YoloEvalFeed:
   image_shape: [3, 608, 608]
   dataset:
     dataset_dir: dataset/fruit/fruit-detection
-    annotation: ./ImageSets/Main/val.txt
-    image_dir: ./JPEGImages
+    annotation: val.txt
     use_default_label: false
@@ -121,5 +119,4 @@ YoloTestFeed:
   image_shape: [3, 608, 608]
   dataset:
     dataset_dir: dataset/fruit/fruit-detection
-    annotation: ./ImageSets/Main/label_list.txt
     use_default_label: false
```
configs/yolov3_mobilenet_v1_voc.yml

```diff
@@ -63,8 +63,7 @@ YoloTrainFeed:
   batch_size: 8
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/train.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: trainval.txt
     use_default_label: true
   num_workers: 8
   bufsize: 128
@@ -76,8 +75,7 @@ YoloEvalFeed:
   image_shape: [3, 608, 608]
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/val.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: test.txt
     use_default_label: true
 YoloTestFeed:
```
configs/yolov3_r34_voc.yml

```diff
@@ -65,8 +65,7 @@ YoloTrainFeed:
   batch_size: 8
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/train.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: trainval.txt
     use_default_label: true
   num_workers: 8
   bufsize: 128
@@ -78,8 +77,7 @@ YoloEvalFeed:
   image_shape: [3, 608, 608]
   dataset:
     dataset_dir: dataset/voc
-    annotation: VOCdevkit/VOC_all/ImageSets/Main/val.txt
-    image_dir: VOCdevkit/VOC_all/JPEGImages
+    annotation: test.txt
     use_default_label: true
 YoloTestFeed:
```
dataset/voc/create_list.py (new file, mode 100644)

```python
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import sys
import os.path as osp
import logging

from ppdet.utils.download import create_voc_list

logging.basicConfig(level=logging.INFO)

# generate the file lists in the directory containing this script
voc_path = osp.split(osp.realpath(sys.argv[0]))[0]
create_voc_list(voc_path)
```
dataset/voc/label_list.txt (new file, mode 100644)

```
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
```
docs/DATA.md

In the COCO data source section, the tree-drawing characters of the directory listing are fixed. In the Pascal VOC data source section, the old merged layout

```
data/pascalvoc/
├── Annotations
│   ├── i000050.jpg
│   ├── 003876.xml
│   |   ...
├── ImageSets
│   ├── Main
│   │   ├── train.txt
│   │   ├── val.txt
│   │   ├── test.txt
│   │   ├── dog_train.txt
│   │   ├── dog_trainval.txt
│   │   ├── dog_val.txt
│   │   ├── dog_test.txt
│   │   ├── ...
│   ├── Layout
│   │   ├── ...
│   ├── Segmentation
│   │   ├── ...
├── JPEGImages
│   ├── 000050.jpg
│   ├── 003876.jpg
```

is replaced with the layout produced by `dataset/voc/create_list.py`:

```
dataset/voc/
├── train.txt
├── val.txt
├── test.txt
├── label_list.txt (optional)
├── VOCdevkit/VOC2007
│   ├── Annotations
│   │   ├── 001789.xml
│   │   |   ...
│   ├── JPEGImages
│   │   ├── 001789.jpg
│   │   |   ...
│   ├── ImageSets
│   │   |   ...
├── VOCdevkit/VOC2012
│   ├── Annotations
│   │   ├── 003876.xml
│   │   |   ...
│   ├── JPEGImages
│   │   ├── 003876.jpg
│   │   |   ...
│   ├── ImageSets
│   │   |   ...
|   ...
```

and the following note is added:

**NOTE:** If you set `use_default_label=False` in yaml configs, the `label_list.txt` of the Pascal VOC dataset will be read; otherwise `label_list.txt` is unnecessary and the default Pascal VOC label list defined in [voc_loader.py](../ppdet/data/source/voc_loader.py) will be used.

The Roidb data source section (a generalized data source serialized as pickle files) is unchanged.
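With the new layout, each line of a file list pairs an image path with its annotation path. As a rough standalone sketch (the helper name `parse_file_list` is mine, not part of the commit), the lines can be parsed the way the updated `voc_loader.py` does — two whitespace-separated paths, both relative to the directory containing the list:

```python
import os.path

def parse_file_list(list_path):
    """Read a VOC file list whose lines hold '<image path> <xml path>',
    both relative to the directory containing the list file."""
    data_dir = os.path.dirname(list_path)
    pairs = []
    with open(list_path) as fr:
        for line in fr:
            parts = line.strip().split()
            if len(parts) < 2:
                continue  # skip empty or malformed lines
            # resolve both paths against the list's directory
            img_file, xml_file = [os.path.join(data_dir, p) for p in parts[:2]]
            pairs.append((img_file, xml_file))
    return pairs
```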
docs/DATA_cn.md

The Chinese counterpart of docs/DATA.md receives the same updates (translated here):

- The data-parsing intro is re-wrapped: "Data parsing produces a `data.Dataset`; the logic lives in `data.source`, which parses datasets of different formats. Supported data sources include:"
- COCO data source: the claim that the dataset "is currently split into COCO2012 and COCO2017" is corrected to "COCO2014 and COCO2017"; the tree-drawing characters in the directory listing are fixed.
- Pascal VOC data source: the old `data/pascalvoc/` directory tree is replaced with the same `dataset/voc/` layout shown in docs/DATA.md (`train.txt`, `val.txt`, `test.txt`, optional `label_list.txt`, plus `VOCdevkit/VOC2007` and `VOCdevkit/VOC2012`).
- Updated note: "If you set `use_default_label=False` in the yaml config, the class list is read from `label_list.txt`; otherwise `label_list.txt` may be absent and the default Pascal VOC class list defined in [voc_loader.py](../ppdet/data/source/voc_loader.py) is used."
- Roidb data source (context, unchanged): pickle files converted mainly from COCO and Pascal VOC datasets; each holds a dict containing a single list named 'records' (and possibly a dict named 'cname2cid').
docs/INSTALL.md

All changes are additions (+63). After the symlink instructions (`ln -sf <path/to/voc> <path/to/paddle_detection>/dataset/voc`):

For Pascal VOC dataset, you should create the file lists by:

```
export PYTHONPATH=$PYTHONPATH:.
python dataset/voc/create_list.py
```

In the "Download datasets manually" section, after `python dataset/coco/download_coco.py`, the `COCO` dataset directory structure is documented:

```
dataset/coco/
├── annotations
│   ├── instances_train2014.json
│   ├── instances_train2017.json
│   ├── instances_val2014.json
│   ├── instances_val2017.json
│   |   ...
├── train2017
│   ├── 000000000009.jpg
│   ├── 000000580008.jpg
│   |   ...
├── val2017
│   ├── 000000000139.jpg
│   ├── 000000000285.jpg
│   |   ...
|   ...
```

- Pascal VOC

```
export PYTHONPATH=$PYTHONPATH:.
python dataset/voc/download_voc.py
python dataset/voc/create_list.py
```

`Pascal VOC` dataset with directory structure like this:

```
dataset/voc/
├── train.txt
├── val.txt
├── test.txt
├── label_list.txt (optional)
├── VOCdevkit/VOC2007
│   ├── Annotations
│   │   ├── 001789.xml
│   │   |   ...
│   ├── JPEGImages
│   │   ├── 001789.jpg
│   │   |   ...
│   ├── ImageSets
│   │   |   ...
├── VOCdevkit/VOC2012
│   ├── Annotations
│   │   ├── 003876.xml
│   │   |   ...
│   ├── JPEGImages
│   │   ├── 003876.jpg
│   │   |   ...
│   ├── ImageSets
│   │   |   ...
|   ...
```

**NOTE:** If you set `use_default_label=False` in yaml configs, the `label_list.txt` of the Pascal VOC dataset will be read; otherwise `label_list.txt` is unnecessary and the default Pascal VOC label list defined in [voc_loader.py](../ppdet/data/source/voc_loader.py) will be used.

The "Download datasets automatically" section (triggered when a training session starts but the dataset is not set up properly) is unchanged.
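After `create_list.py` runs, the documented layout can be sanity-checked. A minimal sketch, assuming the list names the updated configs reference (`trainval.txt`, `test.txt`); the helper name and the choice of checked entries are mine:

```python
import os.path as osp

def missing_voc_entries(voc_root):
    """Return the expected entries missing under dataset/voc."""
    expected = [
        'trainval.txt',        # written by create_list
        'test.txt',
        'VOCdevkit/VOC2007',   # extracted VOC archives
        'VOCdevkit/VOC2012',
    ]
    return [p for p in expected if not osp.exists(osp.join(voc_root, p))]
```

An empty return value means the layout matches what the configs expect.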
docs/INSTALL_cn.md

The Chinese counterpart of docs/INSTALL.md receives the same additions (+62, translated here):

- After the symlink instructions: "For the Pascal VOC dataset, create the file lists with:"

  ```
  export PYTHONPATH=$PYTHONPATH:.
  python dataset/voc/create_list.py
  ```

- Under "Download datasets manually": the `COCO` directory-structure listing, and for Pascal VOC the commands

  ```
  export PYTHONPATH=$PYTHONPATH:.
  python dataset/voc/download_voc.py
  python dataset/voc/create_list.py
  ```

  followed by the `dataset/voc/` directory-structure listing and the same `use_default_label` note as in docs/INSTALL.md.
- Context (unchanged): "If a run starts before the dataset is set up successfully (e.g. it cannot be found in `dataset/coco` or `dataset/voc`), ..." introducing automatic download.
ppdet/data/data_feed.py

```diff
@@ -219,7 +219,7 @@ class DataSet(object):
     def __init__(self,
                  annotation,
-                 image_dir,
+                 image_dir=None,
                  dataset_dir=None,
                  use_default_label=None):
         super(DataSet, self).__init__()
@@ -229,7 +229,7 @@ class DataSet(object):
         self.use_default_label = use_default_label

-COCO_DATASET_DIR = 'coco'
+COCO_DATASET_DIR = 'dataset/coco'
 COCO_TRAIN_ANNOTATION = 'annotations/instances_train2017.json'
 COCO_TRAIN_IMAGE_DIR = 'train2017'
 COCO_VAL_ANNOTATION = 'annotations/instances_val2017.json'
@@ -246,12 +246,11 @@ class CocoDataSet(DataSet):
             dataset_dir=dataset_dir,
             annotation=annotation,
             image_dir=image_dir)

-VOC_DATASET_DIR = 'pascalvoc'
-VOC_TRAIN_ANNOTATION = 'VOCdevkit/VOC_all/ImageSets/Main/train.txt'
-VOC_VAL_ANNOTATION = 'VOCdevkit/VOC_all/ImageSets/Main/val.txt'
-VOC_TEST_ANNOTATION = 'VOCdevkit/VOC_all/ImageSets/Main/test.txt'
-VOC_IMAGE_DIR = 'VOCdevkit/VOC_all/JPEGImages'
-VOC_USE_DEFAULT_LABEL = None
+VOC_DATASET_DIR = 'dataset/voc'
+VOC_TRAIN_ANNOTATION = 'train.txt'
+VOC_VAL_ANNOTATION = 'val.txt'
+VOC_IMAGE_DIR = None
+VOC_USE_DEFAULT_LABEL = True

 @serializable
@@ -843,7 +842,7 @@ class SSDTestFeed(DataFeed):
     __doc__ = DataFeed.__doc__

     def __init__(self,
-                 dataset=SimpleDataSet(VOC_TEST_ANNOTATION).__dict__,
+                 dataset=SimpleDataSet(VOC_VAL_ANNOTATION).__dict__,
                  fields=['image', 'im_id', 'im_shape'],
                  image_shape=[3, 300, 300],
                  sample_transforms=[
```
ppdet/data/source/roidb_source.py

```diff
@@ -62,7 +62,7 @@ class RoiDbSource(Dataset):
         assert os.path.isfile(anno_file) or os.path.isdir(anno_file), \
             'anno_file {} is not a file or a directory'.format(anno_file)
         self._fname = anno_file
-        self._image_dir = image_dir
+        self._image_dir = image_dir if image_dir is not None else ''
         if image_dir is not None:
             assert os.path.isdir(image_dir), \
                 'image_dir {} is not a directory'.format(image_dir)
```
ppdet/data/source/voc_loader.py

```diff
@@ -26,8 +26,7 @@ def get_roidb(anno_path,
     Load VOC records with annotations in xml directory 'anno_path'

     Notes:
-    ${anno_path}/ImageSets/Main/train.txt must contains xml file names for annotations
-    ${anno_path}/Annotations/xxx.xml must contain annotation info for one record
+    ${anno_path} must contains xml file and image file path for annotations

     Args:
         anno_path (str): root directory for voc annotation data
@@ -53,11 +52,7 @@ def get_roidb(anno_path,
         'cname2id' is a dict to map category name to class id
     """
-    txt_file = anno_path
-    part = txt_file.split('ImageSets')
-    xml_path = os.path.join(part[0], 'Annotations')
-    assert os.path.isfile(txt_file) and \
-            os.path.isdir(xml_path), 'invalid xml path'
+    data_dir = os.path.dirname(anno_path)

     records = []
     ct = 0
@@ -67,17 +62,16 @@ def get_roidb(anno_path,
     # mapping category name to class id
     # background:0, first_class:1, second_class:2, ...
-    with open(txt_file, 'r') as fr:
+    with open(anno_path, 'r') as fr:
         while True:
             line = fr.readline()
             if not line:
                 break
-            fname = line.strip() + '.xml'
-            xml_file = os.path.join(xml_path, fname)
+            img_file, xml_file = [os.path.join(data_dir, x) \
+                    for x in line.strip().split()[:2]]
             if not os.path.isfile(xml_file):
                 continue
             tree = ET.parse(xml_file)
             im_fname = tree.find('filename').text
             if tree.find('id') is None:
                 im_id = np.array([ct])
             else:
@@ -114,7 +108,7 @@ def get_roidb(anno_path,
                 is_crowd[i][0] = 0
                 difficult[i][0] = _difficult
             voc_rec = {
-                'im_file': im_fname,
+                'im_file': img_file,
                 'im_id': im_id,
                 'h': im_h,
                 'w': im_w,
@@ -144,8 +138,7 @@ def load(anno_path,
     xml directory 'anno_path'

     Notes:
-    ${anno_path}/ImageSets/Main/train.txt must contains xml file names for annotations
-    ${anno_path}/Annotations/xxx.xml must contain annotation info for one record
+    ${anno_path} must contains xml file and image file path for annotations

     Args:
     @anno_path (str): root directory for voc annotation data
@@ -171,11 +164,7 @@ def load(anno_path,
         'cname2id' is a dict to map category name to class id
     """
-    txt_file = anno_path
-    part = txt_file.split('ImageSets')
-    xml_path = os.path.join(part[0], 'Annotations')
-    assert os.path.isfile(txt_file) and \
-            os.path.isdir(xml_path), 'invalid xml path'
+    data_dir = os.path.dirname(anno_path)

     # mapping category name to class id
     # if with_background is True:
@@ -186,7 +175,7 @@ def load(anno_path,
     ct = 0
     cname2cid = {}
     if not use_default_label:
-        label_path = os.path.join(part[0], 'ImageSets/Main/label_list.txt')
+        label_path = os.path.join(data_dir, 'label_list.txt')
         with open(label_path, 'r') as fr:
             label_id = int(with_background)
             for line in fr.readlines():
@@ -195,17 +184,16 @@ def load(anno_path,
     else:
         cname2cid = pascalvoc_label(with_background)

-    with open(txt_file, 'r') as fr:
+    with open(anno_path, 'r') as fr:
         while True:
             line = fr.readline()
             if not line:
                 break
-            fname = line.strip() + '.xml'
-            xml_file = os.path.join(xml_path, fname)
+            img_file, xml_file = [os.path.join(data_dir, x) \
+                    for x in line.strip().split()[:2]]
             if not os.path.isfile(xml_file):
                 continue
             tree = ET.parse(xml_file)
             im_fname = tree.find('filename').text
             if tree.find('id') is None:
                 im_id = np.array([ct])
             else:
@@ -235,7 +223,7 @@ def load(anno_path,
                 is_crowd[i][0] = 0
                 difficult[i][0] = _difficult
             voc_rec = {
-                'im_file': im_fname,
+                'im_file': img_file,
                 'im_id': im_id,
                 'h': im_h,
                 'w': im_w,
```
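The `load()` change reads `label_list.txt` from the dataset directory itself rather than from `ImageSets/Main/`. A standalone sketch of that category-mapping logic (the helper name is mine; the `int(with_background)` starting id mirrors the diff, so id 0 stays reserved for the background class):

```python
def load_label_list(label_path, with_background=True):
    """Map class names (one per line) to ids; when with_background is
    True, ids start at 1 so that 0 denotes the background class."""
    cname2cid = {}
    label_id = int(with_background)
    with open(label_path) as fr:
        for line in fr.readlines():
            name = line.strip()
            if name:
                cname2cid[name] = label_id
                label_id += 1
    return cname2cid
```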
ppdet/utils/download.py

```diff
@@ -25,7 +25,7 @@ import hashlib
 import tarfile
 import zipfile

-from .voc_utils import merge_and_create_list
+from .voc_utils import create_list

 import logging
 logger = logging.getLogger(__name__)
@@ -59,7 +59,7 @@ DATASETS = {
         ('http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar',
          'b6e924de25625d8de591ea690078ad9f', ),
-    ], ["VOCdevkit/VOC_all"]),
+    ], ["VOCdevkit/VOC2012", "VOCdevkit/VOC2007"]),
     'wider_face': ([
         ('https://dataset.bj.bcebos.com/wider_face/WIDER_train.zip',
@@ -85,7 +85,8 @@ def get_weights_path(url):
     """Get weights path from WEIGHT_HOME, if not exists,
     download it from url.
     """
-    return get_path(url, WEIGHTS_HOME)
+    path, _ = get_path(url, WEIGHTS_HOME)
+    return path


 def get_dataset_path(path, annotation, image_dir):
@@ -107,19 +108,26 @@ def get_dataset_path(path, annotation, image_dir):
                 "{}".format(path, name))
             data_dir = osp.join(DATASET_HOME, name)

-            # For voc, only check merged dir VOC_all
+            # For voc, only check dir VOCdevkit/VOC2012, VOCdevkit/VOC2007
             if name == 'voc':
-                check_dir = osp.join(data_dir, dataset[1][0])
-                if osp.exists(check_dir):
-                    logger.info("Found {}".format(check_dir))
+                exists = True
+                for sub_dir in dataset[1]:
+                    check_dir = osp.join(data_dir, sub_dir)
+                    if osp.exists(check_dir):
+                        logger.info("Found {}".format(check_dir))
+                    else:
+                        exists = False
+                if exists:
                     return data_dir

+            # voc exist is checked above, voc is not exist here
+            check_exist = name != 'voc'
             for url, md5sum in dataset[0]:
-                get_path(url, data_dir, md5sum)
+                get_path(url, data_dir, md5sum, check_exist)

-            # voc should merge dir and create list after download
+            # voc should create list after download
             if name == 'voc':
-                _merge_voc_dir(data_dir, dataset[1][0])
+                create_voc_list(data_dir)
             return data_dir

     # not match any dataset in DATASETS
@@ -129,26 +137,17 @@ def get_dataset_path(path, annotation, image_dir):
         osp.split(path)[-1]))


-def _merge_voc_dir(data_dir, output_subdir):
-    logger.info("Download voc dataset successed, merge "
-                "VOC2007 and VOC2012 to VOC_all...")
-    output_dir = osp.join(data_dir, output_subdir)
-    devkit_dir = "/".join(output_dir.split('/')[:-1])
+def create_voc_list(data_dir, devkit_subdir='VOCdevkit'):
+    logger.info("Create voc file list...")
+    devkit_dir = osp.join(data_dir, devkit_subdir)
     years = ['2007', '2012']
-    # merge dir in output_tmp_dir at first, move to
-    # output_dir after merge sucessed.
-    output_tmp_dir = osp.join(data_dir, 'tmp')
-    if osp.isdir(output_tmp_dir):
-        shutil.rmtree(output_tmp_dir)
-    merge_and_create_list(devkit_dir, years, output_tmp_dir)
-    shutil.move(output_tmp_dir, output_dir)
-    # remove source directory VOC2007 and VOC2012
-    shutil.rmtree(osp.join(devkit_dir, "VOC2007"))
-    shutil.rmtree(osp.join(devkit_dir, "VOC2012"))
+    # NOTE: since using auto download VOC
+    # dataset, VOC default label list should be used,
+    # do not generate label_list.txt here. For default
+    # label, see ../data/source/voc_loader.py
+    create_list(devkit_dir, years, data_dir)
+    logger.info("Create voc file list finished")


 def map_path(url, root_dir):
@@ -161,7 +160,7 @@ def map_path(url, root_dir):
     return osp.join(root_dir, fpath)


-def get_path(url, root_dir, md5sum=None):
+def get_path(url, root_dir, md5sum=None, check_exist=True):
     """ Download from given url to root_dir.
     if file or directory specified by url is exists under
     root_dir, return the path directly, otherwise download
@@ -178,20 +177,25 @@ def get_path(url, root_dir, md5sum=None):
     # For same zip file, decompressed directory name different
     # from zip file name, rename by following map
     decompress_name_map = {
-        "VOC": "VOCdevkit/VOC_all",
+        "VOCtrainval_11-May-2012": "VOCdevkit/VOC2012",
+        "VOCtrainval_06-Nov-2007": "VOCdevkit/VOC2007",
+        "VOCtest_06-Nov-2007": "VOCdevkit/VOC2007",
         "annotations_trainval": "annotations"
     }
     for k, v in decompress_name_map.items():
         if fullpath.find(k) >= 0:
             fullpath = '/'.join(fullpath.split('/')[:-1] + [v])

-    if osp.exists(fullpath):
+    exist_flag = False
+    if osp.exists(fullpath) and check_exist:
+        exist_flag = True
         logger.info("Found {}".format(fullpath))
     else:
+        exist_flag = False
         fullname = _download(url, root_dir, md5sum)
         _decompress(fullname)

-    return fullpath
+    return fullpath, exist_flag


 def download_dataset(path, dataset=None):
@@ -201,9 +205,7 @@ def download_dataset(path, dataset=None):
         return
     dataset_info = DATASETS[dataset][0]
     for info in dataset_info:
-        get_path(info[0], path, info[1])
-    if dataset == 'voc':
-        _merge_voc_dir(path, DATASETS[dataset][1][0])
+        get_path(info[0], path, info[1], False)
     logger.info("Download dataset {} finished.".format(dataset))
```
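`get_dataset_path` now requires every listed sub-directory to be present before it skips the download, instead of checking a single merged `VOC_all` directory. Roughly, as a standalone sketch (logging omitted; the helper name is mine):

```python
import os.path as osp

def all_subdirs_exist(data_dir, sub_dirs):
    """True only if every expected sub-directory exists, mirroring the
    all-or-nothing check the commit adds for VOCdevkit/VOC2007 and
    VOCdevkit/VOC2012."""
    exists = True
    for sub_dir in sub_dirs:
        if not osp.exists(osp.join(data_dir, sub_dir)):
            exists = False  # one missing piece forces a re-download
    return exists
```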
ppdet/utils/voc_utils.py

```diff
@@ -22,20 +22,15 @@ import re
 import random
 import shutil

-__all__ = ['merge_and_create_list']
+__all__ = ['create_list']


-def merge_and_create_list(devkit_dir, years, output_dir):
+def create_list(devkit_dir, years, output_dir):
     """
-    Merge VOC2007 and VOC2012 to output_dir and create following list:
-        1. train.txt
-        2. val.txt
-        3. test.txt
+    create following list:
+        1. trainval.txt
+        2. test.txt
     """
-    os.makedirs(osp.join(output_dir, 'Annotations/'))
-    os.makedirs(osp.join(output_dir, 'ImageSets/Main/'))
-    os.makedirs(osp.join(output_dir, 'JPEGImages/'))
     trainval_list = []
     test_list = []
     for year in years:
@@ -43,20 +38,16 @@ def create_list(devkit_dir, years, output_dir):
         trainval_list.extend(trainval)
         test_list.extend(test)

-    main_dir = osp.join(output_dir, 'ImageSets/Main/')
     random.shuffle(trainval_list)
-    with open(osp.join(main_dir, 'train.txt'), 'w') as ftrainval:
+    with open(osp.join(output_dir, 'trainval.txt'), 'w') as ftrainval:
         for item in trainval_list:
-            ftrainval.write(item + '\n')
+            ftrainval.write(item[0] + ' ' + item[1] + '\n')

-    with open(osp.join(main_dir, 'val.txt'), 'w') as fval:
-        with open(osp.join(main_dir, 'test.txt'), 'w') as ftest:
-            ct = 0
-            for item in test_list:
-                ct += 1
-                fval.write(item + '\n')
-                if ct <= 1000:
-                    ftest.write(item + '\n')
+    with open(osp.join(output_dir, 'test.txt'), 'w') as fval:
+        ct = 0
+        for item in test_list:
+            ct += 1
+            fval.write(item[0] + ' ' + item[1] + '\n')


 def _get_voc_dir(devkit_dir, year, type):
@@ -86,14 +77,10 @@ def _walk_voc_dir(devkit_dir, year, output_dir):
         if name_prefix in added:
             continue
         added.add(name_prefix)
-        ann_path = osp.join(annotation_dir, name_prefix + '.xml')
-        img_path = osp.join(img_dir, name_prefix + '.jpg')
-        new_ann_path = osp.join(output_dir, 'Annotations/', name_prefix + '.xml')
-        new_img_path = osp.join(output_dir, 'JPEGImages/', name_prefix + '.jpg')
-        shutil.copy(ann_path, new_ann_path)
-        shutil.copy(img_path, new_img_path)
-        img_ann_list.append(name_prefix)
+        ann_path = osp.join(
+            osp.relpath(annotation_dir, output_dir), name_prefix + '.xml')
+        img_path = osp.join(
+            osp.relpath(img_dir, output_dir), name_prefix + '.jpg')
+        img_ann_list.append((img_path, ann_path))

     return trainval_list, test_list
```
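The rewritten list files store relative image/annotation pairs rather than bare name prefixes. A sketch of the writer side (hypothetical helper; the `item[0] + ' ' + item[1]` line format mirrors the diff above):

```python
import os.path as osp

def write_pair_list(output_dir, name, pairs):
    """Write (image, annotation) pairs as space-separated lines, the
    format the new create_list produces and voc_loader consumes."""
    out_path = osp.join(output_dir, name)
    with open(out_path, 'w') as f:
        for img, ann in pairs:
            f.write(img + ' ' + ann + '\n')
    return out_path
```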