Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
magicwindyyd
mindspore
提交
340d98a4
M
mindspore
项目概览
magicwindyyd
/
mindspore
与 Fork 源项目一致
Fork自
MindSpore / mindspore
通知
1
Star
1
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
0
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
M
mindspore
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
0
Issue
0
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
提交
340d98a4
编写于
7月 06, 2020
作者:
T
tinazhang
浏览文件
操作
浏览文件
下载
电子邮件补丁
差异文件
added test case to cifar_op
update cifar10 dataset fixing missing error handling code in validator
上级
089623ad
变更
12
隐藏空白更改
内联
并排
Showing
12 changed file
with
393 addition
and
196 deletion
+393
-196
mindspore/dataset/core/validator_helpers.py
mindspore/dataset/core/validator_helpers.py
+2
-0
tests/ut/data/dataset/testCifar100Data/datasetSchema.json
tests/ut/data/dataset/testCifar100Data/datasetSchema.json
+0
-21
tests/ut/data/dataset/testCifar100Data/datasetSchemaTestRepeat.json
...ata/dataset/testCifar100Data/datasetSchemaTestRepeat.json
+0
-21
tests/ut/data/dataset/testCifar10Data/data_batch_1.bin
tests/ut/data/dataset/testCifar10Data/data_batch_1.bin
+0
-0
tests/ut/data/dataset/testCifar10Data/datasetDistributionAll.json
.../data/dataset/testCifar10Data/datasetDistributionAll.json
+0
-9
tests/ut/data/dataset/testCifar10Data/datasetDistributionRandom.json
...ta/dataset/testCifar10Data/datasetDistributionRandom.json
+0
-9
tests/ut/data/dataset/testCifar10Data/datasetDistributionUnique.json
...ta/dataset/testCifar10Data/datasetDistributionUnique.json
+0
-9
tests/ut/data/dataset/testCifar10Data/datasetSchema.json
tests/ut/data/dataset/testCifar10Data/datasetSchema.json
+0
-16
tests/ut/data/dataset/testCifar10Data/datasetSchemaTestRepeat.json
...data/dataset/testCifar10Data/datasetSchemaTestRepeat.json
+0
-16
tests/ut/python/dataset/test_cifarop.py
tests/ut/python/dataset/test_cifarop.py
+0
-91
tests/ut/python/dataset/test_config.py
tests/ut/python/dataset/test_config.py
+4
-4
tests/ut/python/dataset/test_datasets_cifarop.py
tests/ut/python/dataset/test_datasets_cifarop.py
+387
-0
未找到文件。
mindspore/dataset/core/validator_helpers.py
浏览文件 @
340d98a4
...
@@ -271,6 +271,8 @@ def check_sampler_shuffle_shard_options(param_dict):
...
@@ -271,6 +271,8 @@ def check_sampler_shuffle_shard_options(param_dict):
if
sampler
is
not
None
:
if
sampler
is
not
None
:
if
shuffle
is
not
None
:
if
shuffle
is
not
None
:
raise
RuntimeError
(
"sampler and shuffle cannot be specified at the same time."
)
raise
RuntimeError
(
"sampler and shuffle cannot be specified at the same time."
)
if
num_shards
is
not
None
:
raise
RuntimeError
(
"sampler and sharding cannot be specified at the same time."
)
if
num_shards
is
not
None
:
if
num_shards
is
not
None
:
check_pos_int32
(
num_shards
)
check_pos_int32
(
num_shards
)
...
...
tests/ut/data/dataset/testCifar100Data/datasetSchema.json
已删除
100644 → 0
浏览文件 @
089623ad
{
"datasetType"
:
"CIFAR100"
,
"numRows"
:
100
,
"columns"
:
{
"image"
:
{
"type"
:
"uint8"
,
"rank"
:
1
,
"t_impl"
:
"cvmat"
},
"coarse_label"
:
{
"type"
:
"uint32"
,
"rank"
:
1
,
"t_impl"
:
"flex"
},
"fine_label"
:
{
"type"
:
"uint32"
,
"rank"
:
1
,
"t_impl"
:
"flex"
}
}
}
tests/ut/data/dataset/testCifar100Data/datasetSchemaTestRepeat.json
已删除
100644 → 0
浏览文件 @
089623ad
{
"datasetType"
:
"CIFAR100"
,
"numRows"
:
33
,
"columns"
:
{
"image"
:
{
"type"
:
"uint8"
,
"rank"
:
1
,
"t_impl"
:
"cvmat"
},
"coarse_label"
:
{
"type"
:
"uint32"
,
"rank"
:
1
,
"t_impl"
:
"flex"
},
"fine_label"
:
{
"type"
:
"uint32"
,
"rank"
:
1
,
"t_impl"
:
"flex"
}
}
}
tests/ut/data/dataset/testCifar10Data/data_batch_1.bin
浏览文件 @
340d98a4
无法预览此类型文件
tests/ut/data/dataset/testCifar10Data/datasetDistributionAll.json
已删除
100644 → 0
浏览文件 @
089623ad
{
"deviceNum"
:
3
,
"deviceId"
:
1
,
"shardConfig"
:
"ALL"
,
"shuffle"
:
"ON"
,
"seed"
:
0
,
"epoch"
:
2
}
tests/ut/data/dataset/testCifar10Data/datasetDistributionRandom.json
已删除
100644 → 0
浏览文件 @
089623ad
{
"deviceNum"
:
3
,
"deviceId"
:
1
,
"shardConfig"
:
"RANDOM"
,
"shuffle"
:
"ON"
,
"seed"
:
0
,
"epoch"
:
1
}
tests/ut/data/dataset/testCifar10Data/datasetDistributionUnique.json
已删除
100644 → 0
浏览文件 @
089623ad
{
"deviceNum"
:
3
,
"deviceId"
:
1
,
"shardConfig"
:
"UNIQUE"
,
"shuffle"
:
"ON"
,
"seed"
:
0
,
"epoch"
:
3
}
tests/ut/data/dataset/testCifar10Data/datasetSchema.json
已删除
100644 → 0
浏览文件 @
089623ad
{
"datasetType"
:
"CIFAR10"
,
"numRows"
:
60000
,
"columns"
:
{
"image"
:
{
"type"
:
"uint8"
,
"rank"
:
1
,
"t_impl"
:
"cvmat"
},
"label"
:
{
"type"
:
"uint32"
,
"rank"
:
1
,
"t_impl"
:
"flex"
}
}
}
tests/ut/data/dataset/testCifar10Data/datasetSchemaTestRepeat.json
已删除
100644 → 0
浏览文件 @
089623ad
{
"datasetType"
:
"CIFAR10"
,
"numRows"
:
33
,
"columns"
:
{
"image"
:
{
"type"
:
"uint8"
,
"rank"
:
1
,
"t_impl"
:
"cvmat"
},
"label"
:
{
"type"
:
"uint32"
,
"rank"
:
1
,
"t_impl"
:
"flex"
}
}
}
tests/ut/python/dataset/test_cifarop.py
已删除
100644 → 0
浏览文件 @
089623ad
# Copyright 2019 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import
os
import
numpy
as
np
import
mindspore.dataset
as
ds
from
mindspore
import
log
as
logger
# Data for CIFAR and MNIST are not part of build tree
# They need to be downloaded directly
# prep_data.py can be executed or code below
# import sys
# sys.path.insert(0,"../../data")
# import prep_data
# prep_data.download_all_for_test("../../data")
DATA_DIR_10
=
"../data/dataset/testCifar10Data"
DATA_DIR_100
=
"../data/dataset/testCifar100Data"
def
load_cifar
(
path
):
raw
=
np
.
empty
(
0
,
dtype
=
np
.
uint8
)
for
file_name
in
os
.
listdir
(
path
):
if
file_name
.
endswith
(
".bin"
):
with
open
(
os
.
path
.
join
(
path
,
file_name
),
mode
=
'rb'
)
as
file
:
raw
=
np
.
append
(
raw
,
np
.
fromfile
(
file
,
dtype
=
np
.
uint8
),
axis
=
0
)
raw
=
raw
.
reshape
(
-
1
,
3073
)
labels
=
raw
[:,
0
]
images
=
raw
[:,
1
:]
images
=
images
.
reshape
(
-
1
,
3
,
32
,
32
)
images
=
images
.
transpose
(
0
,
2
,
3
,
1
)
return
images
,
labels
def
test_case_dataset_cifar10
():
"""
dataset parameter
"""
logger
.
info
(
"Test dataset parameter"
)
# apply dataset operations
data1
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
100
)
num_iter
=
0
for
_
in
data1
.
create_dict_iterator
():
# in this example, each dictionary has keys "image" and "label"
num_iter
+=
1
assert
num_iter
==
100
def
test_case_dataset_cifar100
():
"""
dataset parameter
"""
logger
.
info
(
"Test dataset parameter"
)
# apply dataset operations
data1
=
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
100
)
num_iter
=
0
for
_
in
data1
.
create_dict_iterator
():
# in this example, each dictionary has keys "image" and "label"
num_iter
+=
1
assert
num_iter
==
100
def
test_reading_cifar10
():
"""
Validate CIFAR10 image readings
"""
data1
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
100
,
shuffle
=
False
)
images
,
labels
=
load_cifar
(
DATA_DIR_10
)
for
i
,
d
in
enumerate
(
data1
.
create_dict_iterator
()):
np
.
testing
.
assert_array_equal
(
d
[
"image"
],
images
[
i
])
np
.
testing
.
assert_array_equal
(
d
[
"label"
],
labels
[
i
])
if
__name__
==
'__main__'
:
test_case_dataset_cifar10
()
test_case_dataset_cifar100
()
test_reading_cifar10
()
tests/ut/python/dataset/test_config.py
浏览文件 @
340d98a4
...
@@ -245,17 +245,17 @@ def test_deterministic_run_distribution():
...
@@ -245,17 +245,17 @@ def test_deterministic_run_distribution():
# First dataset
# First dataset
data1
=
ds
.
TFRecordDataset
(
DATA_DIR
,
SCHEMA_DIR
,
columns_list
=
[
"image"
],
shuffle
=
False
)
data1
=
ds
.
TFRecordDataset
(
DATA_DIR
,
SCHEMA_DIR
,
columns_list
=
[
"image"
],
shuffle
=
False
)
random_
cro
p_op
=
c_vision
.
RandomHorizontalFlip
(
0.1
)
random_
horizontal_fli
p_op
=
c_vision
.
RandomHorizontalFlip
(
0.1
)
decode_op
=
c_vision
.
Decode
()
decode_op
=
c_vision
.
Decode
()
data1
=
data1
.
map
(
input_columns
=
[
"image"
],
operations
=
decode_op
)
data1
=
data1
.
map
(
input_columns
=
[
"image"
],
operations
=
decode_op
)
data1
=
data1
.
map
(
input_columns
=
[
"image"
],
operations
=
random_
cro
p_op
)
data1
=
data1
.
map
(
input_columns
=
[
"image"
],
operations
=
random_
horizontal_fli
p_op
)
# Second dataset
# Second dataset
data2
=
ds
.
TFRecordDataset
(
DATA_DIR
,
SCHEMA_DIR
,
columns_list
=
[
"image"
],
shuffle
=
False
)
data2
=
ds
.
TFRecordDataset
(
DATA_DIR
,
SCHEMA_DIR
,
columns_list
=
[
"image"
],
shuffle
=
False
)
data2
=
data2
.
map
(
input_columns
=
[
"image"
],
operations
=
decode_op
)
data2
=
data2
.
map
(
input_columns
=
[
"image"
],
operations
=
decode_op
)
# If seed is set up on constructor, so the two ops output deterministic sequence
# If seed is set up on constructor, so the two ops output deterministic sequence
random_
cro
p_op2
=
c_vision
.
RandomHorizontalFlip
(
0.1
)
random_
horizontal_fli
p_op2
=
c_vision
.
RandomHorizontalFlip
(
0.1
)
data2
=
data2
.
map
(
input_columns
=
[
"image"
],
operations
=
random_
cro
p_op2
)
data2
=
data2
.
map
(
input_columns
=
[
"image"
],
operations
=
random_
horizontal_fli
p_op2
)
for
item1
,
item2
in
zip
(
data1
.
create_dict_iterator
(),
data2
.
create_dict_iterator
()):
for
item1
,
item2
in
zip
(
data1
.
create_dict_iterator
(),
data2
.
create_dict_iterator
()):
np
.
testing
.
assert_equal
(
item1
[
"image"
],
item2
[
"image"
])
np
.
testing
.
assert_equal
(
item1
[
"image"
],
item2
[
"image"
])
...
...
tests/ut/python/dataset/test_datasets_cifarop.py
0 → 100644
浏览文件 @
340d98a4
# Copyright 2019 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""
Test Cifar10 and Cifar100 dataset operators
"""
import
os
import
pytest
import
numpy
as
np
import
matplotlib.pyplot
as
plt
import
mindspore.dataset
as
ds
from
mindspore
import
log
as
logger
DATA_DIR_10
=
"../data/dataset/testCifar10Data"
DATA_DIR_100
=
"../data/dataset/testCifar100Data"
def
load_cifar
(
path
,
kind
=
"cifar10"
):
"""
load Cifar10/100 data
"""
raw
=
np
.
empty
(
0
,
dtype
=
np
.
uint8
)
for
file_name
in
os
.
listdir
(
path
):
if
file_name
.
endswith
(
".bin"
):
with
open
(
os
.
path
.
join
(
path
,
file_name
),
mode
=
'rb'
)
as
file
:
raw
=
np
.
append
(
raw
,
np
.
fromfile
(
file
,
dtype
=
np
.
uint8
),
axis
=
0
)
if
kind
==
"cifar10"
:
raw
=
raw
.
reshape
(
-
1
,
3073
)
labels
=
raw
[:,
0
]
images
=
raw
[:,
1
:]
elif
kind
==
"cifar100"
:
raw
=
raw
.
reshape
(
-
1
,
3074
)
labels
=
raw
[:,
:
2
]
images
=
raw
[:,
2
:]
else
:
raise
ValueError
(
"Invalid parameter value"
)
images
=
images
.
reshape
(
-
1
,
3
,
32
,
32
)
images
=
images
.
transpose
(
0
,
2
,
3
,
1
)
return
images
,
labels
def
visualize_dataset
(
images
,
labels
):
"""
Helper function to visualize the dataset samples
"""
num_samples
=
len
(
images
)
for
i
in
range
(
num_samples
):
plt
.
subplot
(
1
,
num_samples
,
i
+
1
)
plt
.
imshow
(
images
[
i
])
plt
.
title
(
labels
[
i
])
plt
.
show
()
### Testcases for Cifar10Dataset Op ###
def
test_cifar10_content_check
():
"""
Validate Cifar10Dataset image readings
"""
logger
.
info
(
"Test Cifar10Dataset Op with content check"
)
data1
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
num_samples
=
100
,
shuffle
=
False
)
images
,
labels
=
load_cifar
(
DATA_DIR_10
)
num_iter
=
0
# in this example, each dictionary has keys "image" and "label"
for
i
,
d
in
enumerate
(
data1
.
create_dict_iterator
()):
np
.
testing
.
assert_array_equal
(
d
[
"image"
],
images
[
i
])
np
.
testing
.
assert_array_equal
(
d
[
"label"
],
labels
[
i
])
num_iter
+=
1
assert
num_iter
==
100
def
test_cifar10_basic
():
"""
Validate CIFAR10
"""
logger
.
info
(
"Test Cifar10Dataset Op"
)
# case 1: test num_samples
data1
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
num_samples
=
100
)
num_iter1
=
0
for
_
in
data1
.
create_dict_iterator
():
num_iter1
+=
1
assert
num_iter1
==
100
# case 2: test num_parallel_workers
data2
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
num_samples
=
50
,
num_parallel_workers
=
1
)
num_iter2
=
0
for
_
in
data2
.
create_dict_iterator
():
num_iter2
+=
1
assert
num_iter2
==
50
# case 3: test repeat
data3
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
num_samples
=
100
)
data3
=
data3
.
repeat
(
3
)
num_iter3
=
0
for
_
in
data3
.
create_dict_iterator
():
num_iter3
+=
1
assert
num_iter3
==
300
# case 4: test batch with drop_remainder=False
data4
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
num_samples
=
100
)
assert
data4
.
get_dataset_size
()
==
100
assert
data4
.
get_batch_size
()
==
1
data4
=
data4
.
batch
(
batch_size
=
7
)
# drop_remainder is default to be False
assert
data4
.
get_dataset_size
()
==
15
assert
data4
.
get_batch_size
()
==
7
num_iter4
=
0
for
_
in
data4
.
create_dict_iterator
():
num_iter4
+=
1
assert
num_iter4
==
15
# case 5: test batch with drop_remainder=True
data5
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
num_samples
=
100
)
assert
data5
.
get_dataset_size
()
==
100
assert
data5
.
get_batch_size
()
==
1
data5
=
data5
.
batch
(
batch_size
=
7
,
drop_remainder
=
True
)
# the rest of incomplete batch will be dropped
assert
data5
.
get_dataset_size
()
==
14
assert
data5
.
get_batch_size
()
==
7
num_iter5
=
0
for
_
in
data5
.
create_dict_iterator
():
num_iter5
+=
1
assert
num_iter5
==
14
def
test_cifar10_pk_sampler
():
"""
Test Cifar10Dataset with PKSampler
"""
logger
.
info
(
"Test Cifar10Dataset Op with PKSampler"
)
golden
=
[
0
,
0
,
0
,
1
,
1
,
1
,
2
,
2
,
2
,
3
,
3
,
3
,
4
,
4
,
4
,
5
,
5
,
5
,
6
,
6
,
6
,
7
,
7
,
7
,
8
,
8
,
8
,
9
,
9
,
9
]
sampler
=
ds
.
PKSampler
(
3
)
data
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
sampler
=
sampler
)
num_iter
=
0
label_list
=
[]
for
item
in
data
.
create_dict_iterator
():
label_list
.
append
(
item
[
"label"
])
num_iter
+=
1
np
.
testing
.
assert_array_equal
(
golden
,
label_list
)
assert
num_iter
==
30
def
test_cifar10_sequential_sampler
():
"""
Test Cifar10Dataset with SequentialSampler
"""
logger
.
info
(
"Test Cifar10Dataset Op with SequentialSampler"
)
num_samples
=
30
sampler
=
ds
.
SequentialSampler
(
num_samples
=
num_samples
)
data1
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
sampler
=
sampler
)
data2
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
shuffle
=
False
,
num_samples
=
num_samples
)
num_iter
=
0
for
item1
,
item2
in
zip
(
data1
.
create_dict_iterator
(),
data2
.
create_dict_iterator
()):
np
.
testing
.
assert_equal
(
item1
[
"label"
],
item2
[
"label"
])
num_iter
+=
1
assert
num_iter
==
num_samples
def
test_cifar10_exception
():
"""
Test error cases for Cifar10Dataset
"""
logger
.
info
(
"Test error cases for Cifar10Dataset"
)
error_msg_1
=
"sampler and shuffle cannot be specified at the same time"
with
pytest
.
raises
(
RuntimeError
,
match
=
error_msg_1
):
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
shuffle
=
False
,
sampler
=
ds
.
PKSampler
(
3
))
error_msg_2
=
"sampler and sharding cannot be specified at the same time"
with
pytest
.
raises
(
RuntimeError
,
match
=
error_msg_2
):
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
sampler
=
ds
.
PKSampler
(
3
),
num_shards
=
2
,
shard_id
=
0
)
error_msg_3
=
"num_shards is specified and currently requires shard_id as well"
with
pytest
.
raises
(
RuntimeError
,
match
=
error_msg_3
):
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
num_shards
=
10
)
error_msg_4
=
"shard_id is specified but num_shards is not"
with
pytest
.
raises
(
RuntimeError
,
match
=
error_msg_4
):
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
shard_id
=
0
)
error_msg_5
=
"Input shard_id is not within the required interval"
with
pytest
.
raises
(
ValueError
,
match
=
error_msg_5
):
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
num_shards
=
2
,
shard_id
=-
1
)
with
pytest
.
raises
(
ValueError
,
match
=
error_msg_5
):
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
num_shards
=
2
,
shard_id
=
5
)
error_msg_6
=
"num_parallel_workers exceeds"
with
pytest
.
raises
(
ValueError
,
match
=
error_msg_6
):
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
shuffle
=
False
,
num_parallel_workers
=
0
)
with
pytest
.
raises
(
ValueError
,
match
=
error_msg_6
):
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
shuffle
=
False
,
num_parallel_workers
=
88
)
def
test_cifar10_visualize
(
plot
=
False
):
"""
Visualize Cifar10Dataset results
"""
logger
.
info
(
"Test Cifar10Dataset visualization"
)
data1
=
ds
.
Cifar10Dataset
(
DATA_DIR_10
,
num_samples
=
10
,
shuffle
=
False
)
num_iter
=
0
image_list
,
label_list
=
[],
[]
for
item
in
data1
.
create_dict_iterator
():
image
=
item
[
"image"
]
label
=
item
[
"label"
]
image_list
.
append
(
image
)
label_list
.
append
(
"label {}"
.
format
(
label
))
assert
isinstance
(
image
,
np
.
ndarray
)
assert
image
.
shape
==
(
32
,
32
,
3
)
assert
image
.
dtype
==
np
.
uint8
assert
label
.
dtype
==
np
.
uint32
num_iter
+=
1
assert
num_iter
==
10
if
plot
:
visualize_dataset
(
image_list
,
label_list
)
### Testcases for Cifar100Dataset Op ###
def
test_cifar100_content_check
():
"""
Validate Cifar100Dataset image readings
"""
logger
.
info
(
"Test Cifar100Dataset with content check"
)
data1
=
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
num_samples
=
100
,
shuffle
=
False
)
images
,
labels
=
load_cifar
(
DATA_DIR_100
,
kind
=
"cifar100"
)
num_iter
=
0
# in this example, each dictionary has keys "image", "coarse_label" and "fine_image"
for
i
,
d
in
enumerate
(
data1
.
create_dict_iterator
()):
np
.
testing
.
assert_array_equal
(
d
[
"image"
],
images
[
i
])
np
.
testing
.
assert_array_equal
(
d
[
"coarse_label"
],
labels
[
i
][
0
])
np
.
testing
.
assert_array_equal
(
d
[
"fine_label"
],
labels
[
i
][
1
])
num_iter
+=
1
assert
num_iter
==
100
def
test_cifar100_basic
():
"""
Test Cifar100Dataset
"""
logger
.
info
(
"Test Cifar100Dataset"
)
# case 1: test num_samples
data1
=
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
num_samples
=
100
)
num_iter1
=
0
for
_
in
data1
.
create_dict_iterator
():
num_iter1
+=
1
assert
num_iter1
==
100
# case 2: test repeat
data1
=
data1
.
repeat
(
2
)
num_iter2
=
0
for
_
in
data1
.
create_dict_iterator
():
num_iter2
+=
1
assert
num_iter2
==
200
# case 3: test num_parallel_workers
data2
=
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
num_samples
=
100
,
num_parallel_workers
=
1
)
num_iter3
=
0
for
_
in
data2
.
create_dict_iterator
():
num_iter3
+=
1
assert
num_iter3
==
100
# case 4: test batch with drop_remainder=False
data3
=
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
num_samples
=
100
)
assert
data3
.
get_dataset_size
()
==
100
assert
data3
.
get_batch_size
()
==
1
data3
=
data3
.
batch
(
batch_size
=
3
)
assert
data3
.
get_dataset_size
()
==
34
assert
data3
.
get_batch_size
()
==
3
num_iter4
=
0
for
_
in
data3
.
create_dict_iterator
():
num_iter4
+=
1
assert
num_iter4
==
34
# case 4: test batch with drop_remainder=True
data4
=
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
num_samples
=
100
)
data4
=
data4
.
batch
(
batch_size
=
3
,
drop_remainder
=
True
)
assert
data4
.
get_dataset_size
()
==
33
assert
data4
.
get_batch_size
()
==
3
num_iter5
=
0
for
_
in
data4
.
create_dict_iterator
():
num_iter5
+=
1
assert
num_iter5
==
33
def
test_cifar100_pk_sampler
():
"""
Test Cifar100Dataset with PKSampler
"""
logger
.
info
(
"Test Cifar100Dataset with PKSampler"
)
golden
=
[
i
for
i
in
range
(
20
)]
sampler
=
ds
.
PKSampler
(
1
)
data
=
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
sampler
=
sampler
)
num_iter
=
0
label_list
=
[]
for
item
in
data
.
create_dict_iterator
():
label_list
.
append
(
item
[
"coarse_label"
])
num_iter
+=
1
np
.
testing
.
assert_array_equal
(
golden
,
label_list
)
assert
num_iter
==
20
def
test_cifar100_exception
():
"""
Test error cases for Cifar100Dataset
"""
logger
.
info
(
"Test error cases for Cifar100Dataset"
)
error_msg_1
=
"sampler and shuffle cannot be specified at the same time"
with
pytest
.
raises
(
RuntimeError
,
match
=
error_msg_1
):
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
shuffle
=
False
,
sampler
=
ds
.
PKSampler
(
3
))
error_msg_2
=
"sampler and sharding cannot be specified at the same time"
with
pytest
.
raises
(
RuntimeError
,
match
=
error_msg_2
):
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
sampler
=
ds
.
PKSampler
(
3
),
num_shards
=
2
,
shard_id
=
0
)
error_msg_3
=
"num_shards is specified and currently requires shard_id as well"
with
pytest
.
raises
(
RuntimeError
,
match
=
error_msg_3
):
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
num_shards
=
10
)
error_msg_4
=
"shard_id is specified but num_shards is not"
with
pytest
.
raises
(
RuntimeError
,
match
=
error_msg_4
):
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
shard_id
=
0
)
error_msg_5
=
"Input shard_id is not within the required interval"
with
pytest
.
raises
(
ValueError
,
match
=
error_msg_5
):
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
num_shards
=
2
,
shard_id
=-
1
)
with
pytest
.
raises
(
ValueError
,
match
=
error_msg_5
):
ds
.
Cifar10Dataset
(
DATA_DIR_100
,
num_shards
=
2
,
shard_id
=
5
)
error_msg_6
=
"num_parallel_workers exceeds"
with
pytest
.
raises
(
ValueError
,
match
=
error_msg_6
):
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
shuffle
=
False
,
num_parallel_workers
=
0
)
with
pytest
.
raises
(
ValueError
,
match
=
error_msg_6
):
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
shuffle
=
False
,
num_parallel_workers
=
88
)
def
test_cifar100_visualize
(
plot
=
False
):
"""
Visualize Cifar100Dataset results
"""
logger
.
info
(
"Test Cifar100Dataset visualization"
)
data1
=
ds
.
Cifar100Dataset
(
DATA_DIR_100
,
num_samples
=
10
,
shuffle
=
False
)
num_iter
=
0
image_list
,
label_list
=
[],
[]
for
item
in
data1
.
create_dict_iterator
():
image
=
item
[
"image"
]
coarse_label
=
item
[
"coarse_label"
]
fine_label
=
item
[
"fine_label"
]
image_list
.
append
(
image
)
label_list
.
append
(
"coarse_label {}
\n
fine_label {}"
.
format
(
coarse_label
,
fine_label
))
assert
isinstance
(
image
,
np
.
ndarray
)
assert
image
.
shape
==
(
32
,
32
,
3
)
assert
image
.
dtype
==
np
.
uint8
assert
coarse_label
.
dtype
==
np
.
uint32
assert
fine_label
.
dtype
==
np
.
uint32
num_iter
+=
1
assert
num_iter
==
10
if
plot
:
visualize_dataset
(
image_list
,
label_list
)
if
__name__
==
'__main__'
:
test_cifar10_content_check
()
test_cifar10_basic
()
test_cifar10_pk_sampler
()
test_cifar10_sequential_sampler
()
test_cifar10_exception
()
test_cifar10_visualize
(
plot
=
False
)
test_cifar100_content_check
()
test_cifar100_basic
()
test_cifar100_pk_sampler
()
test_cifar100_exception
()
test_cifar100_visualize
(
plot
=
False
)
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录