magicwindyyd / mindspore (forked from MindSpore / mindspore)
Commit 898b2fde
Authored by mindspore-ci-bot on June 17, 2020; committed by Gitee on June 17, 2020.

!2187 Fix comment display issues in BuildVocabDataset
Merge pull request !2187 from ZiruiWu/vocab_rework
Parents: 3ccbafe7, 27948836
Showing 3 changed files with 11 additions and 9 deletions (+11 -9)
mindspore/ccsrc/dataset/text/kernels/ngram_op.cc  +2 -2
mindspore/dataset/engine/datasets.py  +6 -6
tests/ut/python/dataset/test_ngram_op.py  +3 -1
mindspore/ccsrc/dataset/text/kernels/ngram_op.cc
...
@@ -56,10 +56,10 @@ Status NgramOp::Compute(const std::shared_ptr<Tensor> &input, std::shared_ptr<Te
     CHECK_FAIL_RETURN_UNEXPECTED(n > 0, "n gram needs to be a positive number.\n");
     int32_t start_ind = l_len_ - std::min(l_len_, n - 1);
     int32_t end_ind = offsets.size() - r_len_ + std::min(r_len_, n - 1);
-    if (end_ind - start_ind < n) {
+    if (end_ind - start_ind <= n) {
       res.emplace_back(std::string());  // push back empty string
     } else {
-      if (end_ind - n < 0) RETURN_STATUS_UNEXPECTED("loop condition error!");
+      CHECK_FAIL_RETURN_UNEXPECTED(end_ind - n >= 0, "Incorrect loop condition");
       for (int i = start_ind; i < end_ind - n; i++) {
         res.emplace_back(str_buffer.substr(offsets[i], offsets[i + n] - offsets[i] - separator_.size()));
...
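The boundary arithmetic in the `ngram_op.cc` hunk can be mirrored in a short Python sketch to see what the `<` to `<=` change does. This is a sketch under stated assumptions, not MindSpore code: the function name `ngram_compute` and its parameters are hypothetical, and the way `offsets` is built (each padding token or input token appended to one buffer followed by the separator, with `offsets[k]` marking where piece `k` starts) is assumed, since that part of `NgramOp::Compute` is outside the shown hunk.

```python
def ngram_compute(tokens, n, l_pad=("", 0), r_pad=("", 0), sep=" ", fixed=True):
    """Hypothetical Python mirror of the n-gram loop for a single n.

    fixed=True uses the patched check (end_ind - start_ind <= n);
    fixed=False uses the old check (end_ind - start_ind < n).
    """
    if n <= 0:
        raise ValueError("n gram needs to be a positive number.")
    l_tok, l_len = l_pad
    r_tok, r_len = r_pad
    # Assumed layout: every piece is appended to one buffer followed by sep,
    # and offsets[k] is the start position of piece k (offsets[0] == 0).
    buf = ""
    offsets = [0]
    for piece in [l_tok] * l_len + list(tokens) + [r_tok] * r_len:
        buf += piece + sep
        offsets.append(len(buf))
    # Use at most n - 1 padding tokens on each side, as in the C++ hunk.
    start_ind = l_len - min(l_len, n - 1)
    end_ind = len(offsets) - r_len + min(r_len, n - 1)
    span = end_ind - start_ind
    too_short = span <= n if fixed else span < n
    if too_short:
        return [""]  # push back empty string
    res = []
    for i in range(start_ind, end_ind - n):
        # Take n consecutive pieces and drop the trailing separator.
        res.append(buf[offsets[i]:offsets[i + n] - len(sep)])
    return res
```

In this sketch `end_ind - start_ind` is one more than the number of usable pieces, so the loop produces `end_ind - start_ind - n` n-grams. When that difference is exactly `n`, the old check falls through to a loop that runs zero times: `ngram_compute(["Lone", "Star"], 3, fixed=False)` returns `[]`, while the patched check returns `[""]`. That matches the updated test expectation `[[""]]` for the two-token "Friendly Manitoba" with n = 3.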
mindspore/dataset/engine/datasets.py
...
@@ -4893,15 +4893,15 @@ class BuildVocabDataset(DatasetOp):
         text.Vocab.from_dataset()

     Args:
-        vocab(Vocab): vocab object
-        columns(str or list, optional): column names to get words from. It can be a list of column names.
-            (Default is None where all columns will be used. If any column isn't string type, will return error)
+        vocab(Vocab): vocab object.
+        columns(str or list, optional): column names to get words from. It can be a list of column names (Default is
+            None, all columns are used, return error if any column isn't string).
         freq_range(tuple, optional): A tuple of integers (min_frequency, max_frequency). Words within the frequency
             range would be kept. 0 <= min_frequency <= max_frequency <= total_words. min_frequency/max_frequency
-            can be None, which corresponds to 0/total_words separately (default is None, all words are included)
+            can be None, which corresponds to 0/total_words separately (default is None, all words are included).
         top_k(int, optional): top_k > 0. Number of words to be built into vocab. top_k most frequent words are
-            taken. top_k is taken after freq_range. If not enough top_k, all words will be taken. (default is None
-            all words are included)
+            taken. The top_k is taken after freq_range. If not enough top_k, all words will be taken (default is None
+            all words are included).

     Returns:
         BuildVocabDataset
...
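The `freq_range` and `top_k` rules in the `BuildVocabDataset` docstring can be read as a two-step filter: keep words whose frequency lies in the inclusive range, then keep the `top_k` most frequent of those. The sketch below illustrates that reading only; `select_vocab_words` is a hypothetical helper, not a MindSpore API. It assumes `total_words` means the total number of word occurrences and breaks frequency ties alphabetically, neither of which the docstring specifies.

```python
from collections import Counter


def select_vocab_words(words, freq_range=None, top_k=None):
    """Hypothetical sketch of the freq_range/top_k rules from the docstring."""
    counts = Counter(words)
    total_words = sum(counts.values())  # assumption: total word occurrences
    min_freq, max_freq = freq_range if freq_range is not None else (None, None)
    # None maps to 0 / total_words, as the docstring describes.
    min_freq = 0 if min_freq is None else min_freq
    max_freq = total_words if max_freq is None else max_freq
    if not 0 <= min_freq <= max_freq <= total_words:
        raise ValueError("expected 0 <= min_frequency <= max_frequency <= total_words")
    # Step 1: keep words whose frequency lies inside freq_range (inclusive).
    kept = [(w, c) for w, c in counts.items() if min_freq <= c <= max_freq]
    # Step 2: most frequent first; ties broken alphabetically (an assumption).
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    if top_k is not None:
        if top_k <= 0:
            raise ValueError("top_k must be > 0")
        # If fewer than top_k words remain, slicing simply keeps all of them.
        kept = kept[:top_k]
    return [w for w, _ in kept]
```

For the words `a a a b b c`, `freq_range=(2, None)` keeps `["a", "b"]`, and adding `top_k=1` on top of that would keep only `["a"]`, since `top_k` is applied after `freq_range`.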
tests/ut/python/dataset/test_ngram_op.py
...
@@ -51,7 +51,7 @@ def test_simple_ngram():
     """ test simple gram with only one n value"""
     plates_mottos = ["Friendly Manitoba", "Yours to Discover", "Land of Living Skies", "Birthplace of the Confederation"]
-    n_gram_mottos = [[]]
+    n_gram_mottos = [[""]]
     n_gram_mottos.append(["Yours to Discover"])
     n_gram_mottos.append(['Land of Living', 'of Living Skies'])
     n_gram_mottos.append(['Birthplace of the', 'of the Confederation'])
...
@@ -81,6 +81,8 @@ def test_corner_cases():
         for data in dataset.create_dict_iterator():
             assert [d.decode("utf8") for d in data["text"]] == output_line, output_line

+    # test tensor length smaller than n
+    test_config("Lone Star", ["Lone Star", "", "", ""], [2, 3, 4, 5])
     # test empty separator
     test_config("Beautiful British Columbia", ['BeautifulBritish', 'BritishColumbia'], 2, sep="")
     # test separator with longer length
...
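The updated test expectations can be sanity-checked without building MindSpore by a naive sliding-window reference. `naive_ngrams` is a hypothetical helper written for this check, not part of the test file. It covers only the unpadded cases visible in these hunks and encodes the post-fix rule that, for each n, a sentence with fewer than n tokens yields a single empty string.

```python
def naive_ngrams(text, ns, sep=" "):
    """Hypothetical reference: unpadded n-grams of a space-split sentence."""
    tokens = text.split(" ")
    if isinstance(ns, int):
        ns = [ns]  # test_config accepts a single n or a list of n values
    out = []
    for n in ns:
        if n <= 0:
            raise ValueError("n gram needs to be a positive number.")
        if len(tokens) < n:
            out.append("")  # too few tokens for this n: one empty string
        else:
            # Every window of n consecutive tokens, joined with sep.
            out.extend(sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out
```

Run on "Lone Star" with `[2, 3, 4, 5]`, this reference gives one bigram followed by three empty strings, the same list the new `test_config` call asserts.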