机器未来 / Paddle (forked from PaddlePaddle / Paddle)
Commit 01fda934 (unverified)
Authored by Qiyang Min on Sep 26, 2018; committed by GitHub on Sep 26, 2018.
Merge pull request #13523 from velconia/fix_rnn_search
Fix reader of rnn_search in python3
Parents: 1d91a49d, 06289aa2
Showing 2 changed files with 6 additions and 3 deletions.
python/paddle/dataset/wmt14.py (+2, -1)
python/paddle/dataset/wmt16.py (+4, -2)
python/paddle/dataset/wmt14.py @ 01fda934
```diff
@@ -89,7 +89,8 @@ def reader_creator(tar_file, file_name, dict_size):
             ]
             for name in names:
                 for line in f.extractfile(name):
-                    line_split = line.strip().split(six.b('\t'))
+                    line = cpt.to_text(line)
+                    line_split = line.strip().split('\t')
                     if len(line_split) != 2:
                         continue
                     src_seq = line_split[0]  # one source sequence
```
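The old code kept each line as `bytes` and split it on `six.b('\t')`, so every field stayed `bytes`. The new code decodes the line first and then splits on a plain `'\t'`. The sketch below illustrates the Python 3 behavior behind this change, assuming UTF-8 data: iterating the file object returned by `tarfile.TarFile.extractfile` yields `bytes` lines, so they have to be decoded before a `str` separator can split them. It uses `bytes.decode("utf-8")` as a stand-in for `cpt.to_text`, which is assumed here, not verified, to decode UTF-8 by default. The helpers `make_tar` and `read_pairs` and the member name `train/train` are hypothetical.

```python
import io
import tarfile


def make_tar(member_name, text):
    """Build an in-memory tar archive with one UTF-8 text member (test fixture)."""
    data = text.encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)  # rewind so the archive can be reopened for reading
    return buf


def read_pairs(fileobj, member_name):
    """Yield (source, target) pairs from a tab-separated member of the archive."""
    with tarfile.open(fileobj=fileobj, mode="r") as f:
        for line in f.extractfile(member_name):
            # Python 3: each line from extractfile() is bytes, not str.
            line = line.decode("utf-8")  # stand-in for cpt.to_text(line)
            line_split = line.strip().split("\t")
            if len(line_split) != 2:
                continue  # skip malformed lines, as the reader does
            yield line_split[0], line_split[1]


archive = make_tar("train/train", "hello world\tbonjour le monde\nbad line\n")
pairs = list(read_pairs(archive, "train/train"))
print(pairs)  # [('hello world', 'bonjour le monde')]

# Without decoding, a str separator cannot split a bytes line in Python 3.
try:
    b"hello\tworld".split("\t")
except TypeError as exc:
    print("TypeError:", exc)
```

Decoding once at the top of the loop means every later step (`strip`, `split`, dictionary lookups) works on `str` in both Python 2 and Python 3.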
python/paddle/dataset/wmt16.py @ 01fda934
```diff
@@ -64,7 +64,8 @@ def __build_dict(tar_file, dict_size, save_path, lang):
     word_dict = defaultdict(int)
     with tarfile.open(tar_file, mode="r") as f:
         for line in f.extractfile("wmt16/train"):
-            line_split = line.strip().split(six.b("\t"))
+            line = cpt.to_text(line)
+            line_split = line.strip().split("\t")
             if len(line_split) != 2: continue
             sen = line_split[0] if lang == "en" else line_split[1]
             for w in sen.split():
```
```diff
@@ -123,7 +124,8 @@ def reader_creator(tar_file, file_name, src_dict_size, trg_dict_size, src_lang):
         with tarfile.open(tar_file, mode="r") as f:
             for line in f.extractfile(file_name):
-                line_split = line.strip().split(six.b("\t"))
+                line = cpt.to_text(line)
+                line_split = line.strip().split("\t")
                 if len(line_split) != 2:
                     continue
                 src_words = line_split[src_col].split()
```
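In `reader_creator`, `src_col` selects which column of the decoded line is the source sentence. The sketch below is a minimal version of that step. It assumes `src_col = 0 if src_lang == "en" else 1`, inferred from the `lang == "en"` test in `__build_dict` and not shown in this hunk. It also uses a hypothetical `str`-keyed dictionary and a hypothetical unknown-word id `UNK_ID = 2`. The sketch shows one plausible reason decoding matters here: in Python 3, a `bytes` token never equals a `str` key, so lookups quietly fall back to the unknown id.

```python
UNK_ID = 2  # hypothetical id for the unknown-word mark


def to_ids(line, src_dict, src_lang):
    """Map the source side of one decoded tab-separated line to word ids.

    Assumes src_col = 0 for "en" and 1 otherwise (hypothetical; the hunk
    only shows that src_col indexes line_split).
    """
    src_col = 0 if src_lang == "en" else 1
    line_split = line.strip().split("\t")
    if len(line_split) != 2:
        return None  # the real reader skips such lines with `continue`
    src_words = line_split[src_col].split()
    return [src_dict.get(w, UNK_ID) for w in src_words]


src_dict = {"a": 3, "cat": 4}  # hypothetical str-keyed dictionary
print(to_ids("a cat\tdie Katze", src_dict, "en"))  # [3, 4]

# bytes tokens never match str keys in Python 3, so every word becomes UNK:
print([src_dict.get(w, UNK_ID) for w in b"a cat".split()])  # [2, 2]
```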