PaddlePaddle / PALM
Commit 7a7e7551
Authored Nov 26, 2019 by xixiaoyao

add remove_noanswer

Parent: fc69141e
Showing 1 changed file with 7 additions and 3 deletions

paddlepalm/reader/utils/reader4ernie.py  (+7 -3)
@@ -639,7 +639,8 @@ class MRCReader(BaseReader):
                  for_cn=True,
                  task_id=0,
                  doc_stride=128,
-                 max_query_length=64):
+                 max_query_length=64,
+                 remove_noanswer=True):
         self.max_seq_len = max_seq_len
         self.tokenizer = tokenization.FullTokenizer(
             vocab_file=vocab_path, do_lower_case=do_lower_case)
@@ -654,6 +655,7 @@ class MRCReader(BaseReader):
         self.max_query_length = max_query_length
         self.examples = {}
         self.features = {}
+        self.remove_noanswer = remove_noanswer
         if random_seed is not None:
             np.random.seed(random_seed)
@@ -758,7 +760,7 @@ class MRCReader(BaseReader):
         return cur_span_index == best_span_index
     def _convert_example_to_feature(self, examples, max_seq_length, tokenizer,
-                                    is_training):
+                                    is_training, remove_noanswer=True):
         features = []
         unique_id = 1000000000
@@ -845,6 +847,8 @@ class MRCReader(BaseReader):
                 if out_of_span:
                     start_position = 0
                     end_position = 0
+                    if remove_noanswer:
+                        continue
                 else:
                     doc_offset = len(query_tokens) + 2
                     start_position = tok_start_position - doc_start + doc_offset
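With this hunk, `_convert_example_to_feature` skips any document window that does not contain the answer instead of labelling it with `start_position = end_position = 0` (index 0, which in the `[CLS] query [SEP] doc` layout implied by `doc_offset = len(query_tokens) + 2` is the `[CLS]` token). Because both the constructor and the method default to `remove_noanswer=True`, skipping becomes the default behaviour. The sketch below illustrates the effect under stated assumptions: the window construction follows the BERT-style `run_squad` sliding-window scheme (it is not shown in this diff), and `DocSpan`, `make_doc_spans` and `label_spans` are hypothetical names, not PALM APIs.

```python
from typing import List, NamedTuple, Tuple


class DocSpan(NamedTuple):
    start: int   # index of the first document token in this window
    length: int  # number of document tokens in this window


def make_doc_spans(num_doc_tokens: int, max_tokens_for_doc: int,
                   doc_stride: int) -> List[DocSpan]:
    """Assumed BERT-style windowing: slide a fixed-size window by doc_stride."""
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(max_tokens_for_doc, num_doc_tokens - start)
        spans.append(DocSpan(start, length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the document
        start += min(length, doc_stride)
    return spans


def label_spans(query_len: int, doc_spans: List[DocSpan],
                tok_start_position: int, tok_end_position: int,
                remove_noanswer: bool = True) -> List[Tuple[int, int, int]]:
    """Return (span_index, start_position, end_position) for each kept window."""
    labels = []
    for span_index, span in enumerate(doc_spans):
        doc_start = span.start
        doc_end = span.start + span.length - 1
        out_of_span = not (tok_start_position >= doc_start
                           and tok_end_position <= doc_end)
        if out_of_span:
            start_position = 0
            end_position = 0
            if remove_noanswer:
                continue  # drop the window rather than label it with index 0
        else:
            doc_offset = query_len + 2  # [CLS] + query tokens + [SEP]
            start_position = tok_start_position - doc_start + doc_offset
            end_position = tok_end_position - doc_start + doc_offset
        labels.append((span_index, start_position, end_position))
    return labels


# 10 document tokens, windows of 4 with stride 2, 3 query tokens,
# answer occupying document tokens 5..6.
spans = make_doc_spans(num_doc_tokens=10, max_tokens_for_doc=4, doc_stride=2)
print(label_spans(3, spans, 5, 6, remove_noanswer=True))   # [(2, 6, 7)]
print(label_spans(3, spans, 5, 6, remove_noanswer=False))  # [(0, 0, 0), (1, 0, 0), (2, 6, 7), (3, 0, 0)]
```

With the flag off, the three answer-less windows stay in the training set with both labels pointing at index 0; with it on (the new default), only the answer-bearing window survives.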
@@ -958,7 +962,7 @@ class MRCReader(BaseReader):
         if not examples:
             examples = self._read_json(input_file, phase == "train")
             features = self._convert_example_to_feature(
-                examples, self.max_seq_len, self.tokenizer, phase == "train")
+                examples, self.max_seq_len, self.tokenizer, phase == "train", remove_noanswer=self.remove_noanswer)
             self.examples[phase] = examples
             self.features[phase] = features
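The remaining hunks thread the flag through the reader: the constructor stores it as `self.remove_noanswer`, and the data-generation path forwards it with `remove_noanswer=self.remove_noanswer`. The sketch below models only that plumbing with a hypothetical `ToyMRCReader`. Its feature format (one dict per window with a precomputed `has_answer` flag standing in for `not out_of_span`), the `get_features` method, the `self.examples.get(phase)` lookup, and filtering only when `is_training` is true are all assumptions, since the diff does not show them. If features really are cached per phase, as the `self.examples[phase]` / `self.features[phase]` assignments suggest, then changing `reader.remove_noanswer` after a phase has been converted has no effect on that phase.

```python
from typing import Dict, List


class ToyMRCReader:
    """Hypothetical stand-in for MRCReader that keeps only the flag plumbing."""

    def __init__(self, remove_noanswer: bool = True):
        self.examples: Dict[str, List[dict]] = {}
        self.features: Dict[str, List[dict]] = {}
        # Stored once at construction, mirroring `self.remove_noanswer = remove_noanswer`.
        self.remove_noanswer = remove_noanswer

    def _convert_example_to_feature(self, examples: List[dict], is_training: bool,
                                    remove_noanswer: bool = True) -> List[dict]:
        # Assumption: answer-less windows are only dropped when building training data.
        features = []
        for example in examples:
            for window in example["windows"]:
                if is_training and remove_noanswer and not window["has_answer"]:
                    continue
                features.append(window)
        return features

    def get_features(self, phase: str, raw_examples: List[dict]) -> List[dict]:
        # Assumed cache lookup; the diff only shows the `if not examples:` branch.
        examples = self.examples.get(phase)
        if not examples:
            examples = raw_examples  # stands in for self._read_json(...)
            features = self._convert_example_to_feature(
                examples, phase == "train",
                remove_noanswer=self.remove_noanswer)
            self.examples[phase] = examples
            self.features[phase] = features
        return self.features[phase]


data = [{"windows": [{"id": 0, "has_answer": False},
                     {"id": 1, "has_answer": True}]}]
reader = ToyMRCReader()  # remove_noanswer defaults to True, as in the commit
print(len(reader.get_features("train", data)))  # 1
reader.remove_noanswer = False
print(len(reader.get_features("train", data)))  # 1 (cached result is reused)
print(len(ToyMRCReader(remove_noanswer=False).get_features("train", data)))  # 2
```

In this sketch the only reliable way to change the behaviour for a phase is to pass `remove_noanswer` when constructing the reader.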