PaddlePaddle
PaddleHub
Commit ff3bc5b8
Authored April 11, 2019 by Zeyu Chen

fix typo in ernie sequence labeling task

Parent: 6985001a
Showing 4 changed files with 23 additions and 8 deletions (+23 -8)
Changed files:
demo/ernie-seq-labeling/sequence_labeling.py  +2 -1
paddlehub/dataset/chnsenticorp.py  +0 -1
paddlehub/dataset/msra_ner.py  +15 -6
paddlehub/dataset/nlpcc_dbqa.py  +6 -0

demo/ernie-seq-labeling/sequence_labeling.py @ ff3bc5b8

@@ -46,7 +46,7 @@ if __name__ == '__main__':
         trainable=True, max_seq_len=args.max_seq_len)
     # Step2: Download dataset and use SequenceLabelReader to read dataset
-    dataset = hub.dataset.MSRA_NER(),
+    dataset = hub.dataset.MSRA_NER()
     reader = hub.reader.SequenceLabelReader(
         dataset=dataset,
         vocab_path=module.get_vocab_path(),
@@ -91,6 +91,7 @@ if __name__ == '__main__':
         use_cuda=True,
         num_epoch=args.num_epoch,
         batch_size=args.batch_size,
+        checkpoint_dir=args.checkpoint_dir,
         strategy=strategy)
     # Finetune and evaluate model by PaddleHub's API
     # will finish training, evaluation, testing, save model automatically
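
The one-character fix in the first hunk matters because a trailing comma after a call binds the name to a one-element tuple rather than to the object itself. A minimal sketch of that effect follows; `FakeDataset` and its label list are hypothetical stand-ins for `hub.dataset.MSRA_NER`, used only so the snippet runs without PaddleHub installed.

```python
class FakeDataset:
    """Hypothetical stand-in for hub.dataset.MSRA_NER (an assumption, not the real class)."""

    def get_labels(self):
        # Made-up labels for illustration only.
        return ["B", "I", "O"]


# Before the fix: the trailing comma wraps the object in a 1-tuple.
buggy = FakeDataset(),
# After the fix: the name is bound to the dataset object itself.
fixed = FakeDataset()

print(type(buggy).__name__)
print(type(fixed).__name__)
# A tuple has no get_labels method, so code expecting a dataset would fail on `buggy`.
print(hasattr(buggy, "get_labels"), hasattr(fixed, "get_labels"))
```

Any consumer that calls dataset methods on the value would presumably raise an `AttributeError` when handed the tuple, which is consistent with the commit message calling this a typo.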

paddlehub/dataset/chnsenticorp.py @ ff3bc5b8

@@ -68,7 +68,6 @@ class ChnSentiCorp(HubDataset):
         return self.test_examples
     def get_labels(self):
         """See base class."""
         return ["0", "1"]
     def _read_tsv(self, input_file, quotechar=None):

paddlehub/dataset/msra_ner.py @ ff3bc5b8

@@ -21,6 +21,7 @@ import csv
 import json
 from collections import namedtuple
+from paddlehub.dataset import InputExample, HubDataset
 from paddlehub.common.downloader import default_downloader
 from paddlehub.common.dir import DATA_HOME
 from paddlehub.common.logger import logger
@@ -28,7 +29,14 @@ from paddlehub.common.logger import logger
 DATA_URL = "https://paddlehub-dataset.bj.bcebos.com/msra_ner.tar.gz"
-class MSRA_NER(object):
+class MSRA_NER(HubDataset):
+    """
+    A set of manually annotated Chinese word-segmentation data and
+    specifications for training and testing a Chinese word-segmentation system
+    for research purposes. For more information please refer to
+    https://www.microsoft.com/en-us/download/details.aspx?id=52531
+    """
     def __init__(self):
         self.dataset_dir = os.path.join(DATA_HOME, "msra_ner")
         if not os.path.exists(self.dataset_dir):
@@ -78,12 +86,13 @@ class MSRA_NER(object):
         """Reads a tab separated value file."""
         with open(input_file, "r") as f:
             reader = csv.reader(f, delimiter="\t", quotechar=quotechar)
-            headers = next(reader)
-            Example = namedtuple('Example', headers)
             examples = []
+            seq_id = 0
+            header = next(reader)  # skip header
             for line in reader:
-                example = Example(*line)
+                example = InputExample(
+                    guid=seq_id, label=line[1], text_a=line[0])
+                seq_id += 1
                 examples.append(example)
             return examples
@@ -92,4 +101,4 @@ class MSRA_NER(object):
 if __name__ == "__main__":
     ds = MSRA_NER()
     for e in ds.get_train_examples():
-        print(e)
+        print("{}\t{}\t{}\t{}".format(e.guid, e.text_a, e.text_b, e.label))
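
The reader change above replaces a header-derived `namedtuple` with fixed-field examples that carry a running `guid`. The sketch below illustrates that pattern under stated assumptions: the `InputExample` defined here is a hypothetical `namedtuple` stand-in for `paddlehub.dataset.InputExample`, the sample rows are made up, and `quoting=csv.QUOTE_NONE` is used in place of the original `quotechar` argument to keep the example simple.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for paddlehub.dataset.InputExample (an assumption).
InputExample = namedtuple("InputExample", ["guid", "text_a", "text_b", "label"])


def read_tsv(f):
    """Read a tab-separated stream: skip the header, build one example per row."""
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip header
    examples = []
    for seq_id, line in enumerate(reader):
        # Column 0 holds the text, column 1 the label, as in the diff above.
        examples.append(
            InputExample(guid=seq_id, text_a=line[0], text_b=None, label=line[1]))
    return examples


# Made-up sample data; the real dataset file layout may differ.
sample = io.StringIO("text_a\tlabel\nA B C\tO O O\nD E\tB I\n")
for e in read_tsv(sample):
    print("{}\t{}\t{}\t{}".format(e.guid, e.text_a, e.text_b, e.label))
```

Using `enumerate` instead of a manually incremented `seq_id` is a stylistic choice in this sketch; both produce sequential ids starting at 0.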

paddlehub/dataset/nlpcc_dbqa.py @ ff3bc5b8

@@ -29,6 +29,12 @@ DATA_URL = "https://paddlehub-dataset.bj.bcebos.com/nlpcc-dbqa.tar.gz"
 class NLPCC_DBQA(HubDataset):
+    """
+    Please refer to
+    http://tcci.ccf.org.cn/conference/2017/dldoc/taskgline05.pdf
+    for more information
+    """
     def __init__(self):
         self.dataset_dir = os.path.join(DATA_HOME, "nlpcc-dbqa")
         if not os.path.exists(self.dataset_dir):