Unverified commit 048056f5, author: smallv0221, committer: GitHub

fix dataset bug (#5038)

* update lrscheduler

* minor fix

* add pre-commit

* minor fix

* Add __len__ to squad dataset

* minor fix

* Add dureader robust prototype

* dataset implement

* minor fix

* fix var name

* add dureader-yesno train script and dataset

* add readme and fix md5sum

* integrate dureader datasets

* change var names: segment to mode, root to data_file

* minor fix

* update var name

* Fix api bugs

* add dataset readme

* add express ner

* update readme format

* fix format bug

* change readme path

* fix format bug

* fix dataset bug
Parent 11ee20fb
@@ -91,7 +91,7 @@ class GlueCoLA(_GlueDataset):
 Each example is a sequence of words annotated with whether it is a
 grammatical English sentence. From https://gluebenchmark.com/tasks
 Args:
-segment ('train'|'dev'|'test'): Dataset segment. Default: 'train'.
+mode ('train'|'dev'|'test'): Dataset segment. Default: 'train'.
 root (str): Path to temp folder for storing data.
 return_all_fields (bool): Return all fields available in the dataset.
 Default: False.
@@ -135,7 +135,7 @@ class GlueSST2(_GlueDataset):
 from movie reviews and human annotations of their sentiment.
 From https://gluebenchmark.com/tasks
 Args:
-segment ('train'|'dev'|'test'): Dataset segment. Default: 'train'.
+mode ('train'|'dev'|'test'): Dataset segment. Default: 'train'.
 root (str): Path to temp folder for storing data.
 return_all_fields (bool): Return all fields available in the dataset.
 Default: False.
@@ -181,7 +181,7 @@ class GlueMRPC(_GlueDataset):
 From https://gluebenchmark.com/tasks
 Args:
 root (str): Path to temp folder for storing data.
-segment ('train'|'dev'|'test'): Dataset segment. Default: 'train'.
+mode ('train'|'dev'|'test'): Dataset segment. Default: 'train'.
 Example:
 .. code-block:: python
 from paddle.incubate.hapi.text.glue import GlueMRPC
@@ -234,7 +234,7 @@ class GlueMRPC(_GlueDataset):
 warnings.warn(
 'md5 check failed for {}, download {} data to {}'.format(
 filename, self.__class__.__name__, default_root))
-if segment in ('train', 'dev'):
+if mode in ('train', 'dev'):
 dev_id_path = get_path_from_url(
 self.DEV_ID_URL,
 os.path.join(default_root, 'MRPC'), self.DEV_ID_MD5)
@@ -297,7 +297,7 @@ class GlueSTSB(_GlueDataset):
 with a similarity score from 1 to 5.
 From https://gluebenchmark.com/tasks
 Args:
-segment ('train'|'dev'|'test'): Dataset segment. Default: 'train'.
+mode ('train'|'dev'|'test'): Dataset mode. Default: 'train'.
 root (str): Path to temp folder for storing data.
 return_all_fields (bool): Return all fields available in the dataset. Default: False.
 Example:
@@ -340,7 +340,7 @@ class GlueQQP(_GlueDataset):
 community question-answering website Quora.
 From https://gluebenchmark.com/tasks
 Args:
-segment ({'train', 'dev', 'test'}): Dataset segment. Default: 'train'.
+mode ({'train', 'dev', 'test'}): Dataset mode. Default: 'train'.
 root (str): Path to temp folder for storing data.
 return_all_fields (bool): Return all fields available in the dataset.
 Default: False.
@@ -380,7 +380,7 @@ class GlueQQP(_GlueDataset):
 def __init__(self, mode='train', root=None, return_all_fields=False):
 # QQP may include broken samples
 super(GlueQQP, self).__init__(
-segment, root, return_all_fields, allow_missing=True)
+mode, root, return_all_fields, allow_missing=True)
 def get_labels(self):
 """
@@ -396,7 +396,7 @@ class GlueMNLI(_GlueDataset):
 annotations.
 From https://gluebenchmark.com/tasks
 Args:
-segment ('train'|'dev_matched'|'dev_mismatched'|'test_matched'|
+mode ('train'|'dev_matched'|'dev_mismatched'|'test_matched'|
 'test_mismatched'): Dataset segment. Default: ‘train’.
 root (str, default '$MXNET_HOME/datasets/glue_mnli'): Path to temp
 folder for storing data.
@@ -452,7 +452,7 @@ class GlueQNLI(_GlueDataset):
 Answering Dataset (Rajpurkar et al. 2016).
 From https://gluebenchmark.com/tasks
 Args:
-segment ('train'|'dev'|'test'): Dataset segment. Dataset segment.
+mode ('train'|'dev'|'test'): Dataset segment.
 Default: 'train'.
 root (str): Path to temp folder for storing data.
 return_all_fields (bool): Return all fields available in the dataset.
@@ -506,7 +506,7 @@ class GlueRTE(_GlueDataset):
 annual textual entailment challenges (RTE1, RTE2, RTE3, and RTE5).
 From https://gluebenchmark.com/tasks
 Args:
-segment ('train'|'dev'|'test'): Dataset segment. Default: 'train'.
+mode ('train'|'dev'|'test'): Dataset segment. Default: 'train'.
 root (str): Path to temp folder for storing data.
 return_all_fields (bool): Return all fields available in the dataset.
 Default: False.
@@ -556,7 +556,7 @@ class GlueWNLI(_GlueDataset):
 Challenge (Levesque et al., 2011).
 From https://gluebenchmark.com/tasks
 Args:
-segment ('train'|'dev'|'test'): Dataset segment. Default: 'train'.
+mode ('train'|'dev'|'test'): Dataset segment. Default: 'train'.
 root (str): Path to temp folder for storing data.
 return_all_fields (bool): Return all fields available in the dataset.
 Default: False.
......
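Review note on the GlueQQP hunk at line 380: after the parameter rename, the constructor signature takes `mode`, but the body still passed `segment` to the parent, which looks like a `NameError` on every instantiation. The sketch below reproduces that failure in isolation. `_BaseDataset`, `BrokenQQP`, `FixedQQP` and `try_build` are hypothetical stand-ins written for this note, not the real `_GlueDataset` or `GlueQQP`, and nothing is downloaded.

```python
class _BaseDataset:
    """Hypothetical stand-in for the parent class; it only records its arguments."""

    def __init__(self, mode='train', root=None, return_all_fields=False,
                 allow_missing=False):
        self.mode = mode
        self.root = root
        self.return_all_fields = return_all_fields
        self.allow_missing = allow_missing


class BrokenQQP(_BaseDataset):
    def __init__(self, mode='train', root=None, return_all_fields=False):
        # Pre-fix shape: the parameter is `mode`, but the body still uses
        # the old name `segment`, which is not defined anywhere.
        super(BrokenQQP, self).__init__(
            segment, root, return_all_fields, allow_missing=True)


class FixedQQP(_BaseDataset):
    def __init__(self, mode='train', root=None, return_all_fields=False):
        # Post-fix shape: forward the renamed parameter.
        super(FixedQQP, self).__init__(
            mode, root, return_all_fields, allow_missing=True)


def try_build(cls, **kwargs):
    """Return the built instance, or the exception class name if a NameError is raised."""
    try:
        return cls(**kwargs)
    except NameError as exc:
        return type(exc).__name__


if __name__ == '__main__':
    print(try_build(BrokenQQP, mode='dev'))
    print(try_build(FixedQQP, mode='dev').mode)
```

Because Python resolves names only when the line executes, a stale name like this passes import and is only caught when the class is constructed, so a smoke test that instantiates each dataset class once would likely have flagged it.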