Unverified commit dce498ce, authored by liu zhengxi, committed by GitHub

alter the default value for vocab (#5062)

Parent 448ef165
@@ -36,14 +36,14 @@ class Vocab(object):
             between tokens and indices to be used. If provided, adjust the tokens
             and indices mapping according to it. If None, counter must be provided.
             Default: None.
-        unk_token (str): special token for unknow token. If no need, it also
-            could be None. Default: '<unk>'.
-        pad_token (str): special token for padding token. If no need, it also
-            could be None. Default: '<pad>'.
-        bos_token (str): special token for bos token. If no need, it also
-            could be None. Default: <bos>'.
-        eos_token (str): special token for eos token. If no need, it also
-            could be None. Default: '<eos>'.
+        unk_token (str): special token for unknow token '<unk>'. If no need, it also
+            could be None. Default: None.
+        pad_token (str): special token for padding token '<pad>'. If no need, it also
+            could be None. Default: None.
+        bos_token (str): special token for bos token '<bos>'. If no need, it also
+            could be None. Default: None.
+        eos_token (str): special token for eos token '<eos>'. If no need, it also
+            could be None. Default: None.
         **kwargs (dict): Keyword arguments ending with `_token`. It can be used
             to specify further special tokens that will be exposed as attribute
             of the vocabulary and associated with an index.
@@ -54,10 +54,10 @@ class Vocab(object):
                  max_size=None,
                  min_freq=1,
                  token_to_idx=None,
-                 unk_token='<unk>',
-                 pad_token='<pad>',
-                 bos_token='<bos>',
-                 eos_token='<eos>',
+                 unk_token=None,
+                 pad_token=None,
+                 bos_token=None,
+                 eos_token=None,
                  **kwargs):
         # Handle special tokens
         combs = (('unk_token', unk_token), ('pad_token', pad_token),
@@ -317,10 +317,10 @@ class Vocab(object):
                     max_size=None,
                     min_freq=1,
                     token_to_idx=None,
-                    unk_token='<unk>',
-                    pad_token='<pad>',
-                    bos_token='<bos>',
-                    eos_token='<eos>',
+                    unk_token=None,
+                    pad_token=None,
+                    bos_token=None,
+                    eos_token=None,
                     **kwargs):
         """
         Building vocab accoring to given iterator and other information. Iterate
@@ -333,14 +333,14 @@ class Vocab(object):
             between tokens and indices to be used. If provided, adjust the tokens
             and indices mapping according to it. If None, counter must be provided.
             Default: None.
-        unk_token (str): special token for unknow token. If no need, it also
-            could be None. Default: '<unk>'.
-        pad_token (str): special token for padding token. If no need, it also
-            could be None. Default: '<pad>'.
-        bos_token (str): special token for bos token. If no need, it also
-            could be None. Default: <bos>'.
-        eos_token (str): special token for eos token. If no need, it also
-            could be None. Default: '<eos>'.
+        unk_token (str): special token for unknow token '<unk>'. If no need, it also
+            could be None. Default: None.
+        pad_token (str): special token for padding token '<pad>'. If no need, it also
+            could be None. Default: None.
+        bos_token (str): special token for bos token '<bos>'. If no need, it also
+            could be None. Default: None.
+        eos_token (str): special token for eos token '<eos>'. If no need, it also
+            could be None. Default: None.
         **kwargs (dict): Keyword arguments ending with `_token`. It can be used
             to specify further special tokens that will be exposed as attribute
             of the vocabulary and associated with an index.
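
For reviewers: the practical effect of this change is that callers who relied on the old defaults now have to pass the special tokens explicitly. The sketch below illustrates that with `MiniVocab`, a hypothetical stand-in written only for this note. It is not the `Vocab` class from the diff, and its index assignment order (special tokens first, then tokens by descending frequency) is an assumption, not something the diff confirms.

```python
from collections import Counter


class MiniVocab:
    """Hypothetical stand-in for illustration; not the library's Vocab."""

    def __init__(self, counter, unk_token=None, pad_token=None,
                 bos_token=None, eos_token=None, **kwargs):
        # Gather the four named special tokens plus any extra keyword
        # ending in `_token`, as the docstring in the diff describes.
        named = {"unk_token": unk_token, "pad_token": pad_token,
                 "bos_token": bos_token, "eos_token": eos_token}
        for key in kwargs:
            if not key.endswith("_token"):
                raise ValueError(f"unexpected keyword argument: {key}")
        named.update(kwargs)

        self.idx_to_token = []
        self.token_to_idx = {}
        for name, token in named.items():
            setattr(self, name, token)  # exposed as an attribute
            if token is not None:       # None means "no such token"
                self._add(token)
        # Assumed ordering: most frequent first, ties broken alphabetically.
        for token, _ in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
            self._add(token)

    def _add(self, token):
        if token not in self.token_to_idx:
            self.token_to_idx[token] = len(self.idx_to_token)
            self.idx_to_token.append(token)

    def __len__(self):
        return len(self.idx_to_token)


counter = Counter("the cat sat on the mat".split())

# With the new defaults, no special tokens are added.
plain = MiniVocab(counter)

# Callers that need special tokens pass them explicitly.
marked = MiniVocab(counter, unk_token="<unk>", pad_token="<pad>")

print(len(plain), len(marked))  # 5 7
```

Under the old defaults, the first call would have produced a vocabulary with four extra entries. Code that looks up an unknown-token index after constructing a vocabulary without arguments is the most likely place for this change to surface.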