Unverified commit e6e41bc1, authored by Steffy-zxf, committed by GitHub

add paddle.models api reference docs (#5030)

* add paddle.models api reference docs

* update docs

* update docs
Parent f6ebbace
# paddlenlp.models
This module provides high-level APIs for models developed by Baidu, such as the text classification model Senta, the text matching model SimNet, and the general pretrained model ERNIE.
```python
class paddlenlp.models.Ernie(model_name, num_classes, task=None, **kwargs):
"""
Pretrained model ERNIE.
For more information, see ERNIE: Enhanced Representation through Knowledge Integration (https://arxiv.org/abs/1904.09223).
Args:
    `model_name (obj:`str`)`: Model name, e.g. `ernie-1.0`, `ernie-tiny`, `ernie-2.0-en`, `ernie-2.0-large-en`.
    `num_classes (obj:`int`)`: Number of classes.
    `task (obj:`str`)`: Downstream task the pretrained ERNIE model is used for; one of `seq-cls`, `token-cls`, `qa`. Defaults to None.
        - task='seq-cls': ERNIE for text classification. The sentence feature extracted by the ERNIE model is fed to a final fully connected layer for text classification.
          See `paddlenlp.transformers.ErnieForSequenceClassification` for details.
        - task='token-cls': ERNIE for sequence labeling. The feature of each token extracted by the ERNIE model is fed to a final fully connected layer for token classification.
          See `paddlenlp.transformers.ErnieForTokenClassification` for details.
        - task='qa': ERNIE for reading comprehension. The feature of each token extracted by the ERNIE model is fed to a final fully connected layer to predict the start and end positions of the answer in the passage.
          See `paddlenlp.transformers.ErnieForQuestionAnswering` for details.
        - task=None: the pretrained ERNIE model itself. It can be used as a backbone to extract the sentence feature pooled_output and the token features sequence_output.
          See `paddlenlp.transformers.ErnieModel` for details.
"""
def forward(input_ids, token_type_ids=None, position_ids=None, attention_mask=None):
"""
Args:
    `input_ids (obj:`paddle.Tensor`)`: Token ids of the input text, shape (batch_size, sequence_length).
    `token_type_ids (obj:`paddle.Tensor`)`: Segment ids indicating which text each token belongs to (text 1 or text 2), shape (batch_size, sequence_length).
        Defaults to None, which means every token belongs to text 1.
    `position_ids (obj:`paddle.Tensor`)`: Position of each token in the input sequence, shape (batch_size, sequence_length). Defaults to None.
    `attention_mask (obj:`paddle.Tensor`)`: Mask used to avoid performing attention on padding tokens; it marks whether each token is a padding token,
        shape (batch_size, sequence_length). Each value is either 0 or 1: 1 means the token is a padding token and 0 means it is a real input token. Defaults to None.
Returns:
    - When `task` is not None, returns the classification probabilities `probs (obj:`paddle.Tensor`)` of the corresponding downstream task, shape (batch_size, num_classes).
    - When `task=None`, returns the sentence feature pooled_output and the token features sequence_output of the pretrained ERNIE model.
        * pooled_output (obj:`paddle.Tensor`): shape (batch_size, hidden_size)
        * sequence_output (obj:`paddle.Tensor`): shape (batch_size, sequence_length, hidden_size)
"""
```
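A minimal usage sketch (illustrative only, not part of this patch; the token ids below are made up and would normally come from `paddlenlp.transformers.ErnieTokenizer`):
```python
import paddle
from paddlenlp.models import Ernie

# Binary classifier head on top of ernie-1.0 (downloads pretrained weights on first use).
model = Ernie('ernie-1.0', num_classes=2, task='seq-cls')

# Toy, already-tokenized batch (batch_size=1, sequence_length=6).
input_ids = paddle.to_tensor([[1, 647, 533, 797, 99, 2]])
token_type_ids = paddle.zeros_like(input_ids)  # every token belongs to text 1

probs = model(input_ids, token_type_ids)
print(probs.shape)  # [1, 2]
```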
```python
class paddlenlp.models.Senta(network, vocab_size, num_classes, emb_dim=128, pad_token_id=0):
"""
Text classification model Senta.
Args:
    `network (obj:`str`)`: Network name; one of bow, bilstm, bilstm_attn, bigru, birnn, cnn, lstm, gru, rnn and textcnn.
        - network='bow': sums the input word embeddings to obtain the text representation.
          See `paddlenlp.seq2vec.BoWEncoder` for details.
        - network='bilstm': runs a bidirectional LSTM over the input word embeddings and uses the last step's representation as the text representation.
          See `paddlenlp.seq2vec.LSTMEncoder` for details.
        - network='bilstm_attn': runs a bidirectional LSTM with attention over the input word embeddings and uses the resulting representation as the text representation.
          See `paddlenlp.seq2vec.LSTMEncoder` for details.
        - network='bigru': runs a bidirectional GRU over the input word embeddings and uses the last step's representation as the text representation.
          See `paddlenlp.seq2vec.GRUEncoder` for details.
        - network='birnn': runs a bidirectional RNN over the input word embeddings and uses the last step's representation as the text representation.
          See `paddlenlp.seq2vec.RNNEncoder` for details.
        - network='cnn': applies one convolution over the input word embeddings followed by max-pooling to obtain the text representation.
          See `paddlenlp.seq2vec.CNNEncoder` for details.
        - network='lstm': runs an LSTM over the input word embeddings followed by max-pooling to obtain the text representation.
          See `paddlenlp.seq2vec.LSTMEncoder` for details.
        - network='gru': runs a GRU over the input word embeddings followed by max-pooling to obtain the text representation.
          See `paddlenlp.seq2vec.GRUEncoder` for details.
        - network='rnn': runs an RNN over the input word embeddings followed by max-pooling to obtain the text representation.
          See `paddlenlp.seq2vec.RNNEncoder` for details.
        - network='textcnn': applies several convolutions and max-pooling over the input word embeddings to obtain the text representation.
          See `paddlenlp.seq2vec.CNNEncoder` for details.
    `vocab_size (obj:`int`)`: Vocabulary size.
    `num_classes (obj:`int`)`: Number of classes.
    `emb_dim (obj:`int`)`: Dimension of the word embeddings. Defaults to 128.
    `pad_token_id (obj:`int`)`: Index of the padding token in the vocabulary. Defaults to 0.
"""
def forward(text, seq_len=None):
"""
Args:
    `text (obj:`paddle.Tensor`)`: Token ids of the input text, shape (batch_size, sequence_length).
    `seq_len (obj:`paddle.Tensor`)`: Actual sequence lengths of the texts, shape (batch_size). Defaults to None.
Returns:
    `probs (obj:`paddle.Tensor`)`: Classification probabilities, shape (batch_size, num_classes).
"""
```
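A minimal usage sketch (illustrative only; the vocabulary size and token ids are made up):
```python
import paddle
from paddlenlp.models import Senta

# Bidirectional LSTM classifier over a toy vocabulary of 1000 tokens, 2 classes.
model = Senta(network='bilstm', vocab_size=1000, num_classes=2)

# Toy batch of padded token ids (batch_size=2, sequence_length=5) and the true lengths.
text = paddle.to_tensor([[12, 7, 391, 0, 0], [8, 90, 42, 7, 6]])
seq_len = paddle.to_tensor([3, 5])

probs = model(text, seq_len)
print(probs.shape)  # [2, 2]
```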
```python
class paddlenlp.models.SimNet(network, vocab_size, num_classes, emb_dim=128, pad_token_id=0):
"""
Text matching model SimNet.
Args:
    `network (obj:`str`)`: Network name; one of bow, cnn, lstm and gru.
        - network='bow': sums the input word embeddings to obtain the text representation.
          See `paddlenlp.seq2vec.BoWEncoder` for details.
        - network='cnn': applies one convolution over the input word embeddings followed by max-pooling to obtain the text representation.
          See `paddlenlp.seq2vec.CNNEncoder` for details.
        - network='lstm': runs an LSTM over the input word embeddings and uses the last step's representation as the text representation.
          See `paddlenlp.seq2vec.LSTMEncoder` for details.
        - network='gru': runs a GRU over the input word embeddings and uses the last step's representation as the text representation.
          See `paddlenlp.seq2vec.GRUEncoder` for details.
    `vocab_size (obj:`int`)`: Vocabulary size.
    `num_classes (obj:`int`)`: Number of classes.
    `emb_dim (obj:`int`)`: Dimension of the word embeddings. Defaults to 128.
    `pad_token_id (obj:`int`)`: Index of the padding token in the vocabulary. Defaults to 0.
"""
def forward(query, title, query_seq_len=None, title_seq_len=None):
"""
Args:
    `query (obj:`paddle.Tensor`)`: Token ids of the query text, shape (batch_size, query_sequence_length).
    `title (obj:`paddle.Tensor`)`: Token ids of the title text, shape (batch_size, title_sequence_length).
    `query_seq_len (obj:`paddle.Tensor`)`: Actual sequence lengths of the query texts, shape (batch_size). Defaults to None.
    `title_seq_len (obj:`paddle.Tensor`)`: Actual sequence lengths of the title texts, shape (batch_size). Defaults to None.
Returns:
    `probs (obj:`paddle.Tensor`)`: Classification probabilities, shape (batch_size, num_classes).
"""
```
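A minimal usage sketch (illustrative only; the vocabulary size and token ids are made up):
```python
import paddle
from paddlenlp.models import SimNet

# CNN-based matching model over a toy vocabulary of 1000 tokens, 2 classes (match / no match).
model = SimNet(network='cnn', vocab_size=1000, num_classes=2)

# Toy query/title token ids (batch_size=1); real ids would come from your own vocabulary.
query = paddle.to_tensor([[5, 9, 14, 2]])
title = paddle.to_tensor([[5, 9, 77, 31, 2]])

probs = model(query, title)
print(probs.shape)  # [1, 2]
```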
@@ -11,8 +11,8 @@
For Chinese text classification, this project open-sources a series of models that users can use in a configurable way (an illustrative loading snippet follows this list):
+ BERT ([Bidirectional Encoder Representations from Transformers](https://arxiv.org/abs/1810.04805)) Chinese model, shortcut name `bert-base-chinese`, consisting of a 12-layer Transformer network.
+ ERNIE ([Enhanced Representation through Knowledge Integration](https://arxiv.org/pdf/1904.09223)), supporting the ERNIE 1.0 Chinese model (shortcut name `ernie-1.0`) and the ERNIE Tiny Chinese model (shortcut name `ernie-tiny`).
  Here `ernie` consists of a 12-layer Transformer network and `ernie-tiny` of a 3-layer Transformer network.
+ RoBERTa ([A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)), supporting `roberta-wwm-ext-large` with a 24-layer Transformer network and `roberta-wwm-ext` with a 12-layer Transformer network.
+ Electra ([ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://arxiv.org/abs/2003.10555)), supporting `chinese-electra-discriminator-small` with hidden_size=256
  and `chinese-electra-discriminator-base` with hidden_size=768.
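For illustration (this snippet is not part of the patch), any of the shortcut names above can be passed to `from_pretrained`:
```python
from paddlenlp.transformers import BertModel, BertTokenizer

# bert-base-chinese is shown here; the other shortcut names work the same way
# with their corresponding model and tokenizer classes.
model = BertModel.from_pretrained('bert-base-chinese')
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
```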
@@ -28,8 +28,8 @@
| roberta-wwm-ext-large | 0.95250 | 0.95333 |
| rbt3 | 0.92583 | 0.93250 |
| rbtl3 | 0.9341 | 0.93583 |
| chinese-electra-base | 0.94500 | 0.94500 |
| chinese-electra-small | 0.92417 | 0.93417 |
## Quick Start
@@ -66,13 +66,13 @@ pretrained_models/
```shell
# Set the GPU card to use
CUDA_VISIBLE_DEVICES=0
python train.py --model_type ernie --model_name ernie-tiny --n_gpu 1 --save_dir ./checkpoints
```
Configurable parameters:
* `model_type`: required. Model type; one of bert, ernie, roberta.
* `model_name`: required. Shortcut name of the specific model. With `model_type=ernie`, model_name can be `ernie-1.0` or `ernie-tiny`; with `model_type=bert`, model_name can be `bert-base-chinese`;
  with `model_type=roberta`, model_name can be `roberta-wwm-ext-large` or `roberta-wwm-ext`.
* `save_dir`: required. Directory in which to save the trained model.
* `max_seq_length`: optional. Maximum sequence length used by the ERNIE/BERT model, at most 512; if you run out of GPU memory, lower this value. Defaults to 128.
@@ -99,14 +99,14 @@ checkpoints/
**NOTE:**
* To resume training, set `init_from_ckpt`, e.g. `init_from_ckpt=checkpoints/model_100/model_state.pdparams`.
* To use the ernie-tiny model, install the sentencepiece dependency first, e.g. `pip install sentencepiece`.
### Model Prediction
Start prediction:
```shell
export CUDA_VISIBLE_DEVICES=0
python predict.py --model_type ernie --model_name ernie-tiny --params_path checkpoints/model_400/model_state.pdparams
```
Prepare the data to be predicted as in the following example:
......
@@ -42,7 +42,7 @@ def parse_args():
    parser = argparse.ArgumentParser()
    # Required parameters
    parser.add_argument("--model_type", default='ernie', required=True, type=str, help="Model type selected in the list: " + ", ".join(MODEL_CLASSES.keys()))
    parser.add_argument("--model_name_or_path", default='ernie-tiny', required=True, type=str, help="Path to pre-trained model or shortcut name selected in the list: " +
                        ", ".join(sum([list(classes[-1].pretrained_init_configuration.keys()) for classes in MODEL_CLASSES.values()], [])))
    parser.add_argument("--params_path", type=str, required=True, help="The path to model parameters to be loaded.")
......
@@ -49,7 +49,7 @@ def parse_args():
        ", ".join(MODEL_CLASSES.keys()))
    parser.add_argument(
        "--model_name",
        default='ernie-tiny',
        required=True,
        type=str,
        help="Path to pre-trained model or shortcut name selected in the list: "
......
@@ -39,8 +39,8 @@ PaddleNLP提供了丰富的预训练模型,并且可以便捷地获取PaddlePa
For Chinese text matching, this project open-sources a series of models that users can use in a configurable way:
+ BERT ([Bidirectional Encoder Representations from Transformers](https://arxiv.org/abs/1810.04805)) Chinese model, shortcut name `bert-base-chinese`, consisting of a 12-layer Transformer network.
+ ERNIE ([Enhanced Representation through Knowledge Integration](https://arxiv.org/pdf/1904.09223)), supporting the ERNIE 1.0 Chinese model (shortcut name `ernie-1.0`) and the ERNIE Tiny Chinese model (shortcut name `ernie-tiny`).
  Here `ernie` consists of a 12-layer Transformer network and `ernie-tiny` of a 3-layer Transformer network.
+ RoBERTa ([A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)), supporting `roberta-wwm-ext-large` with a 24-layer Transformer network and `roberta-wwm-ext` with a 12-layer Transformer network.
+ Electra ([ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://arxiv.org/abs/2003.10555)), supporting `chinese-electra-discriminator-small` with hidden_size=256
  and `chinese-electra-discriminator-base` with hidden_size=768.
@@ -48,11 +48,11 @@ PaddleNLP提供了丰富的预训练模型,并且可以便捷地获取PaddlePa
## TODO: add model results
| Model | dev acc | test acc |
| ---- | ------- | -------- |
| bert-base-chinese | 0.86537 | 0.84440 |
| bert-wwm-chinese | 0.86333 | 0.84128 |
| bert-wwm-ext-chinese | | |
| ernie | 0.87480 | 0.84760 |
| ernie-tiny | 0.86071 | 0.83352 |
| roberta-wwm-ext | | |
| roberta-wwm-ext-large | | |
| rbt3 | | |
@@ -108,7 +108,7 @@ python train.py --model_type ernie --model_name ernie-1.0 --n_gpu 1 --save_dir .
Configurable parameters:
* `model_type`: required. Model type; one of bert, ernie, roberta.
* `model_name`: required. Shortcut name of the specific model. With `model_type=ernie`, model_name can be `ernie-1.0` or `ernie-tiny`; with `model_type=bert`, model_name can be `bert-base-chinese`;
  with `model_type=roberta`, model_name can be `roberta-wwm-ext-large` or `roberta-wwm-ext`.
* `save_dir`: required. Directory in which to save the trained model.
* `max_seq_length`: optional. Maximum sequence length used by the ERNIE/BERT model, at most 512; if you run out of GPU memory, lower this value. Defaults to 128.
@@ -135,14 +135,14 @@ checkpoints/
**NOTE:**
* To resume training, set `init_from_ckpt`, e.g. `init_from_ckpt=checkpoints/model_100/model_state.pdparams`.
* To use the ernie-tiny model, install the sentencepiece dependency first, e.g. `pip install sentencepiece`.
### Model Prediction
Start prediction:
```shell
CUDA_VISIBLE_DEVICES=0
python predict.py --model_type ernie --model_name ernie-tiny --params_path checkpoints/model_400/model_state.pdparams
```
Prepare the data to be predicted as in the following example:
......
@@ -31,8 +31,7 @@ MODEL_CLASSES = {
    'ernie': (ppnlp.transformers.ErnieModel, ppnlp.transformers.ErnieTokenizer),
    'roberta': (ppnlp.transformers.RobertaModel,
                ppnlp.transformers.RobertaTokenizer),
    'electra': (ppnlp.transformers.Electra, ppnlp.transformers.ElectraTokenizer)
}
@@ -176,10 +175,6 @@ def predict(model, data, tokenizer, label_map, batch_size=1):
        title_input_ids = paddle.to_tensor(title_input_ids)
        title_segment_ids = paddle.to_tensor(title_segment_ids)
-        print(query_input_ids)
-        print(query_segment_ids)
-        print(title_segment_ids)
        probs = model(
            query_input_ids,
            title_input_ids,
......
@@ -32,8 +32,7 @@ MODEL_CLASSES = {
    'ernie': (ppnlp.transformers.ErnieModel, ppnlp.transformers.ErnieTokenizer),
    'roberta': (ppnlp.transformers.RobertaModel,
                ppnlp.transformers.RobertaTokenizer),
    'electra': (ppnlp.transformers.Electra, ppnlp.transformers.ElectraTokenizer)
}
......
@@ -20,7 +20,7 @@ from paddlenlp.transformers import *
class Ernie(nn.Layer):
    def __init__(self, model_name, num_classes, task=None, **kwargs):
        super().__init__()
        model_name = model_name.lower()
        self.task = task.lower()
@@ -30,20 +30,21 @@ class Ernie(nn.Layer):
            assert model_name in required_names, "model_name must be in %s, unknown %s ." (
                required_names, model_name)
            self.model = ErnieForSequenceClassification.from_pretrained(
                model_name, num_classes=num_classes, **kwargs)
        elif self.task == 'token-cls':
            required_names = list(ErnieForTokenClassification.
                                  pretrained_init_configuration.keys())
            assert model_name in required_names, "model_name must be in %s, unknown %s ." (
                required_names, model_name)
            self.model = ErnieForTokenClassification.from_pretrained(
                model_name, num_classes=num_classes, **kwargs)
        elif self.task == 'qa':
            required_names = list(
                ErnieForQuestionAnswering.pretrained_init_configuration.keys())
            assert model_name in required_names, "model_name must be in %s, unknown %s ." (
                required_names, model_name)
            self.model = ErnieForQuestionAnswering.from_pretrained(model_name,
                                                                    **kwargs)
        elif self.task is None:
            required_names = list(ErnieModel.pretrained_init_configuration.keys(
            ))
......
@@ -23,32 +23,32 @@ INF = 1. * 1e12
class Senta(nn.Layer):
    def __init__(self,
                 network,
                 vocab_size,
                 num_classes,
                 emb_dim=128,
                 pad_token_id=0):
        super().__init__()
        network = network.lower()
        if network == 'bow':
            self.model = BoWModel(
                vocab_size, num_classes, emb_dim, padding_idx=pad_token_id)
        elif network == 'bigru':
            self.model = GRUModel(
                vocab_size,
                num_classes,
                emb_dim,
                direction='bidirectional',
                padding_idx=pad_token_id)
        elif network == 'bilstm':
            self.model = LSTMModel(
                vocab_size,
                num_classes,
                emb_dim,
                direction='bidirectional',
                padding_idx=pad_token_id)
        elif network == 'bilstm_attn':
            lstm_hidden_size = 196
            attention = SelfInteractiveAttention(hidden_size=2 *
                                                 lstm_hidden_size)
@@ -58,17 +58,17 @@ class Senta(nn.Layer):
                lstm_hidden_size=lstm_hidden_size,
                num_classes=num_classes,
                padding_idx=pad_token_id)
        elif network == 'birnn':
            self.model = RNNModel(
                vocab_size,
                num_classes,
                emb_dim,
                direction='bidrectional',
                padding_idx=pad_token_id)
        elif network == 'cnn':
            self.model = CNNModel(
                vocab_size, num_classes, emb_dim, padding_idx=pad_token_id)
        elif network == 'gru':
            self.model = GRUModel(
                vocab_size,
                num_classes,
@@ -76,7 +76,7 @@ class Senta(nn.Layer):
                direction='forward',
                padding_idx=pad_token_id,
                pooling_type='max')
        elif network == 'lstm':
            self.model = LSTMModel(
                vocab_size,
                num_classes,
@@ -84,7 +84,7 @@ class Senta(nn.Layer):
                direction='forward',
                padding_idx=pad_token_id,
                pooling_type='max')
        elif network == 'rnn':
            self.model = RNNModel(
                vocab_size,
                num_classes,
@@ -92,15 +92,15 @@ class Senta(nn.Layer):
                direction='forward',
                padding_idx=pad_token_id,
                pooling_type='max')
        elif network == 'textcnn':
            self.model = TextCNNModel(
                vocab_size, num_classes, emb_dim, padding_idx=pad_token_id)
        else:
            raise ValueError(
                "Unknown network: %s, it must be one of bow, lstm, bilstm, cnn, gru, bigru, rnn, birnn, bilstm_attn and textcnn."
                % network)
    def forward(self, text, seq_len=None):
        logits = self.model(text, seq_len)
        probs = F.softmax(logits, axis=-1)
        return probs
@@ -137,7 +137,7 @@ class BoWModel(nn.Layer):
        self.fc2 = nn.Linear(hidden_size, fc_hidden_size)
        self.output_layer = nn.Linear(fc_hidden_size, num_classes)
    def forward(self, text, seq_len=None):
        # Shape: (batch_size, num_tokens, embedding_dim)
        embedded_text = self.embedder(text)
@@ -462,7 +462,7 @@ class CNNModel(nn.Layer):
        self.fc = nn.Linear(self.encoder.get_output_dim(), fc_hidden_size)
        self.output_layer = nn.Linear(fc_hidden_size, num_classes)
    def forward(self, text, seq_len=None):
        # Shape: (batch_size, num_tokens, embedding_dim)
        embedded_text = self.embedder(text)
        # Shape: (batch_size, len(ngram_filter_sizes)*num_filter)
@@ -511,7 +511,7 @@ class TextCNNModel(nn.Layer):
        self.fc = nn.Linear(self.encoder.get_output_dim(), fc_hidden_size)
        self.output_layer = nn.Linear(fc_hidden_size, num_classes)
    def forward(self, text, seq_len=None):
        # Shape: (batch_size, num_tokens, embedding_dim)
        embedded_text = self.embedder(text)
        # Shape: (batch_size, len(ngram_filter_sizes)*num_filter)
......
@@ -106,7 +106,7 @@ class ErniePretrainedModel(PretrainedModel):
            "vocab_size": 18000,
            "pad_token_id": 0,
        },
        "ernie-tiny": {
            "attention_probs_dropout_prob": 0.1,
            "hidden_act": "relu",
            "hidden_dropout_prob": 0.1,
@@ -153,7 +153,7 @@ class ErniePretrainedModel(PretrainedModel):
        "model_state": {
            "ernie-1.0":
            "https://paddlenlp.bj.bcebos.com/models/transformers/ernie/ernie_v1_chn_base.pdparams",
            "ernie-tiny":
            "https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/ernie_tiny.pdparams",
            "ernie-2.0-en":
            "https://paddlenlp.bj.bcebos.com/models/transformers/ernie_v2_base/ernie-2.0-en.pdparams",
......
@@ -403,7 +403,7 @@ class ErnieTinyTokenizer(PretrainedTokenizer):
    Examples:
        .. code-block:: python
            from paddlenlp.transformers import ErnieTinyTokenizer
            tokenizer = ErnieTinyTokenizer.from_pretrained('ernie-tiny')
            # the following line gets: ['he', 'was', 'a', 'puppet', '##eer']
            tokens = tokenizer('He was a puppeteer')
            # the following line gets: 'he was a puppeteer'
@@ -416,19 +416,19 @@ class ErnieTinyTokenizer(PretrainedTokenizer):
    } # for save_pretrained
    pretrained_resource_files_map = {
        "vocab_file": {
            "ernie-tiny":
            "https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/vocab.txt"
        },
        "sentencepiece_model_file": {
            "ernie-tiny":
            "https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/spm_cased_simp_sampled.model"
        },
        "word_dict": {
            "ernie-tiny":
            "https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/dict.wordseg.pickle"
        },
    }
    pretrained_init_configuration = {"ernie-tiny": {"do_lower_case": True}}
    def __init__(self,
                 vocab_file,
@@ -553,8 +553,8 @@ class ErnieTinyTokenizer(PretrainedTokenizer):
            save_directory (str): Directory to save files into.
        """
        for name, file_name in self.resource_files_names.items():
            ### TODO: make the name 'ernie-tiny' as a variable
            source_path = os.path.join(MODEL_HOME, 'ernie-tiny', file_name)
            save_path = os.path.join(save_directory,
                                     self.resource_files_names[name])
            shutil.copyfile(source_path, save_path)
......