Unverified commit ca8541bd, authored by: S Steffy-zxf, committed by: GitHub

add sequence classification with ernie/bert/roberta finetuned in dygraph

Parent 060c13ec
# Fine-tuning PaddleHub Transformer models for text classification (dynamic graph)
This example shows how to fine-tune a PaddleHub Transformer module (e.g. ERNIE, BERT, RoBERTa) in dynamic-graph mode and run prediction.
## Getting started with fine-tuning
We use the public Chinese sentiment classification dataset ChnSentiCorp as the example dataset. Run the commands below to train the model on the training set (train.tsv) and validate it on the dev set (dev.tsv).
```shell
# Set which GPU card to use
export CUDA_VISIBLE_DEVICES=0
python train.py
```
## Code walkthrough
Fine-tuning with the PaddleHub Fine-tune API involves four steps.
### Step 1: Choose a model
```python
import paddlehub as hub
model = hub.Module(name='ernie_tiny', version='2.0.0', task='sequence_classification')
```
Parameters:
* `name`: model name; options include `ernie`, `ernie_tiny`, `bert_chinese_L-12_H-768_A-12`, `chinese-roberta-wwm-ext`, `chinese-roberta-wwm-ext-large`, etc.
* `version`: module version.
* `task`: the fine-tuning task. Here it is `sequence_classification`, i.e. text classification.
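Swapping in another backbone from the list above only changes the `name` argument; a minimal sketch (the `version` argument is omitted here because the version available for that module may differ):
```python
import paddlehub as hub

# Illustrative: use RoBERTa-wwm-ext as the backbone for the same classification task.
model = hub.Module(name='chinese-roberta-wwm-ext', task='sequence_classification')
```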
### Step 2: Download and load the dataset
```python
train_dataset = hub.datasets.ChnSentiCorp(
tokenizer=model.get_tokenizer(tokenize_chinese_chars=True), max_seq_len=128, mode='train')
dev_dataset = hub.datasets.ChnSentiCorp(
tokenizer=model.get_tokenizer(tokenize_chinese_chars=True), max_seq_len=128, mode='dev')
```
* `tokenizer`: the tokenizer required by the module; it tokenizes the input text and converts it into the input format the model expects.
* `mode`: dataset split to load; options are `train`, `dev`, `test`, defaulting to `train`.
* `max_seq_len`: maximum sequence length used by the ERNIE/BERT model; lower this value if you run out of GPU memory.
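The test split used by `trainer.evaluate` in Step 3 can be loaded the same way; a sketch:
```python
# Load the test split for the final evaluation in Step 3.
test_dataset = hub.datasets.ChnSentiCorp(
    tokenizer=model.get_tokenizer(tokenize_chinese_chars=True), max_seq_len=128, mode='test')
```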
### Step 3: Choose an optimization strategy and runtime configuration
```python
import paddle

optimizer = paddle.optimizer.Adam(learning_rate=5e-5, parameters=model.parameters())
trainer = hub.Trainer(model, optimizer, checkpoint_dir='test_ernie_text_cls')
trainer.train(train_dataset, epochs=3, batch_size=32, eval_dataset=dev_dataset)
# Evaluate the trained model on the test set (test_dataset is loaded as shown in Step 2)
trainer.evaluate(test_dataset, batch_size=32)
```
#### Optimization strategy
Paddle 2.0-rc provides a range of optimizers, such as `SGD`, `Adam`, and `Adamax`; see the [optimizer documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0-rc/api/paddle/optimizer/optimizer/Optimizer_cn.html) for details.
For `Adam`:
* `learning_rate`: global learning rate, defaulting to 1e-3;
* `parameters`: the model parameters to optimize.
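The accompanying train.py uses `AdamW` instead; a sketch (the `weight_decay` value here is purely illustrative):
```python
import paddle

# AdamW adds decoupled weight decay on top of Adam.
optimizer = paddle.optimizer.AdamW(
    learning_rate=5e-5, weight_decay=0.01, parameters=model.parameters())
```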
#### Runtime configuration
`Trainer` controls the fine-tuning process and accepts the following parameters:
* `model`: the model to optimize;
* `optimizer`: the chosen optimizer;
* `use_vdl`: whether to use VisualDL to visualize training;
* `checkpoint_dir`: directory in which model parameters are saved;
* `compare_metrics`: metric used to select the best model to save.
`trainer.train` controls the actual training loop and accepts the following parameters (see the sketch after this list):
* `train_dataset`: dataset used for training;
* `epochs`: number of training epochs;
* `batch_size`: training batch size; when using a GPU, adjust it to your hardware;
* `num_workers`: number of data-loading workers, defaulting to 0;
* `eval_dataset`: validation dataset;
* `log_interval`: logging interval, measured in training steps (batches);
* `save_interval`: checkpoint saving interval, measured in training epochs.
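A sketch that exercises these options (`use_gpu=True` mirrors the accompanying train.py; the `log_interval` value is illustrative):
```python
trainer = hub.Trainer(model, optimizer, checkpoint_dir='test_ernie_text_cls', use_gpu=True, use_vdl=True)
trainer.train(
    train_dataset,
    epochs=3,
    batch_size=32,
    num_workers=0,
    eval_dataset=dev_dataset,
    log_interval=10,   # print a log line every 10 training steps
    save_interval=1)   # save a checkpoint every epoch
```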
## Model prediction
After fine-tuning finishes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning.
Using the following data as the prediction input, we run prediction with this model:
```text
这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般
怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片
作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。
```
```python
import paddlehub as hub
data = [
['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般'],
['怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片'],
['作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。'],
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
    name='ernie_tiny',
version='2.0.0',
task='sequence_classification',
load_checkpoint='./test_ernie_text_cls/best_model/model.pdparams',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
    print('Data: {} \t Label: {}'.format(text[0], results[idx]))
```
After the parameters are configured correctly, run the script `python predict.py`. For details on loading model parameters, see [paddle.load](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0-rc/api/paddle/framework/io/load_cn.html#load).
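For reference, this is roughly what loading the saved parameters looks like under the hood (a sketch; the checkpoint path matches the training configuration above):
```python
import paddle

# Load the fine-tuned parameters saved by the Trainer and apply them to the module.
state_dict = paddle.load('./test_ernie_text_cls/best_model/model.pdparams')
model.set_state_dict(state_dict)
```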
### Dependencies
paddlepaddle >= 2.0.0rc
paddlehub >= 2.0.0
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddlehub as hub
if __name__ == '__main__':
data = [
['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般'],
['怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片'],
['作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。'],
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='ernie_tiny',
version='2.0.0',
task='sequence_classification',
load_checkpoint='./test_ernie_text_cls/best_model/model.pdparams',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
        print('Data: {} \t Label: {}'.format(text[0], results[idx]))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ernie_tiny', version='2.0.0', task='sequence_classification')
train_dataset = hub.datasets.ChnSentiCorp(
tokenizer=model.get_tokenizer(tokenize_chinese_chars=True), max_seq_len=128, mode='train')
dev_dataset = hub.datasets.ChnSentiCorp(
tokenizer=model.get_tokenizer(tokenize_chinese_chars=True), max_seq_len=128, mode='dev')
test_dataset = hub.datasets.ChnSentiCorp(
tokenizer=model.get_tokenizer(tokenize_chinese_chars=True), max_seq_len=128, mode='test')
optimizer = paddle.optimizer.AdamW(learning_rate=5e-5, parameters=model.parameters())
trainer = hub.Trainer(model, optimizer, checkpoint_dir='test_ernie_text_cls', use_gpu=True)
trainer.train(train_dataset, epochs=3, batch_size=32, eval_dataset=dev_dataset, save_interval=1)
trainer.evaluate(test_dataset, batch_size=32)
```shell
$ hub install bert-base-cased==2.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [BERT paper](https://arxiv.org/abs/1810.04805).
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
Creates the Module object (dynamic-graph version).
**Parameters**
* `task`: task name; may be `sequence_classification`.
* `load_checkpoint`: path to the model parameter file saved with the PaddleHub Fine-tune API.
* `label_map`: label mapping used at prediction time.
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**Parameters**
* `data`: data to predict, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample may contain text\_a and text\_b. The number of texts per sample (1 or 2) must match the training setup.
* `max_seq_len`: maximum text length the model processes.
* `batch_size`: batch size used for prediction.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
**Returns**
* `results`: a list containing the predicted label for each input sample.
```python
def get_embedding(
texts,
use_gpu=False
)
```
Obtains sentence-level and token-level features for the input texts.
**Parameters**
* `texts`: list of input texts, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample may contain text\_a and text\_b.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
**Returns**
* `results`: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\], …\], where each element is the feature output for the corresponding sample: a sentence-level pooled\_feature and a token-level seq\_feature.
**Code example**
```python
import paddlehub as hub
data = [
'这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
'怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
'作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='bert-base-cased',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
    print('Data: {} \t Label: {}'.format(text, results[idx]))
```
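`get_embedding` can be exercised when the module is created with `task=None`; a minimal sketch (the sample text is illustrative):
```python
import paddlehub as hub

# task=None exposes the raw features instead of the classification head.
model = hub.Module(name='bert-base-cased', version='2.0.0', task=None)
results = model.get_embedding(texts=[['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般']], use_gpu=False)
features = results[0]  # the pooled (sentence-level) and sequence (token-level) features for the first sample
```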
See the PaddleHub text classification demo: https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## Serving
PaddleHub Serving can deploy an online service for obtaining pre-trained embeddings.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m bert-base-cased
```
This deploys a service API for obtaining pre-trained embeddings; the default port is 8866.
**NOTE:** To run prediction on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.
### Step 2: Send a prediction request
With the server configured, the few lines of code below send a prediction request and retrieve the result.
```python
import requests
import json
# Specify the texts to predict and build the dict {"texts": [text_1, text_2, ...]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# Pass the texts under the parameter name expected by the prediction method, here "texts";
# for a local deployment this corresponds to module.get_embedding(texts=text)
data = {"texts": text}
# Send a POST request; the Content-Type should be set to JSON
url = "http://10.12.121.132:8866/predict/bert-base-cased"
# Set the request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## Code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## Update history
* 1.0.0
  Initial release
* 1.1.0
  Added support for get_embedding and get_params_layer
* 2.0.0
  Fully upgraded to dynamic graph; the interfaces have changed.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_bert import BertForSequenceClassification, BertModel
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="bert-base-cased",
version="2.0.0",
summary=
"bert_cased_L-12_H-768_A-12, 12-layer, 768-hidden, 12-heads, 110M parameters. The module is executed as paddle.dygraph.",
author="paddlepaddle",
author_email="",
type="nlp/semantic_model")
class Bert(nn.Layer):
"""
BERT model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(Bert, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = BertForSequenceClassification.from_pretrained(pretrained_model_name_or_path='bert-base-cased')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = BertModel.from_pretrained(pretrained_model_name_or_path='bert-base-cased')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary path.
"""
save_path = os.path.join(DATA_HOME, 'bert-base-cased', 'bert-base-cased-vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddle-hapi.bj.bcebos.com/models/bert/bert-base-cased-vocab.txt"
download(url, os.path.join(DATA_HOME, 'bert-base-cased'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(str)`): The processed data whose each element is the raw text.
            max_seq_len (:obj:`int`, `optional`, defaults to 128):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The number of batch.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the predictions labels.
"""
# TODO(zhangxuefei): add task token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
        # Separates the data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install bert-base-chinese==2.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [BERT paper](https://arxiv.org/abs/1810.04805).
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
Creates the Module object (dynamic-graph version).
**Parameters**
* `task`: task name; may be `sequence_classification`.
* `load_checkpoint`: path to the model parameter file saved with the PaddleHub Fine-tune API.
* `label_map`: label mapping used at prediction time.
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**Parameters**
* `data`: data to predict, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample may contain text\_a and text\_b. The number of texts per sample (1 or 2) must match the training setup.
* `max_seq_len`: maximum text length the model processes.
* `batch_size`: batch size used for prediction.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
**Returns**
* `results`: a list containing the predicted label for each input sample.
```python
def get_embedding(
texts,
use_gpu=False
)
```
Obtains sentence-level and token-level features for the input texts.
**Parameters**
* `texts`: list of input texts, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample may contain text\_a and text\_b.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
**Returns**
* `results`: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\], …\], where each element is the feature output for the corresponding sample: a sentence-level pooled\_feature and a token-level seq\_feature.
**Code example**
```python
import paddlehub as hub
data = [
'这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
'怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
'作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='bert-base-chinese',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
    print('Data: {} \t Label: {}'.format(text, results[idx]))
```
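`get_embedding` can be exercised when the module is created with `task=None`; a minimal sketch (the sample text is illustrative):
```python
import paddlehub as hub

# task=None exposes the raw features instead of the classification head.
model = hub.Module(name='bert-base-chinese', version='2.0.0', task=None)
results = model.get_embedding(texts=[['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般']], use_gpu=False)
features = results[0]  # the pooled (sentence-level) and sequence (token-level) features for the first sample
```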
See the PaddleHub text classification demo: https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## Serving
PaddleHub Serving can deploy an online service for obtaining pre-trained embeddings.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m bert-base-chinese
```
This deploys a service API for obtaining pre-trained embeddings; the default port is 8866.
**NOTE:** To run prediction on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.
### Step 2: Send a prediction request
With the server configured, the few lines of code below send a prediction request and retrieve the result.
```python
import requests
import json
# Specify the texts to predict and build the dict {"texts": [text_1, text_2, ...]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# Pass the texts under the parameter name expected by the prediction method, here "texts";
# for a local deployment this corresponds to module.get_embedding(texts=text)
data = {"texts": text}
# Send a POST request; the Content-Type should be set to JSON
url = "http://10.12.121.132:8866/predict/bert-base-chinese"
# Set the request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## Code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## Update history
* 1.0.0
  Initial release
* 1.1.0
  Added support for get_embedding and get_params_layer
* 2.0.0
  Fully upgraded to dynamic graph; the interfaces have changed.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_bert import BertForSequenceClassification, BertModel
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="bert-base-chinese",
version="2.0.0",
summary=
"bert_chinese_L-12_H-768_A-12, 12-layer, 768-hidden, 12-heads, 110M parameters. The module is executed as paddle.dygraph.",
author="paddlepaddle",
author_email="",
type="nlp/semantic_model")
class Bert(nn.Layer):
"""
Bert model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(Bert, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = BertForSequenceClassification.from_pretrained(
pretrained_model_name_or_path='bert-base-chinese')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = BertModel.from_pretrained(pretrained_model_name_or_path='bert-base-chinese')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary path.
"""
save_path = os.path.join(DATA_HOME, 'bert-base-chinese', 'bert-base-chinese-vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddle-hapi.bj.bcebos.com/models/bert/bert-base-chinese-vocab.txt"
download(url, os.path.join(DATA_HOME, 'bert-base-chinese'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(str)`): The processed data whose each element is the raw text.
            max_seq_len (:obj:`int`, `optional`, defaults to 128):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The number of batch.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the predictions labels.
"""
# TODO(zhangxuefei): add task token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
        # Separates the data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install bert-base-multilingual-cased==2.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [BERT paper](https://arxiv.org/abs/1810.04805).
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
Creates the Module object (dynamic-graph version).
**Parameters**
* `task`: task name; may be `sequence_classification`.
* `load_checkpoint`: path to the model parameter file saved with the PaddleHub Fine-tune API.
* `label_map`: label mapping used at prediction time.
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**Parameters**
* `data`: data to predict, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample may contain text\_a and text\_b. The number of texts per sample (1 or 2) must match the training setup.
* `max_seq_len`: maximum text length the model processes.
* `batch_size`: batch size used for prediction.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
**Returns**
* `results`: a list containing the predicted label for each input sample.
```python
def get_embedding(
texts,
use_gpu=False
)
```
Obtains sentence-level and token-level features for the input texts.
**Parameters**
* `texts`: list of input texts, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample may contain text\_a and text\_b.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
**Returns**
* `results`: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\], …\], where each element is the feature output for the corresponding sample: a sentence-level pooled\_feature and a token-level seq\_feature.
**Code example**
```python
import paddlehub as hub
data = [
'这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
'怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
'作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='bert-base-multilingual-cased',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
    print('Data: {} \t Label: {}'.format(text, results[idx]))
```
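`get_embedding` can be exercised when the module is created with `task=None`; a minimal sketch (the sample text is illustrative):
```python
import paddlehub as hub

# task=None exposes the raw features instead of the classification head.
model = hub.Module(name='bert-base-multilingual-cased', version='2.0.0', task=None)
results = model.get_embedding(texts=[['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般']], use_gpu=False)
features = results[0]  # the pooled (sentence-level) and sequence (token-level) features for the first sample
```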
See the PaddleHub text classification demo: https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## Serving
PaddleHub Serving can deploy an online service for obtaining pre-trained embeddings.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m bert-base-multilingual-cased
```
This deploys a service API for obtaining pre-trained embeddings; the default port is 8866.
**NOTE:** To run prediction on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.
### Step 2: Send a prediction request
With the server configured, the few lines of code below send a prediction request and retrieve the result.
```python
import requests
import json
# Specify the texts to predict and build the dict {"texts": [text_1, text_2, ...]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# Pass the texts under the parameter name expected by the prediction method, here "texts";
# for a local deployment this corresponds to module.get_embedding(texts=text)
data = {"texts": text}
# Send a POST request; the Content-Type should be set to JSON
url = "http://10.12.121.132:8866/predict/bert-base-multilingual-cased"
# Set the request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## Code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## Update history
* 1.0.0
  Initial release
* 1.1.0
  Added support for get_embedding and get_params_layer
* 2.0.0
  Fully upgraded to dynamic graph; the interfaces have changed.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_bert import BertForSequenceClassification, BertModel
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="bert-base-multilingual-cased",
version="2.0.0",
summary=
"bert_multi_cased_L-12_H-768_A-12, 12-layer, 768-hidden, 12-heads, 110M parameters. The module is executed as paddle.dygraph.",
author="paddlepaddle",
author_email="",
type="nlp/semantic_model")
class Bert(nn.Layer):
"""
BERT model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(Bert, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = BertForSequenceClassification.from_pretrained(
pretrained_model_name_or_path='bert-base-multilingual-cased')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = BertModel.from_pretrained(pretrained_model_name_or_path='bert-base-multilingual-cased')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary path.
"""
save_path = os.path.join(DATA_HOME, 'bert-base-multilingual-cased', 'bert-base-multilingual-cased-vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddle-hapi.bj.bcebos.com/models/bert/bert-base-multilingual-cased-vocab.txt"
download(url, os.path.join(DATA_HOME, 'bert-base-multilingual-cased'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(str)`): The processed data whose each element is the raw text.
            max_seq_len (:obj:`int`, `optional`, defaults to 128):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The number of batch.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the predictions labels.
"""
# TODO(zhangxuefei): add task token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
        # Separates the data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install bert-base-multilingual-uncased==2.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [BERT paper](https://arxiv.org/abs/1810.04805).
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
Creates the Module object (dynamic-graph version).
**Parameters**
* `task`: task name; may be `sequence_classification`.
* `load_checkpoint`: path to the model parameter file saved with the PaddleHub Fine-tune API.
* `label_map`: label mapping used at prediction time.
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**Parameters**
* `data`: data to predict, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample may contain text\_a and text\_b. The number of texts per sample (1 or 2) must match the training setup.
* `max_seq_len`: maximum text length the model processes.
* `batch_size`: batch size used for prediction.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
**Returns**
* `results`: a list containing the predicted label for each input sample.
```python
def get_embedding(
texts,
use_gpu=False
)
```
Obtains sentence-level and token-level features for the input texts.
**Parameters**
* `texts`: list of input texts, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample may contain text\_a and text\_b.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
**Returns**
* `results`: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\], …\], where each element is the feature output for the corresponding sample: a sentence-level pooled\_feature and a token-level seq\_feature.
**Code example**
```python
import paddlehub as hub
data = [
'这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
'怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
'作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='bert-base-multilingual-uncased',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
    print('Data: {} \t Label: {}'.format(text, results[idx]))
```
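`get_embedding` can be exercised when the module is created with `task=None`; a minimal sketch (the sample text is illustrative):
```python
import paddlehub as hub

# task=None exposes the raw features instead of the classification head.
model = hub.Module(name='bert-base-multilingual-uncased', version='2.0.0', task=None)
results = model.get_embedding(texts=[['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般']], use_gpu=False)
features = results[0]  # the pooled (sentence-level) and sequence (token-level) features for the first sample
```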
See the PaddleHub text classification demo: https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## Serving
PaddleHub Serving can deploy an online service for obtaining pre-trained embeddings.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m bert-base-multilingual-uncased
```
This deploys a service API for obtaining pre-trained embeddings; the default port is 8866.
**NOTE:** To run prediction on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.
### Step 2: Send a prediction request
With the server configured, the few lines of code below send a prediction request and retrieve the result.
```python
import requests
import json
# Specify the texts to predict and build the dict {"texts": [text_1, text_2, ...]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# Pass the texts under the parameter name expected by the prediction method, here "texts";
# for a local deployment this corresponds to module.get_embedding(texts=text)
data = {"texts": text}
# Send a POST request; the Content-Type should be set to JSON
url = "http://10.12.121.132:8866/predict/bert-base-multilingual-uncased"
# Set the request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## Code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## Update history
* 1.0.0
  Initial release
* 1.1.0
  Added support for get_embedding and get_params_layer
* 2.0.0
  Fully upgraded to dynamic graph; the interfaces have changed.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_bert import BertForSequenceClassification, BertModel
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="bert-base-multilingual-uncased",
version="2.0.0",
summary=
"bert_multi_uncased_L-12_H-768_A-12, 12-layer, 768-hidden, 12-heads, 110M parameters. The module is executed as paddle.dygraph.",
author="paddlepaddle",
author_email="",
type="nlp/semantic_model")
class Bert(nn.Layer):
"""
BERT model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(Bert, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = BertForSequenceClassification.from_pretrained(
pretrained_model_name_or_path='bert-base-multilingual-uncased')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = BertModel.from_pretrained(pretrained_model_name_or_path='bert-base-multilingual-uncased')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary path.
"""
save_path = os.path.join(DATA_HOME, 'bert-base-multilingual-uncased',
'bert-base-multilingual-uncased-vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddle-hapi.bj.bcebos.com/models/bert/bert-base-multilingual-uncased-vocab.txt"
download(url, os.path.join(DATA_HOME, 'bert-base-multilingual-uncased'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(str)`): The processed data whose each element is the raw text.
            max_seq_len (:obj:`int`, `optional`, defaults to 128):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The number of batch.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the predictions labels.
"""
# TODO(zhangxuefei): add task token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
        # Separates the data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install bert-base-uncased==2.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [BERT paper](https://arxiv.org/abs/1810.04805).
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
Creates the Module object (dynamic-graph version).
**Parameters**
* `task`: task name; may be `sequence_classification`.
* `load_checkpoint`: path to the model parameter file saved with the PaddleHub Fine-tune API.
* `label_map`: label mapping used at prediction time.
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**Parameters**
* `data`: data to predict, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample may contain text\_a and text\_b. The number of texts per sample (1 or 2) must match the training setup.
* `max_seq_len`: maximum text length the model processes.
* `batch_size`: batch size used for prediction.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
**Returns**
* `results`: a list containing the predicted label for each input sample.
```python
def get_embedding(
texts,
use_gpu=False
)
```
Obtains sentence-level and token-level features for the input texts.
**Parameters**
* `texts`: list of input texts, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample may contain text\_a and text\_b.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
**Returns**
* `results`: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\], …\], where each element is the feature output for the corresponding sample: a sentence-level pooled\_feature and a token-level seq\_feature.
**Code example**
```python
import paddlehub as hub
data = [
'这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
'怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
'作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='bert-base-uncased',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
    print('Data: {} \t Label: {}'.format(text, results[idx]))
```
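`get_embedding` can be exercised when the module is created with `task=None`; a minimal sketch (the sample text is illustrative):
```python
import paddlehub as hub

# task=None exposes the raw features instead of the classification head.
model = hub.Module(name='bert-base-uncased', version='2.0.0', task=None)
results = model.get_embedding(texts=[['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般']], use_gpu=False)
features = results[0]  # the pooled (sentence-level) and sequence (token-level) features for the first sample
```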
See the PaddleHub text classification demo: https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## Serving
PaddleHub Serving can deploy an online service for obtaining pre-trained embeddings.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m bert-base-uncased
```
This deploys a service API for obtaining pre-trained embeddings; the default port is 8866.
**NOTE:** To run prediction on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.
### Step 2: Send a prediction request
With the server configured, the few lines of code below send a prediction request and retrieve the result.
```python
import requests
import json
# Specify the texts to predict and build the dict {"texts": [text_1, text_2, ...]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# Pass the texts under the parameter name expected by the prediction method, here "texts";
# for a local deployment this corresponds to module.get_embedding(texts=text)
data = {"texts": text}
# Send a POST request; the Content-Type should be set to JSON
url = "http://10.12.121.132:8866/predict/bert-base-uncased"
# Set the request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## Code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## Update history
* 1.0.0
  Initial release
* 1.1.0
  Added support for get_embedding and get_params_layer
* 2.0.0
  Fully upgraded to dynamic graph; the interfaces have changed.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_bert import BertForSequenceClassification, BertModel
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="bert-base-uncased",
version="2.0.0",
summary=
"bert_uncased_L-12_H-768_A-12, 12-layer, 768-hidden, 12-heads, 110M parameters. The module is executed as paddle.dygraph.",
author="paddlepaddle",
author_email="",
type="nlp/semantic_model")
class Bert(nn.Layer):
"""
BERT model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(Bert, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = BertForSequenceClassification.from_pretrained(
pretrained_model_name_or_path='bert-base-uncased')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = BertModel.from_pretrained(pretrained_model_name_or_path='bert-base-uncased')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary path.
"""
save_path = os.path.join(DATA_HOME, 'bert-base-uncased', 'bert-base-uncased-vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddle-hapi.bj.bcebos.com/models/bert/bert-base-uncased-vocab.txt"
download(url, os.path.join(DATA_HOME, 'bert-base-uncased'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(List(str))`): The processed data, each element of which is a list containing one text or a text pair.
max_seq_len (:obj:`int`, `optional`, defaults to 128):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The batch size.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the prediction labels.
"""
# TODO(zhangxuefei): add token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequences, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
# Separates data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequences, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install bert-large-cased==2.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
更多详情请参考[BERT论文](https://arxiv.org/abs/1810.04805)
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
创建Module对象(动态图组网版本)。
**参数**
* `task`: 任务名称,可为`sequence_classification`
* `load_checkpoint`:使用PaddleHub Fine-tune api训练保存的模型参数文件路径。
* `label_map`:预测时的类别映射表。
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**参数**
* `data`: 待预测数据,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,
每个样例可以包含text\_a与text\_b。每个样例文本数量(1个或者2个)需和训练时保持一致。
* `max_seq_len`:模型处理文本的最大长度
* `batch_size`:模型批处理大小
* `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
* `results`:list类型,包含各待预测样例的预测结果(分类标签)。
```python
def get_embedding(
texts,
use_gpu=False
)
```
用于获取输入文本的句子粒度特征与字粒度特征
**参数**
* `texts`:输入文本列表,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,每个样例可以包含text\_a与text\_b。
* `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
* `results`:list类型,格式为\[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\],其中每个元素都是对应样例的特征输出,每个样例都有句子粒度特征pooled\_feature与字粒度特征seq\_feature。
**代码示例**
```python
import paddlehub as hub
data = [
['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般'],
['怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片'],
['作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。'],
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='bert-large-cased',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
print('Data: {} \t Label: {}'.format(text[0], results[idx]))
```
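若只需要句向量与字向量而不做分类预测,可以在不指定`task`的情况下加载module并调用`get_embedding`。以下为一个简单示意(示例文本仅作演示):
```python
import paddlehub as hub

# 不指定 task 时加载原始 BertModel,可直接获取特征
model = hub.Module(name='bert-large-cased', version='2.0.0')
embeddings = model.get_embedding(texts=[['It is a nice day today.', 'The forecast says it will rain.']], use_gpu=False)
# embeddings[0] 中同时包含该样例的字粒度特征与句子粒度特征
print(len(embeddings))
```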
详情可参考PaddleHub文本分类示例:https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## 服务部署
PaddleHub Serving可以部署一个在线获取预训练词向量的服务。
### Step1: 启动PaddleHub Serving
运行启动命令:
```shell
$ hub serving start -m bert-large-cased
```
这样就完成了一个获取预训练词向量服务化API的部署,默认端口号为8866。
**NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量;否则无需设置。
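例如,使用第0号GPU启动服务时,可按如下方式设置(仅为示意):
```shell
$ export CUDA_VISIBLE_DEVICES=0
$ hub serving start -m bert-large-cased
```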
### Step2: 发送预测请求
配置好服务端后,使用以下几行代码即可发送预测请求,获取预测结果。
```python
import requests
import json
# 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# 以key的方式指定text传入预测方法的时的参数,此例中为"texts"
# 对应本地部署,则为module.get_embedding(texts=text)
data = {"texts": text}
# 发送post请求,content-type类型应指定json方式
url = "http://10.12.121.132:8866/predict/bert-large-cased"
# 指定post请求的headers为application/json方式
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## 查看代码
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## 依赖
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## 更新历史
* 1.0.0
初始发布
* 1.1.0
支持get_embedding与get_params_layer
* 2.0.0
全面升级动态图,接口有所变化。
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_bert import BertForSequenceClassification, BertModel
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="bert-large-cased",
version="2.0.0",
summary=
"bert_cased_L-24_H-1024_A-16, 24-layer, 1024-hidden, 16-heads, 340M parameters. The module is executed as paddle.dygraph.",
author="paddlepaddle",
author_email="",
type="nlp/semantic_model")
class Bert(nn.Layer):
"""
BERT model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(Bert, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = BertForSequenceClassification.from_pretrained(pretrained_model_name_or_path='bert-large-cased')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = BertModel.from_pretrained(pretrained_model_name_or_path='bert-large-cased')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary file.
"""
save_path = os.path.join(DATA_HOME, 'bert-large-cased', 'bert-large-cased-vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddle-hapi.bj.bcebos.com/models/bert/bert-large-cased-vocab.txt"
download(url, os.path.join(DATA_HOME, 'bert-large-cased'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(List(str))`): The processed data, each element of which is a list containing one text or a text pair.
max_seq_len (:obj:`int`, `optional`, defaults to 128):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The batch size.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the prediction labels.
"""
# TODO(zhangxuefei): add token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequences, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
# Separates data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequences, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install bert-large-uncased==2.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
更多详情请参考[BERT论文](https://arxiv.org/abs/1810.04805)
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
创建Module对象(动态图组网版本)。
**参数**
* `task`: 任务名称,可为`sequence_classification`
* `load_checkpoint`:使用PaddleHub Fine-tune api训练保存的模型参数文件路径。
* `label_map`:预测时的类别映射表。
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**参数**
* `data`: 待预测数据,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,
每个样例可以包含text\_a与text\_b。每个样例文本数量(1个或者2个)需和训练时保持一致。
* `max_seq_len`:模型处理文本的最大长度
* `batch_size`:模型批处理大小
* `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
* `results`:list类型,包含各待预测样例的预测结果(分类标签)。
```python
def get_embedding(
texts,
use_gpu=False
)
```
用于获取输入文本的句子粒度特征与字粒度特征
**参数**
* `texts`:输入文本列表,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,每个样例可以包含text\_a与text\_b。
* `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
* `results`:list类型,格式为\[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\],其中每个元素都是对应样例的特征输出,每个样例都有句子粒度特征pooled\_feature与字粒度特征seq\_feature。
**代码示例**
```python
import paddlehub as hub
data = [
['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般'],
['怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片'],
['作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。'],
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='bert-large-uncased',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
print('Data: {} \t Label: {}'.format(text[0], results[idx]))
```
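若训练阶段输入为句对(text_a与text_b),预测时每个样例也应提供两段文本,数量需与训练保持一致。下面是一个格式示意(沿用上例中的`model`,文本仅作演示):
```python
# 句对样例,每个元素依次包含 text_a 与 text_b
pair_data = [
    ['今天是个好日子', '天气预报说今天要下雨'],
]
pair_results = model.predict(pair_data, max_seq_len=50, batch_size=1, use_gpu=False)
```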
详情可参考PaddleHub文本分类示例:https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## 服务部署
PaddleHub Serving可以部署一个在线获取预训练词向量的服务。
### Step1: 启动PaddleHub Serving
运行启动命令:
```shell
$ hub serving start -m bert-large-uncased
```
这样就完成了一个获取预训练词向量服务化API的部署,默认端口号为8866。
**NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量;否则无需设置。
### Step2: 发送预测请求
配置好服务端后,使用以下几行代码即可发送预测请求,获取预测结果。
```python
import requests
import json
# 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# 以key的方式指定text传入预测方法的时的参数,此例中为"texts"
# 对应本地部署,则为module.get_embedding(texts=text)
data = {"texts": text}
# 发送post请求,content-type类型应指定json方式
url = "http://10.12.121.132:8866/predict/bert-large-uncased"
# 指定post请求的headers为application/json方式
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
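如果更习惯命令行方式,也可以使用curl发送等价请求(IP与端口需替换为实际部署的服务地址):
```shell
$ curl -X POST -H "Content-Type: application/json" \
    -d '{"texts": [["今天是个好日子", "天气预报说今天要下雨"]]}' \
    http://10.12.121.132:8866/predict/bert-large-uncased
```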
## 查看代码
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## 依赖
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## 更新历史
* 1.0.0
初始发布
* 1.1.0
支持get_embedding与get_params_layer
* 2.0.0
全面升级动态图,接口有所变化。
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_bert import BertForSequenceClassification, BertModel
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="bert-large-uncased",
version="2.0.0",
summary=
"bert_uncased_L-24_H-1024_A-16, 24-layer, 1024-hidden, 16-heads, 340M parameters. The module is executed as paddle.dygraph.",
author="paddlepaddle",
author_email="",
type="nlp/semantic_model")
class Bert(nn.Layer):
"""
BERT model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(Bert, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = BertForSequenceClassification.from_pretrained(
pretrained_model_name_or_path='bert-large-uncased')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = BertModel.from_pretrained(pretrained_model_name_or_path='bert-large-uncased')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary file.
"""
save_path = os.path.join(DATA_HOME, 'bert-large-uncased', 'bert-large-uncased-vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddle-hapi.bj.bcebos.com/models/bert/bert-large-uncased-vocab.txt"
download(url, os.path.join(DATA_HOME, 'bert-large-uncased'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(List(str))`): The processed data, each element of which is a list containing one text or a text pair.
max_seq_len (:obj:`int`, `optional`, defaults to 128):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The batch size.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the prediction labels.
"""
# TODO(zhangxuefei): add token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequences, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
# Separates data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequences, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install bert_cased_L-12_H-768_A-12==1.1.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
更多详情请参考[BERT论文](https://arxiv.org/abs/1810.04805)
## API
```python
def context(
trainable=True,
max_seq_len=128
)
```
用于获取Module的上下文信息,得到输入、输出以及预训练的Paddle Program副本
**参数**
> trainable:设置为True时,Module中的参数在Fine-tune时也会随之训练,否则保持不变。
> max_seq_len:BERT模型的最大序列长度,若序列长度不足,会通过padding方式补到**max_seq_len**, 若序列长度大于该值,则会以截断方式让序列长度为**max_seq_len**,max_seq_len可取值范围为0~512;
**返回**
> inputs:dict类型,有以下字段:
> >**input_ids**存放输入文本tokenize后各token对应BERT词汇表的word ids, shape为\[batch_size, max_seq_len\],int64类型;
> >**position_ids**存放输入文本tokenize后各token所在该文本的位置,shape为\[batch_size, max_seq_len\],int64类型;
> >**segment_ids**存放各token所在文本的标识(token属于文本1或者文本2),shape为\[batch_size, max_seq_len\],int64类型;
> >**input_mask**存放token是否为padding的标识,shape为\[batch_size, max_seq_len\],int64类型;
>
> outputs:dict类型,Module的输出特征,有以下字段:
> >**pooled_output**字段存放句子粒度的特征,可用于文本分类等任务,shape为 \[batch_size, 768\],float32类型;
> >**sequence_output**字段存放字粒度的特征,可用于序列标注等任务,shape为 \[batch_size, seq_len, 768\],float32类型;
>
> program:包含该Module计算图的Program。
```python
def get_embedding(
texts,
use_gpu=False,
batch_size=1
)
```
用于获取输入文本的句子粒度特征与字粒度特征
**参数**
> texts:输入文本列表,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,每个样例可以包含text\_a与text\_b。
> use_gpu:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
> results:list类型,格式为\[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\],其中每个元素都是对应样例的特征输出,每个样例都有句子粒度特征pooled\_feature与字粒度特征seq\_feature。
>
```python
def get_params_layer()
```
用于获取参数层信息,该方法与ULMFiTStrategy联用可以严格按照层数设置分层学习率与逐层解冻。
**参数**
> 无
**返回**
> params_layer:dict类型,key为参数名,值为参数所在层数
**代码示例**
```python
import paddlehub as hub
# Load $ hub install bert_cased_L-12_H-768_A-12 pretrained model
module = hub.Module(name="bert_cased_L-12_H-768_A-12")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)
# Must feed all the tensor of bert_cased_L-12_H-768_A-12's module need
input_ids = inputs["input_ids"]
position_ids = inputs["position_ids"]
segment_ids = inputs["segment_ids"]
input_mask = inputs["input_mask"]
# Use "pooled_output" for sentence-level output.
pooled_output = outputs["pooled_output"]
# Use "sequence_output" for token-level output.
sequence_output = outputs["sequence_output"]
# Use "get_embedding" to get embedding result.
embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text_a","Sample2_text_b"]], use_gpu=True)
# Use "get_params_layer" to get params layer and used to ULMFiTStrategy.
params_layer = module.get_params_layer()
strategy = hub.finetune.strategy.ULMFiTStrategy(frz_params_layer=params_layer, dis_params_layer=params_layer)
```
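在上面得到的`pooled_output`之上,若要继续搭建文本分类网络,可以接一个全连接层。下面是一个静态图写法的极简示意(省略优化器与数据feed配置,类别数2仅为假设):
```python
import paddle.fluid as fluid

# 仅为示意:在 pooled_output 上接一个二分类输出层并计算损失
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
probs = fluid.layers.fc(input=pooled_output, size=2, act="softmax")
loss = fluid.layers.mean(fluid.layers.cross_entropy(input=probs, label=label))
```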
## 查看代码
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## 依赖
paddlepaddle >= 1.6.2
paddlehub >= 1.6.0
## 更新历史
* 1.0.0
初始发布
* 1.1.0
支持get_embedding与get_params_layer
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import json
import paddle.fluid as fluid
from bert_cased_L_12_H_768_A_12.model.transformer_encoder import encoder, pre_process_layer
class BertConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path) as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing bert model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict[key]
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
class BertModel(object):
def __init__(self, src_ids, position_ids, sentence_ids, input_mask, config, weight_sharing=True, use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
self._sent_types = config['type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._dtype = "float16" if use_fp16 else "float32"
# Initialize all weights by truncated normal initializer, and all biases
# will be initialized by constant zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
mean_next_sent_loss = fluid.layers.mean(next_sent_loss)
loss = mean_next_sent_loss + mean_mask_lm_loss
return next_sent_acc, mean_mask_lm_loss, loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing the softmax activation to mask certain selected positions so that
they will not be considered in attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention followed by
position-wise feed-forward networks, with both components followed by
post_process_layer to add residual connection, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# coding:utf-8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from paddlehub import TransformerModule
from paddlehub.module.module import moduleinfo
from bert_cased_L_12_H_768_A_12.model.bert import BertConfig, BertModel
@moduleinfo(
name="bert_cased_L-12_H-768_A-12",
version="1.1.0",
summary="bert_cased_L-12_H-768_A-12, 12-layer, 768-hidden, 12-heads, 110M parameters",
author="paddlepaddle",
author_email="paddle-dev@baidu.com",
type="nlp/semantic_model",
)
class Bert(TransformerModule):
def _initialize(self):
self.MAX_SEQ_LEN = 512
self.params_path = os.path.join(self.directory, "assets", "params")
self.vocab_path = os.path.join(self.directory, "assets", "vocab.txt")
bert_config_path = os.path.join(self.directory, "assets", "bert_config.json")
self.bert_config = BertConfig(bert_config_path)
def net(self, input_ids, position_ids, segment_ids, input_mask):
"""
create neural network.
Args:
input_ids (tensor): the word ids.
position_ids (tensor): the position ids.
segment_ids (tensor): the segment ids.
input_mask (tensor): the padding mask.
Returns:
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
if __name__ == '__main__':
test_module = Bert()
```shell
$ hub install bert_cased_L-24_H-1024_A-16==1.1.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
更多详情请参考[BERT论文](https://arxiv.org/abs/1810.04805)
## API
```python
def context(
trainable=True,
max_seq_len=128
)
```
用于获取Module的上下文信息,得到输入、输出以及预训练的Paddle Program副本
**参数**
> trainable:设置为True时,Module中的参数在Fine-tune时也会随之训练,否则保持不变。
> max_seq_len:BERT模型的最大序列长度,若序列长度不足,会通过padding方式补到**max_seq_len**, 若序列长度大于该值,则会以截断方式让序列长度为**max_seq_len**,max_seq_len可取值范围为0~512;
**返回**
> inputs:dict类型,有以下字段:
> >**input_ids**存放输入文本tokenize后各token对应BERT词汇表的word ids, shape为\[batch_size, max_seq_len\],int64类型;
> >**position_ids**存放输入文本tokenize后各token所在该文本的位置,shape为\[batch_size, max_seq_len\],int64类型;
> >**segment_ids**存放各token所在文本的标识(token属于文本1或者文本2),shape为\[batch_size, max_seq_len\],int64类型;
> >**input_mask**存放token是否为padding的标识,shape为\[batch_size, max_seq_len\],int64类型;
>
> outputs:dict类型,Module的输出特征,有以下字段:
> >**pooled_output**字段存放句子粒度的特征,可用于文本分类等任务,shape为 \[batch_size, 1024\],float32类型;
> >**sequence_output**字段存放字粒度的特征,可用于序列标注等任务,shape为 \[batch_size, seq_len, 1024\],float32类型;
>
> program:包含该Module计算图的Program。
```python
def get_embedding(
texts,
use_gpu=False,
batch_size=1
)
```
用于获取输入文本的句子粒度特征与字粒度特征
**参数**
> texts:输入文本列表,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,每个样例可以包含text\_a与text\_b。
> use_gpu:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
> results:list类型,格式为\[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\],其中每个元素都是对应样例的特征输出,每个样例都有句子粒度特征pooled\_feature与字粒度特征seq\_feature。
>
```python
def get_params_layer()
```
用于获取参数层信息,该方法与ULMFiTStrategy联用可以严格按照层数设置分层学习率与逐层解冻。
**参数**
> 无
**返回**
> params_layer:dict类型,key为参数名,值为参数所在层数
**代码示例**
```python
import paddlehub as hub
# Load $ hub install bert_cased_L-24_H-1024_A-16 pretrained model
module = hub.Module(name="bert_cased_L-24_H-1024_A-16")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)
# Must feed all the tensor of bert_cased_L-24_H-1024_A-16's module need
input_ids = inputs["input_ids"]
position_ids = inputs["position_ids"]
segment_ids = inputs["segment_ids"]
input_mask = inputs["input_mask"]
# Use "pooled_output" for sentence-level output.
pooled_output = outputs["pooled_output"]
# Use "sequence_output" for token-level output.
sequence_output = outputs["sequence_output"]
# Use "get_embedding" to get embedding result.
embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text_a","Sample2_text_b"]], use_gpu=True)
# Use "get_params_layer" to get params layer and used to ULMFiTStrategy.
params_layer = module.get_params_layer()
strategy = hub.finetune.strategy.ULMFiTStrategy(frz_params_layer=params_layer, dis_params_layer=params_layer)
```
## 查看代码
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## 依赖
paddlepaddle >= 1.6.2
paddlehub >= 1.6.0
## 更新历史
* 1.0.0
初始发布
* 1.1.0
支持get_embedding与get_params_layer
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import json
import paddle.fluid as fluid
from bert_cased_L_24_H_1024_A_16.model.transformer_encoder import encoder, pre_process_layer
class BertConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path) as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing bert model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict[key]
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
class BertModel(object):
def __init__(self, src_ids, position_ids, sentence_ids, input_mask, config, weight_sharing=True, use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
self._sent_types = config['type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._dtype = "float16" if use_fp16 else "float32"
# Initialize all weights by truncated normal initializer, and all biases
# will be initialized by constant zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
mean_next_sent_loss = fluid.layers.mean(next_sent_loss)
loss = mean_next_sent_loss + mean_mask_lm_loss
return next_sent_acc, mean_mask_lm_loss, loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logits before
computing the softmax activation, masking the selected positions so that
they will not be considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is the reverse of __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention sub-layer followed by a
position-wise feed-forward network, and both components are accompanied by
post_process_layer to add residual connections, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# coding:utf-8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from paddlehub import TransformerModule
from paddlehub.module.module import moduleinfo
from bert_cased_L_24_H_1024_A_16.model.bert import BertConfig, BertModel
@moduleinfo(
name="bert_cased_L-24_H-1024_A-16",
version="1.1.0",
summary="bert_cased_L-24_H-1024_A-16, 24-layer, 1024-hidden, 16-heads, 340M parameters ",
author="paddlepaddle",
author_email="paddle-dev@baidu.com",
type="nlp/semantic_model",
)
class Bert(TransformerModule):
def _initialize(self):
self.MAX_SEQ_LEN = 512
self.params_path = os.path.join(self.directory, "assets", "params")
self.vocab_path = os.path.join(self.directory, "assets", "vocab.txt")
bert_config_path = os.path.join(self.directory, "assets", "bert_config.json")
self.bert_config = BertConfig(bert_config_path)
def net(self, input_ids, position_ids, segment_ids, input_mask):
"""
Create the neural network.
Args:
input_ids (tensor): the word ids.
position_ids (tensor): the position ids.
segment_ids (tensor): the segment ids.
input_mask (tensor): the padding mask.
Returns:
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
if __name__ == '__main__':
test_module = Bert()
```shell
$ hub install bert_chinese_L-12_H-768_A-12==1.1.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [BERT paper](https://arxiv.org/abs/1810.04805).
## API
```python
def context(
trainable=True,
max_seq_len=128
)
```
Gets the contextual information of the Module, i.e. its inputs, its outputs and a copy of the pre-trained Paddle Program.
**Parameters**
> trainable: when set to True, the parameters of the Module are also updated during fine-tuning; otherwise they are kept fixed.
> max_seq_len: maximum sequence length of the BERT model. Sequences shorter than this are padded up to **max_seq_len**, and longer sequences are truncated to **max_seq_len**; max_seq_len can take values in the range 0~512.
**Returns**
> inputs: a dict with the following fields:
> >**input_ids**: the word ids in the BERT vocabulary for each token of the tokenized input text, with shape \[batch_size, max_seq_len\] and dtype int64;
> >**position_ids**: the position of each token within the tokenized input text, with shape \[batch_size, max_seq_len\] and dtype int64;
> >**segment_ids**: an id marking which text each token belongs to (text 1 or text 2), with shape \[batch_size, max_seq_len\] and dtype int64;
> >**input_mask**: a flag marking whether a token is padding, with shape \[batch_size, max_seq_len\] and dtype int64;
>
> outputs: a dict with the output features of the Module, containing the following fields:
> >**pooled_output**: sentence-level features, usable for tasks such as text classification, with shape \[batch_size, 768\] and dtype float32;
> >**sequence_output**: token-level features, usable for tasks such as sequence labeling, with shape \[batch_size, seq_len, 768\] and dtype float32;
>
> program: the Program that contains the computation graph of this Module.
```python
def get_embedding(
texts,
use_gpu=False,
batch_size=1
)
```
Gets sentence-level and token-level features for the input texts.
**Parameters**
> texts: the list of input texts, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample and each sample may contain a text\_a and a text\_b.
> use_gpu: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
**Returns**
> results: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\], where each element is the feature output of the corresponding sample; every sample has a sentence-level feature pooled\_feature and a token-level feature seq\_feature.
>
```python
def get_params_layer()
```
Gets the parameter-layer information; used together with ULMFiTStrategy, it allows layer-wise learning rates and gradual unfreezing to be configured strictly by layer.
**Parameters**
> None
**Returns**
> params_layer: a dict whose keys are parameter names and whose values are the layer index that each parameter belongs to.
**Code example**
```python
import paddlehub as hub
# Load the pretrained model installed via `hub install bert_chinese_L-12_H-768_A-12`
module = hub.Module(name="bert_chinese_L-12_H-768_A-12")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)
# All input tensors required by the bert_chinese_L-12_H-768_A-12 module must be fed
input_ids = inputs["input_ids"]
position_ids = inputs["position_ids"]
segment_ids = inputs["segment_ids"]
input_mask = inputs["input_mask"]
# Use "pooled_output" for sentence-level output.
pooled_output = outputs["pooled_output"]
# Use "sequence_output" for token-level output.
sequence_output = outputs["sequence_output"]
# Use "get_embedding" to get embedding result.
embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text_a","Sample2_text_b"]], use_gpu=True)
# Use "get_params_layer" to get params layer and used to ULMFiTStrategy.
params_layer = module.get_params_layer()
strategy = hub.finetune.strategy.ULMFiTStrategy(frz_params_layer=params_layer, dis_params_layer=params_layer)
```
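The tensors returned by `context()` can be connected to a task-specific head with the static-graph `fluid` API. The snippet below is only a minimal sketch (not part of the module): the 2-class label placeholder, the softmax classifier and the loss are illustrative assumptions.

```python
import paddle.fluid as fluid
import paddlehub as hub

module = hub.Module(name="bert_chinese_L-12_H-768_A-12")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)

# Hypothetical 2-class classification head on top of the sentence-level feature.
pooled_output = outputs["pooled_output"]  # [batch_size, 768]
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
probs = fluid.layers.fc(input=pooled_output, size=2, act="softmax")
loss = fluid.layers.mean(fluid.layers.cross_entropy(input=probs, label=label))
```

In a real fine-tuning setup this head has to be appended inside the returned `program` (for example under a `fluid.program_guard` context) and trained with an optimizer.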
## Source code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 1.6.2
paddlehub >= 1.6.0
## Release history
* 1.0.0
  Initial release
* 1.1.0
  Added support for get_embedding and get_params_layer
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import json
import paddle.fluid as fluid
from bert_chinese_L_12_H_768_A_12.model.transformer_encoder import encoder, pre_process_layer
class BertConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path) as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing bert model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict[key]
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
class BertModel(object):
def __init__(self, src_ids, position_ids, sentence_ids, input_mask, config, weight_sharing=True, use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
self._sent_types = config['type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._dtype = "float16" if use_fp16 else "float32"
# Initialize all weights by truncated normal initializer, and all biases
# will be initialized by constant zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
mean_next_sent_loss = fluid.layers.mean(next_sent_loss)
loss = mean_next_sent_loss + mean_mask_lm_loss
return next_sent_acc, mean_mask_lm_loss, loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logits before
computing the softmax activation, masking the selected positions so that
they will not be considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is the reverse of __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
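# Overall flow below: project the inputs to Q/K/V, split them into n_head
# heads, run scaled dot-product attention per head, concatenate the heads back
# together and project the result to d_model.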
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention sub-layer followed by a
position-wise feed-forward network, and both components are accompanied by
post_process_layer to add residual connections, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# coding:utf-8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from paddlehub import TransformerModule
from paddlehub.module.module import moduleinfo
from bert_chinese_L_12_H_768_A_12.model.bert import BertConfig, BertModel
@moduleinfo(
name="bert_chinese_L-12_H-768_A-12",
version="1.1.0",
summary="bert_chinese_L-12_H-768_A-12, 12-layer, 768-hidden, 12-heads, 110M parameters ",
author="paddlepaddle",
author_email="paddle-dev@baidu.com",
type="nlp/semantic_model",
)
class BertChinese(TransformerModule):
def _initialize(self):
self.MAX_SEQ_LEN = 512
self.params_path = os.path.join(self.directory, "assets", "params")
self.vocab_path = os.path.join(self.directory, "assets", "vocab.txt")
bert_config_path = os.path.join(self.directory, "assets", "bert_config.json")
self.bert_config = BertConfig(bert_config_path)
def net(self, input_ids, position_ids, segment_ids, input_mask):
"""
Create the neural network.
Args:
input_ids (tensor): the word ids.
position_ids (tensor): the position ids.
segment_ids (tensor): the segment ids.
input_mask (tensor): the padding mask.
Returns:
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
if __name__ == '__main__':
test_module = BertChinese()
```shell
$ hub install bert_multi_cased_L-12_H-768_A-12==1.1.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [BERT paper](https://arxiv.org/abs/1810.04805).
## API
```python
def context(
trainable=True,
max_seq_len=128
)
```
Gets the contextual information of the Module, i.e. its inputs, its outputs and a copy of the pre-trained Paddle Program.
**Parameters**
> trainable: when set to True, the parameters of the Module are also updated during fine-tuning; otherwise they are kept fixed.
> max_seq_len: maximum sequence length of the BERT model. Sequences shorter than this are padded up to **max_seq_len**, and longer sequences are truncated to **max_seq_len**; max_seq_len can take values in the range 0~512.
**Returns**
> inputs: a dict with the following fields:
> >**input_ids**: the word ids in the BERT vocabulary for each token of the tokenized input text, with shape \[batch_size, max_seq_len\] and dtype int64;
> >**position_ids**: the position of each token within the tokenized input text, with shape \[batch_size, max_seq_len\] and dtype int64;
> >**segment_ids**: an id marking which text each token belongs to (text 1 or text 2), with shape \[batch_size, max_seq_len\] and dtype int64;
> >**input_mask**: a flag marking whether a token is padding, with shape \[batch_size, max_seq_len\] and dtype int64;
>
> outputs: a dict with the output features of the Module, containing the following fields:
> >**pooled_output**: sentence-level features, usable for tasks such as text classification, with shape \[batch_size, 768\] and dtype float32;
> >**sequence_output**: token-level features, usable for tasks such as sequence labeling, with shape \[batch_size, seq_len, 768\] and dtype float32;
>
> program: the Program that contains the computation graph of this Module.
```python
def get_embedding(
texts,
use_gpu=False,
batch_size=1
)
```
Gets sentence-level and token-level features for the input texts.
**Parameters**
> texts: the list of input texts, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample and each sample may contain a text\_a and a text\_b.
> use_gpu: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
**Returns**
> results: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\], where each element is the feature output of the corresponding sample; every sample has a sentence-level feature pooled\_feature and a token-level feature seq\_feature.
>
```python
def get_params_layer()
```
Gets the parameter-layer information; used together with ULMFiTStrategy, it allows layer-wise learning rates and gradual unfreezing to be configured strictly by layer.
**Parameters**
> None
**Returns**
> params_layer: a dict whose keys are parameter names and whose values are the layer index that each parameter belongs to.
**Code example**
```python
import paddlehub as hub
# Load the pretrained model installed via `hub install bert_multi_cased_L-12_H-768_A-12`
module = hub.Module(name="bert_multi_cased_L-12_H-768_A-12")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)
# All input tensors required by the bert_multi_cased_L-12_H-768_A-12 module must be fed
input_ids = inputs["input_ids"]
position_ids = inputs["position_ids"]
segment_ids = inputs["segment_ids"]
input_mask = inputs["input_mask"]
# Use "pooled_output" for sentence-level output.
pooled_output = outputs["pooled_output"]
# Use "sequence_output" for token-level output.
sequence_output = outputs["sequence_output"]
# Use "get_embedding" to get embedding result.
embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text_a","Sample2_text_b"]], use_gpu=True)
# Use "get_params_layer" to get params layer and used to ULMFiTStrategy.
params_layer = module.get_params_layer()
strategy = hub.finetune.strategy.ULMFiTStrategy(frz_params_layer=params_layer, dis_params_layer=params_layer)
```
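`sequence_output` can be used for token-level tasks in the same way. The snippet below is only a minimal sketch; the 7-class tag set and the per-token softmax head are assumptions for illustration.

```python
import paddle.fluid as fluid
import paddlehub as hub

module = hub.Module(name="bert_multi_cased_L-12_H-768_A-12")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)

# Hypothetical sequence-labeling head: a per-token softmax over 7 tag classes.
sequence_output = outputs["sequence_output"]  # [batch_size, seq_len, 768]
token_probs = fluid.layers.fc(input=sequence_output, size=7, num_flatten_dims=2, act="softmax")
```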
## Source code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 1.6.2
paddlehub >= 1.6.0
## Release history
* 1.0.0
  Initial release
* 1.1.0
  Added support for get_embedding and get_params_layer
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import json
import paddle.fluid as fluid
from bert_multi_cased_L_12_H_768_A_12.model.transformer_encoder import encoder, pre_process_layer
class BertConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path) as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing bert model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict[key]
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
class BertModel(object):
def __init__(self, src_ids, position_ids, sentence_ids, input_mask, config, weight_sharing=True, use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
self._sent_types = config['type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._dtype = "float16" if use_fp16 else "float32"
# Initialize all weights by truncated normal initializer, and all biases
# will be initialized by constant zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
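# The word, position and sentence (segment) embeddings are summed and then
# layer-normalized and dropped out ('nd') before entering the encoder stack.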
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
mean_next_sent_loss = fluid.layers.mean(next_sent_loss)
loss = mean_next_sent_loss + mean_mask_lm_loss
return next_sent_acc, mean_mask_lm_loss, loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logits before
computing the softmax activation, masking the selected positions so that
they will not be considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is the reverse of __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
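# pre_process_layer binds prev_out to None, so it never adds a residual
# connection; post_process_layer receives the sub-layer input as prev_out and
# can therefore apply the residual addition as well as norm/dropout.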
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention sub-layer followed by a
position-wise feed-forward network, and both components are accompanied by
post_process_layer to add residual connections, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# coding:utf-8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from paddlehub import TransformerModule
from paddlehub.module.module import moduleinfo
from bert_multi_cased_L_12_H_768_A_12.model.bert import BertConfig, BertModel
@moduleinfo(
name="bert_multi_cased_L-12_H-768_A-12",
version="1.1.0",
summary="bert_multi_cased_L-12_H-768_A-12, 12-layer, 768-hidden, 12-heads, 110M parameters ",
author="paddlepaddle",
author_email="paddle-dev@baidu.com",
type="nlp/semantic_model",
)
class Bert(TransformerModule):
def _initialize(self):
self.MAX_SEQ_LEN = 512
self.params_path = os.path.join(self.directory, "assets", "params")
self.vocab_path = os.path.join(self.directory, "assets", "vocab.txt")
bert_config_path = os.path.join(self.directory, "assets", "bert_config.json")
self.bert_config = BertConfig(bert_config_path)
def net(self, input_ids, position_ids, segment_ids, input_mask):
"""
Create the neural network.
Args:
input_ids (tensor): the word ids.
position_ids (tensor): the position ids.
segment_ids (tensor): the segment ids.
input_mask (tensor): the padding mask.
Returns:
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
if __name__ == '__main__':
test_module = Bert()
```shell
$ hub install bert_multi_uncased_L-12_H-768_A-12==1.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [BERT paper](https://arxiv.org/abs/1810.04805).
## API
```python
def context(
trainable=True,
max_seq_len=128
)
```
Gets the contextual information of the Module, i.e. its inputs, its outputs and a copy of the pre-trained Paddle Program.
**Parameters**
> trainable: when set to True, the parameters of the Module are also updated during fine-tuning; otherwise they are kept fixed.
> max_seq_len: maximum sequence length of the BERT model. Sequences shorter than this are padded up to **max_seq_len**, and longer sequences are truncated to **max_seq_len**; max_seq_len can take values in the range 0~512.
**Returns**
> inputs: a dict with the following fields:
> >**input_ids**: the word ids in the BERT vocabulary for each token of the tokenized input text, with shape \[batch_size, max_seq_len\] and dtype int64;
> >**position_ids**: the position of each token within the tokenized input text, with shape \[batch_size, max_seq_len\] and dtype int64;
> >**segment_ids**: an id marking which text each token belongs to (text 1 or text 2), with shape \[batch_size, max_seq_len\] and dtype int64;
> >**input_mask**: a flag marking whether a token is padding, with shape \[batch_size, max_seq_len\] and dtype int64;
>
> outputs: a dict with the output features of the Module, containing the following fields:
> >**pooled_output**: sentence-level features, usable for tasks such as text classification, with shape \[batch_size, 768\] and dtype float32;
> >**sequence_output**: token-level features, usable for tasks such as sequence labeling, with shape \[batch_size, seq_len, 768\] and dtype float32;
>
> program: the Program that contains the computation graph of this Module.
```python
def get_embedding(
texts,
use_gpu=False,
batch_size=1
)
```
Gets sentence-level and token-level features for the input texts.
**Parameters**
> texts: the list of input texts, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample and each sample may contain a text\_a and a text\_b.
> use_gpu: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
**Returns**
> results: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\], where each element is the feature output of the corresponding sample; every sample has a sentence-level feature pooled\_feature and a token-level feature seq\_feature.
>
```python
def get_params_layer()
```
Gets the parameter-layer information; used together with ULMFiTStrategy, it allows layer-wise learning rates and gradual unfreezing to be configured strictly by layer.
**Parameters**
> None
**Returns**
> params_layer: a dict whose keys are parameter names and whose values are the layer index that each parameter belongs to.
**Code example**
```python
import paddlehub as hub
# Load the pretrained model installed via `hub install bert_multi_uncased_L-12_H-768_A-12`
module = hub.Module(name="bert_multi_uncased_L-12_H-768_A-12")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)
# All input tensors required by the bert_multi_uncased_L-12_H-768_A-12 module must be fed
input_ids = inputs["input_ids"]
position_ids = inputs["position_ids"]
segment_ids = inputs["segment_ids"]
input_mask = inputs["input_mask"]
# Use "pooled_output" for sentence-level output.
pooled_output = outputs["pooled_output"]
# Use "sequence_output" for token-level output.
sequence_output = outputs["sequence_output"]
# Use "get_embedding" to get embedding result.
embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text_a","Sample2_text_b"]], use_gpu=True)
# Use "get_params_layer" to get the parameter-to-layer mapping used by ULMFiTStrategy.
params_layer = module.get_params_layer()
strategy = hub.finetune.strategy.ULMFiTStrategy(frz_params_layer=params_layer, dis_params_layer=params_layer)
```
## View the code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 1.6.2
paddlehub >= 1.6.0
## Release history
* 1.0.0
  Initial release
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import json
import paddle.fluid as fluid
from bert_multi_uncased_L_12_H_768_A_12.model.transformer_encoder import encoder, pre_process_layer
class BertConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path) as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing bert model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict[key]
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
class BertModel(object):
def __init__(self, src_ids, position_ids, sentence_ids, input_mask, config, weight_sharing=True, use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
self._sent_types = config['type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._dtype = "float16" if use_fp16 else "float32"
# Initialize all weights by truncated normal initializer, and all biases
# will be initialized by constant zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
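# input_mask is expected to have shape [batch_size, seq_len, 1], with 1 for real tokens and 0 for padding.
# matmul(input_mask, input_mask^T) gives a [batch_size, seq_len, seq_len] pairwise mask, and
# scale(x, scale=10000, bias=-1, bias_after_scale=False) maps it to 0 for valid pairs and -10000
# for pairs involving padding; the result is stacked per head and added to the attention logits.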
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
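# reshaped_emb_out is [batch_size * seq_len, hidden_size]; mask_pos is expected to hold flattened
# positions (batch_index * seq_len + token_index), so gather selects the masked tokens' features.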
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
mean_next_sent_loss = fluid.layers.mean(next_sent_loss)
loss = mean_next_sent_loss + mean_mask_lm_loss
return next_sent_acc, mean_mask_lm_loss, loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logits before
computing the softmax activation to mask certain selected positions so that
they are not considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of the input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of the input tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
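# process_cmd is a string of single-letter commands applied in order:
# "a" adds the residual connection, "n" applies layer normalization, "d" applies dropout.
# For example, postprocess_cmd="dan" means dropout -> residual add -> layer norm.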
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention sub-layer followed by
position-wise feed-forward networks; both components are wrapped with
post_process_layer to add the residual connection, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# coding:utf-8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from paddlehub import TransformerModule
from paddlehub.module.module import moduleinfo
from bert_multi_uncased_L_12_H_768_A_12.model.bert import BertConfig, BertModel
@moduleinfo(
name="bert_multi_uncased_L-12_H-768_A-12",
version="1.0.0",
summary="bert_multi_uncased_L-12_H-768_A-12, 12-layer, 768-hidden, 12-heads, 110M parameters ",
author="paddlepaddle",
author_email="paddle-dev@baidu.com",
type="nlp/semantic_model",
)
class Bert(TransformerModule):
def _initialize(self):
self.MAX_SEQ_LEN = 512
self.params_path = os.path.join(self.directory, "assets", "params")
self.vocab_path = os.path.join(self.directory, "assets", "vocab.txt")
bert_config_path = os.path.join(self.directory, "assets", "bert_config.json")
self.bert_config = BertConfig(bert_config_path)
def net(self, input_ids, position_ids, segment_ids, input_mask):
"""
create neural network.
Args:
input_ids (tensor): the word ids.
position_ids (tensor): the position ids.
segment_ids (tensor): the segment ids.
input_mask (tensor): the padding mask.
Returns:
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
if __name__ == '__main__':
test_module = Bert()
```shell
$ hub install bert_uncased_L-12_H-768_A-12==1.1.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [BERT paper](https://arxiv.org/abs/1810.04805).
## API
```python
def context(
trainable=True,
max_seq_len=128
)
```
Returns the Module's context: its input tensors, output tensors, and a copy of the pre-trained Paddle Program.
**Parameters**
> trainable: if set to True, the Module's parameters are updated during fine-tuning; otherwise they stay frozen.
> max_seq_len: maximum sequence length of the BERT model. Sequences shorter than **max_seq_len** are padded up to it, and longer sequences are truncated to **max_seq_len**; valid values range from 0 to 512.
**Returns**
> inputs: dict with the following fields:
> >**input_ids**: word ids of each token of the tokenized input text in the BERT vocabulary, shape \[batch_size, max_seq_len\], dtype int64;
> >**position_ids**: position of each token within its text, shape \[batch_size, max_seq_len\], dtype int64;
> >**segment_ids**: segment flag of each token (whether it belongs to text 1 or text 2), shape \[batch_size, max_seq_len\], dtype int64;
> >**input_mask**: flag marking whether each token is padding, shape \[batch_size, max_seq_len\], dtype int64;
>
> outputs: dict of the Module's output features, with the following fields:
> >**pooled_output**: sentence-level features, usable for tasks such as text classification, shape \[batch_size, 768\], dtype float32;
> >**sequence_output**: token-level features, usable for tasks such as sequence labeling, shape \[batch_size, seq_len, 768\], dtype float32;
>
> program: the Program containing the Module's computation graph.
```python
def get_embedding(
texts,
use_gpu=False,
batch_size=1
)
```
Returns sentence-level and token-level features for the input texts.
**Parameters**
> texts: list of input texts in the form \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample can contain a text\_a and an optional text\_b.
> use_gpu: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
> batch_size: batch size used when computing the features; defaults to 1.
**Returns**
> results: list in the form \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\], …\], where each element is the feature output of the corresponding sample; every sample has a sentence-level feature pooled\_feature and a token-level feature seq\_feature.
>
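As a small illustrative sketch (not from the original docs; the exact container types of the returned features may vary by PaddleHub version), the nested return value can be unpacked per sample:
```python
import paddlehub as hub

module = hub.Module(name="bert_uncased_L-12_H-768_A-12")
results = module.get_embedding(
    texts=[["first sample text_a"], ["second sample text_a", "second sample text_b"]],
    use_gpu=False,
    batch_size=1)
for pooled_feature, seq_feature in results:
    # pooled_feature: one 768-d sentence-level vector; seq_feature: one 768-d vector per token.
    print(len(pooled_feature), len(seq_feature))
```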
```python
def get_params_layer()
```
Returns the layer index of each parameter. Used together with ULMFiTStrategy, it lets layer-wise learning rates and gradual unfreezing be configured strictly by layer.
**Parameters**
> None
**Returns**
> params_layer: dict whose keys are parameter names and whose values are the layer index each parameter belongs to.
**Code example**
```python
import paddlehub as hub
# Load the bert_uncased_L-12_H-768_A-12 pretrained model (install it first with: hub install bert_uncased_L-12_H-768_A-12)
module = hub.Module(name="bert_uncased_L-12_H-768_A-12")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)
# All input tensors required by bert_uncased_L-12_H-768_A-12 must be fed
input_ids = inputs["input_ids"]
position_ids = inputs["position_ids"]
segment_ids = inputs["segment_ids"]
input_mask = inputs["input_mask"]
# Use "pooled_output" for sentence-level output.
pooled_output = outputs["pooled_output"]
# Use "sequence_output" for token-level output.
sequence_output = outputs["sequence_output"]
# Use "get_embedding" to get embedding result.
embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text_a","Sample2_text_b"]], use_gpu=True)
# Use "get_params_layer" to get the parameter-to-layer mapping used by ULMFiTStrategy.
params_layer = module.get_params_layer()
strategy = hub.finetune.strategy.ULMFiTStrategy(frz_params_layer=params_layer, dis_params_layer=params_layer)
```
## View the code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 1.6.2
paddlehub >= 1.6.0
## Release history
* 1.0.0
  Initial release
* 1.1.0
  Added get_embedding and get_params_layer
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import json
import paddle.fluid as fluid
from bert_uncased_L_12_H_768_A_12.model.transformer_encoder import encoder, pre_process_layer
class BertConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path) as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing bert model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict[key]
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
class BertModel(object):
def __init__(self, src_ids, position_ids, sentence_ids, input_mask, config, weight_sharing=True, use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
self._sent_types = config['type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._dtype = "float16" if use_fp16 else "float32"
# Initialize all weights by truncated normal initializer, and all biases
# will be initialized by constant zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
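# next_sent_feat is the [CLS] token's hidden state, shape [batch_size, 1, hidden_size]; the
# tanh-activated fc below is the standard BERT pooler producing the sentence-level feature.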
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
mean_next_sent_loss = fluid.layers.mean(next_sent_loss)
loss = mean_next_sent_loss + mean_mask_lm_loss
return next_sent_acc, mean_mask_lm_loss, loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logits before
computing the softmax activation to mask certain selected positions so that
they are not considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of the input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of the input tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
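# The query is pre-scaled by 1/sqrt(d_key); attn_bias (the padding mask built by the caller) is
# added to the raw attention logits before softmax, so masked positions get near-zero weight.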
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention sub-layer followed by
position-wise feed-forward networks; both components are wrapped with
post_process_layer to add the residual connection, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# coding:utf-8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from paddlehub import TransformerModule
from paddlehub.module.module import moduleinfo
from bert_uncased_L_12_H_768_A_12.model.bert import BertConfig, BertModel
@moduleinfo(
name="bert_uncased_L-12_H-768_A-12",
version="1.1.0",
summary="bert_uncased_L-12_H-768_A-12, 12-layer, 768-hidden, 12-heads, 110M parameters",
author="paddlepaddle",
author_email="paddle-dev@baidu.com",
type="nlp/semantic_model",
)
class Bert(TransformerModule):
def _initialize(self):
self.MAX_SEQ_LEN = 512
self.params_path = os.path.join(self.directory, "assets", "params")
self.vocab_path = os.path.join(self.directory, "assets", "vocab.txt")
bert_config_path = os.path.join(self.directory, "assets", "bert_config.json")
self.bert_config = BertConfig(bert_config_path)
def net(self, input_ids, position_ids, segment_ids, input_mask):
"""
create neural network.
Args:
input_ids (tensor): the word ids.
position_ids (tensor): the position ids.
segment_ids (tensor): the segment ids.
input_mask (tensor): the padding mask.
Returns:
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
if __name__ == '__main__':
test_module = Bert()
```shell
$ hub install bert_uncased_L-24_H-1024_A-16==1.1.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [BERT paper](https://arxiv.org/abs/1810.04805).
## API
```python
def context(
trainable=True,
max_seq_len=128
)
```
Returns the Module's context: its input tensors, output tensors, and a copy of the pre-trained Paddle Program.
**Parameters**
> trainable: if set to True, the Module's parameters are updated during fine-tuning; otherwise they stay frozen.
> max_seq_len: maximum sequence length of the BERT model. Sequences shorter than **max_seq_len** are padded up to it, and longer sequences are truncated to **max_seq_len**; valid values range from 0 to 512.
**Returns**
> inputs: dict with the following fields:
> >**input_ids**: word ids of each token of the tokenized input text in the BERT vocabulary, shape \[batch_size, max_seq_len\], dtype int64;
> >**position_ids**: position of each token within its text, shape \[batch_size, max_seq_len\], dtype int64;
> >**segment_ids**: segment flag of each token (whether it belongs to text 1 or text 2), shape \[batch_size, max_seq_len\], dtype int64;
> >**input_mask**: flag marking whether each token is padding, shape \[batch_size, max_seq_len\], dtype int64;
>
> outputs: dict of the Module's output features, with the following fields:
> >**pooled_output**: sentence-level features, usable for tasks such as text classification, shape \[batch_size, 1024\], dtype float32;
> >**sequence_output**: token-level features, usable for tasks such as sequence labeling, shape \[batch_size, seq_len, 1024\], dtype float32;
>
> program: the Program containing the Module's computation graph.
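For illustration only, the sketch below is not part of the original docs and assumes the plain paddle.fluid static-graph API; the 7-label head is hypothetical. It shows how `sequence_output` could feed a per-token head for sequence labeling:
```python
import paddle.fluid as fluid
import paddlehub as hub

module = hub.Module(name="bert_uncased_L-24_H-1024_A-16")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)

with fluid.program_guard(program):
    # Project every token's 1024-d feature to per-label probabilities (hypothetical 7 labels).
    token_probs = fluid.layers.fc(input=outputs["sequence_output"],
                                  size=7,
                                  num_flatten_dims=2,
                                  act="softmax")
```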
```python
def get_embedding(
texts,
use_gpu=False,
batch_size=1
)
```
Returns sentence-level and token-level features for the input texts.
**Parameters**
> texts: list of input texts in the form \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\], …\], where each element is one sample and each sample can contain a text\_a and an optional text\_b.
> use_gpu: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
> batch_size: batch size used when computing the features; defaults to 1.
**Returns**
> results: list in the form \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\], …\], where each element is the feature output of the corresponding sample; every sample has a sentence-level feature pooled\_feature and a token-level feature seq\_feature.
>
```python
def get_params_layer()
```
Returns the layer index of each parameter. Used together with ULMFiTStrategy, it lets layer-wise learning rates and gradual unfreezing be configured strictly by layer.
**Parameters**
> None
**Returns**
> params_layer: dict whose keys are parameter names and whose values are the layer index each parameter belongs to.
**Code example**
```python
import paddlehub as hub
# Load the bert_uncased_L-24_H-1024_A-16 pretrained model (install it first with: hub install bert_uncased_L-24_H-1024_A-16)
module = hub.Module(name="bert_uncased_L-24_H-1024_A-16")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)
# All input tensors required by bert_uncased_L-24_H-1024_A-16 must be fed
input_ids = inputs["input_ids"]
position_ids = inputs["position_ids"]
segment_ids = inputs["segment_ids"]
input_mask = inputs["input_mask"]
# Use "pooled_output" for sentence-level output.
pooled_output = outputs["pooled_output"]
# Use "sequence_output" for token-level output.
sequence_output = outputs["sequence_output"]
# Use "get_embedding" to get embedding result.
embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text_a","Sample2_text_b"]], use_gpu=True)
# Use "get_params_layer" to get the parameter-to-layer mapping used by ULMFiTStrategy.
params_layer = module.get_params_layer()
strategy = hub.finetune.strategy.ULMFiTStrategy(frz_params_layer=params_layer, dis_params_layer=params_layer)
```
## View the code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 1.6.2
paddlehub >= 1.6.0
## Release history
* 1.0.0
  Initial release
* 1.1.0
  Added get_embedding and get_params_layer
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import json
import paddle.fluid as fluid
from bert_uncased_L_24_H_1024_A_16.model.transformer_encoder import encoder, pre_process_layer
class BertConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path) as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing bert model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict[key]
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
class BertModel(object):
def __init__(self, src_ids, position_ids, sentence_ids, input_mask, config, weight_sharing=True, use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
self._sent_types = config['type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._dtype = "float16" if use_fp16 else "float32"
# Initialize all weights by truncated normal initializer, and all biases
# will be initialized by constant zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
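# With weight sharing enabled, the masked-LM output layer reuses the word embedding matrix
# (matmul against its transpose) plus a separate bias instead of a dedicated output fc layer.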
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
mean_next_sent_loss = fluid.layers.mean(next_sent_loss)
loss = mean_next_sent_loss + mean_mask_lm_loss
return next_sent_acc, mean_mask_lm_loss, loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logits before
computing the softmax activation to mask certain selected positions so that
they are not considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of the input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of the input tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention followed by a
position-wise feed-forward network; both components are wrapped with
post_process_layer to add the residual connection, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
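# A final pre_process_layer pass normalizes the stacked output; with the
# BERT-style setting preprocess_cmd="" used by BertModel in this repo it is a no-op.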
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# coding:utf-8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from paddlehub import TransformerModule
from paddlehub.module.module import moduleinfo
from bert_uncased_L_24_H_1024_A_16.model.bert import BertConfig, BertModel
@moduleinfo(
name="bert_uncased_L-24_H-1024_A-16",
version="1.1.0",
summary="bert_uncased_L-24_H-1024_A-16, 24-layer, 1024-hidden, 16-heads, 340M parameters ",
author="paddlepaddle",
author_email="paddle-dev@baidu.com",
type="nlp/semantic_model",
)
class Bert(TransformerModule):
def _initialize(self):
self.MAX_SEQ_LEN = 512
self.params_path = os.path.join(self.directory, "assets", "params")
self.vocab_path = os.path.join(self.directory, "assets", "vocab.txt")
bert_config_path = os.path.join(self.directory, "assets", "bert_config.json")
self.bert_config = BertConfig(bert_config_path)
def net(self, input_ids, position_ids, segment_ids, input_mask):
"""
create neural network.
Args:
input_ids (tensor): the word ids.
position_ids (tensor): the position ids.
segment_ids (tensor): the segment ids.
input_mask (tensor): the padding mask.
Returns:
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
if __name__ == '__main__':
test_module = Bert()
```shell
$ hub install chinese-roberta-wwm-ext==1.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [RoBERTa paper](https://arxiv.org/abs/1907.11692) and the [Chinese-BERT-wwm technical report](https://arxiv.org/abs/1906.08101).
## API
```python
def context(
trainable=True,
max_seq_len=128
)
```
Returns the Module's context: its inputs, outputs and a copy of the pretrained Paddle Program.
**Parameters**
> trainable: when set to True, the Module's parameters are also updated during fine-tuning; otherwise they stay fixed.
> max_seq_len: maximum sequence length of the BERT model. Shorter sequences are padded to **max_seq_len** and longer ones are truncated to **max_seq_len**; valid values range from 0 to 512.
**Returns**
> inputs: dict with the following fields:
> >**input_ids**: word ids of the tokenized input text in the BERT vocabulary, shape \[batch_size, max_seq_len\], dtype int64;
> >**position_ids**: position of each token within its text, shape \[batch_size, max_seq_len\], dtype int64;
> >**segment_ids**: segment id of each token (whether it belongs to text 1 or text 2), shape \[batch_size, max_seq_len\], dtype int64;
> >**input_mask**: flags marking whether a token is padding, shape \[batch_size, max_seq_len\], dtype int64;
>
> outputs: dict of the Module's output features, with the following fields:
> >**pooled_output**: sentence-level features for tasks such as text classification, shape \[batch_size, 768\], dtype float32;
> >**sequence_output**: token-level features for tasks such as sequence labeling, shape \[batch_size, seq_len, 768\], dtype float32;
>
> program: the Program containing this Module's computation graph.
```python
def get_embedding(
texts,
use_gpu=False,
batch_size=1
)
```
Retrieves sentence-level and token-level features for the input texts.
**Parameters**
> texts: list of input texts in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample and each sample may contain text\_a and text\_b.
> use_gpu: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
**Returns**
> results: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\], where each element is the feature output of the corresponding sample; every sample has a sentence-level feature pooled\_feature and a token-level feature seq\_feature.
>
```python
def get_params_layer()
```
Returns the parameter-layer mapping. Used together with ULMFiTStrategy, it lets layer-wise learning rates and gradual unfreezing be configured strictly by layer index.
**Parameters**
> None
**Returns**
> params_layer: dict whose keys are parameter names and whose values are the index of the layer each parameter belongs to.
**Code example**
```python
import paddlehub as hub
# Load the chinese-roberta-wwm-ext pretrained model (installed via `hub install chinese-roberta-wwm-ext`)
module = hub.Module(name="chinese-roberta-wwm-ext")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)
# All input tensors required by the chinese-roberta-wwm-ext module must be fed
input_ids = inputs["input_ids"]
position_ids = inputs["position_ids"]
segment_ids = inputs["segment_ids"]
input_mask = inputs["input_mask"]
# Use "pooled_output" for sentence-level output.
pooled_output = outputs["pooled_output"]
# Use "sequence_output" for token-level output.
sequence_output = outputs["sequence_output"]
# Use "get_embedding" to get embedding result.
embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text_a","Sample2_text_b"]], use_gpu=True)
# Use "get_params_layer" to get the parameter-layer mapping for use with ULMFiTStrategy.
params_layer = module.get_params_layer()
strategy = hub.finetune.strategy.ULMFiTStrategy(frz_params_layer=params_layer, dis_params_layer=params_layer)
```
## View the code
https://github.com/ymcui/Chinese-BERT-wwm
## Contributors
[ymcui](https://github.com/ymcui)
## Dependencies
paddlepaddle >= 1.6.2
paddlehub >= 1.6.0
## Release history
* 1.0.0
  Initial release
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import json
import paddle.fluid as fluid
from chinese_roberta_wwm_ext.model.transformer_encoder import encoder, pre_process_layer
class BertConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path) as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing bert model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict[key]
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
class BertModel(object):
def __init__(self, src_ids, position_ids, sentence_ids, input_mask, config, weight_sharing=True, use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
self._sent_types = config['type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._dtype = "float16" if use_fp16 else "float32"
# Initialize all weights with a truncated normal initializer; all biases
# are initialized to zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
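# The block above turns the [batch, seq_len, 1] padding mask into an additive
# attention bias: valid token pairs map to 0 and padded pairs to -10000, so
# padded positions get (near-)zero weight after softmax in every head.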
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
mean_next_sent_loss = fluid.layers.mean(next_sent_loss)
loss = mean_next_sent_loss + mean_mask_lm_loss
return next_sent_acc, mean_mask_lm_loss, loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing the softmax activation to mask certain selected positions so that
they will not be considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
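# When keys/values are omitted the layer performs self-attention over queries,
# which is how encoder_layer below invokes it.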
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
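# pre_process_layer binds prev_out=None, so the residual command "a" is a no-op
# during pre-processing; post_process_layer keeps the full behaviour.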
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention followed by a
position-wise feed-forward network; both components are wrapped with
post_process_layer to add the residual connection, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# coding:utf-8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from paddlehub import TransformerModule
from paddlehub.module.module import moduleinfo
from chinese_roberta_wwm_ext.model.bert import BertConfig, BertModel
@moduleinfo(
name="chinese-roberta-wwm-ext",
version="1.0.0",
summary="chinese-roberta-wwm-ext, 12-layer, 768-hidden, 12-heads, 110M parameters ",
author="ymcui",
author_email="ymcui@ir.hit.edu.cn",
type="nlp/semantic_model",
)
class BertWwm(TransformerModule):
def _initialize(self):
self.MAX_SEQ_LEN = 512
self.params_path = os.path.join(self.directory, "assets", "params")
self.vocab_path = os.path.join(self.directory, "assets", "vocab.txt")
bert_config_path = os.path.join(self.directory, "assets", "bert_config.json")
self.bert_config = BertConfig(bert_config_path)
def net(self, input_ids, position_ids, segment_ids, input_mask):
"""
create neural network.
Args:
input_ids (tensor): the word ids.
position_ids (tensor): the position ids.
segment_ids (tensor): the segment ids.
input_mask (tensor): the padding mask.
Returns:
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
if __name__ == '__main__':
test_module = BertWwm()
```shell
$ hub install chinese-roberta-wwm-ext-large==1.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [RoBERTa paper](https://arxiv.org/abs/1907.11692) and the [Chinese-BERT-wwm technical report](https://arxiv.org/abs/1906.08101).
## API
```python
def context(
trainable=True,
max_seq_len=128
)
```
Returns the Module's context: its inputs, outputs and a copy of the pretrained Paddle Program.
**Parameters**
> trainable: when set to True, the Module's parameters are also updated during fine-tuning; otherwise they stay fixed.
> max_seq_len: maximum sequence length of the BERT model. Shorter sequences are padded to **max_seq_len** and longer ones are truncated to **max_seq_len**; valid values range from 0 to 512.
**Returns**
> inputs: dict with the following fields:
> >**input_ids**: word ids of the tokenized input text in the BERT vocabulary, shape \[batch_size, max_seq_len\], dtype int64;
> >**position_ids**: position of each token within its text, shape \[batch_size, max_seq_len\], dtype int64;
> >**segment_ids**: segment id of each token (whether it belongs to text 1 or text 2), shape \[batch_size, max_seq_len\], dtype int64;
> >**input_mask**: flags marking whether a token is padding, shape \[batch_size, max_seq_len\], dtype int64;
>
> outputs: dict of the Module's output features, with the following fields:
> >**pooled_output**: sentence-level features for tasks such as text classification, shape \[batch_size, 1024\], dtype float32;
> >**sequence_output**: token-level features for tasks such as sequence labeling, shape \[batch_size, seq_len, 1024\], dtype float32;
>
> program: the Program containing this Module's computation graph.
```python
def get_embedding(
texts,
use_gpu=False,
batch_size=1
)
```
Retrieves sentence-level and token-level features for the input texts.
**Parameters**
> texts: list of input texts in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample and each sample may contain text\_a and text\_b.
> use_gpu: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
**Returns**
> results: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\], where each element is the feature output of the corresponding sample; every sample has a sentence-level feature pooled\_feature and a token-level feature seq\_feature.
>
```python
def get_params_layer()
```
Returns the parameter-layer mapping. Used together with ULMFiTStrategy, it lets layer-wise learning rates and gradual unfreezing be configured strictly by layer index.
**Parameters**
> None
**Returns**
> params_layer: dict whose keys are parameter names and whose values are the index of the layer each parameter belongs to.
**Code example**
```python
import paddlehub as hub
# Load the chinese-roberta-wwm-ext-large pretrained model (installed via `hub install chinese-roberta-wwm-ext-large`)
module = hub.Module(name="chinese-roberta-wwm-ext-large")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)
# All input tensors required by the chinese-roberta-wwm-ext-large module must be fed
input_ids = inputs["input_ids"]
position_ids = inputs["position_ids"]
segment_ids = inputs["segment_ids"]
input_mask = inputs["input_mask"]
# Use "pooled_output" for sentence-level output.
pooled_output = outputs["pooled_output"]
# Use "sequence_output" for token-level output.
sequence_output = outputs["sequence_output"]
# Use "get_embedding" to get embedding result.
embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text_a","Sample2_text_b"]], use_gpu=True)
# Use "get_params_layer" to get the parameter-layer mapping for use with ULMFiTStrategy.
params_layer = module.get_params_layer()
strategy = hub.finetune.strategy.ULMFiTStrategy(frz_params_layer=params_layer, dis_params_layer=params_layer)
```
## View the code
https://github.com/ymcui/Chinese-BERT-wwm
## Contributors
[ymcui](https://github.com/ymcui)
## Dependencies
paddlepaddle >= 1.6.2
paddlehub >= 1.6.0
## Release history
* 1.0.0
  Initial release
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import json
import paddle.fluid as fluid
from chinese_roberta_wwm_ext_large.model.transformer_encoder import encoder, pre_process_layer
class BertConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path) as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing bert model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict[key]
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
class BertModel(object):
def __init__(self, src_ids, position_ids, sentence_ids, input_mask, config, weight_sharing=True, use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
self._sent_types = config['type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._dtype = "float16" if use_fp16 else "float32"
# Initialize all weights with a truncated normal initializer; all biases
# are initialized to zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
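# Padding positions are encoded as a -10000 additive bias (replicated per head),
# so the encoder's softmax effectively ignores them.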
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
mean_next_sent_loss = fluid.layers.mean(next_sent_loss)
loss = mean_next_sent_loss + mean_mask_lm_loss
return next_sent_acc, mean_mask_lm_loss, loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing the softmax activation to mask certain selected positions so that
they will not be considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention followed by a
position-wise feed-forward network; both components are wrapped with
post_process_layer to add the residual connection, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# coding:utf-8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from paddlehub import TransformerModule
from paddlehub.module.module import moduleinfo
from chinese_roberta_wwm_ext_large.model.bert import BertConfig, BertModel
@moduleinfo(
name="chinese-roberta-wwm-ext-large",
version="1.0.0",
summary="chinese-roberta-wwm-ext-large, 24-layer, 1024-hidden, 16-heads, 340M parameters ",
author="ymcui",
author_email="ymcui@ir.hit.edu.cn",
type="nlp/semantic_model",
)
class BertWwm(TransformerModule):
def _initialize(self):
self.MAX_SEQ_LEN = 512
self.params_path = os.path.join(self.directory, "assets", "params")
self.vocab_path = os.path.join(self.directory, "assets", "vocab.txt")
bert_config_path = os.path.join(self.directory, "assets", "bert_config.json")
self.bert_config = BertConfig(bert_config_path)
def net(self, input_ids, position_ids, segment_ids, input_mask):
"""
create neural network.
Args:
input_ids (tensor): the word ids.
position_ids (tensor): the position ids.
segment_ids (tensor): the segment ids.
input_mask (tensor): the padding mask.
Returns:
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
if __name__ == '__main__':
test_module = BertWwm()
```shell
$ hub install ernie==2.0.0
```
## Online demo
<a class="ant-btn large" href="https://aistudio.baidu.com/aistudio/projectDetail/79380" target="_blank">Quick demo on AI Studio</a>
For more details, please refer to the [ERNIE paper](https://arxiv.org/abs/1904.09223).
## API
```python
def __init__(
    task=None,
    load_checkpoint=None,
    label_map=None)
```
Creates the Module object (dynamic-graph version).
**Parameters**
* `task`: task name; may be `sequence_classification`.
* `load_checkpoint`: path to the model parameters saved with the PaddleHub Fine-tune API.
* `label_map`: label map used at prediction time.
```python
def predict(
    data,
    max_seq_len=128,
    batch_size=1,
    use_gpu=False)
```
**Parameters**
* `data`: data to predict, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample and each sample may contain text\_a and text\_b. The number of texts per sample (1 or 2) must match the setting used during training.
* `max_seq_len`: maximum text length the model processes.
* `batch_size`: batch size used for prediction.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
**Returns**
* `results`: prediction results, one element per input sample.
```python
def get_embedding(
    texts,
    use_gpu=False
)
```
Retrieves sentence-level and token-level features for the input texts.
**Parameters**
* `texts`: list of input texts in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample and each sample may contain text\_a and text\_b.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
**Returns**
* `results`: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\], where each element is the feature output of the corresponding sample; every sample has a sentence-level feature pooled\_feature and a token-level feature seq\_feature.
**Code example**
```python
import paddlehub as hub
data = [
    '这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
    '怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
    '作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
    name='ernie',
    version='2.0.0',
    task='sequence_classification',
    load_checkpoint='/path/to/parameters',
    label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
    print('Data: {} \t Label: {}'.format(text, results[idx]))
```
See the PaddleHub text classification demo: https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## Serving deployment
PaddleHub Serving can deploy an online service that returns the pretrained embeddings.
### Step1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m ernie
```
This deploys the embedding service API; the default port is 8866.
**NOTE:** to predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise no setting is needed.
### Step2: Send a prediction request
With the server configured, the few lines of code below send a prediction request and retrieve the result.
```python
import requests
import json
# Specify the texts to predict and build the dict {"texts": [text_1, text_2, ...]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# Pass the texts under the key expected by the prediction method, here "texts";
# for local use this corresponds to module.get_embedding(texts=text)
data = {"texts": text}
# Send a POST request; the content type must be application/json
url = "http://10.12.121.132:8866/predict/ernie"
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## View the code
https://github.com/PaddlePaddle/ERNIE
## Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## Release history
* 1.2.0
  Added get_embedding and get_params_layer
* 2.0.0
  Fully upgraded to the dynamic-graph version; the API has changed
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Ernie model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import json
import six
import paddle.fluid as fluid
from io import open
from paddlehub.common.logger import logger
from ernie.model.transformer_encoder import encoder, pre_process_layer
class ErnieConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path, 'r', encoding='utf8') as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing Ernie model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict.get(key, None)
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
logger.info('%s: %s' % (arg, value))
logger.info('------------------------------------------------')
class ErnieModel(object):
def __init__(self, src_ids, position_ids, sentence_ids, input_mask, config, weight_sharing=True, use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
self._sent_types = config['type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._dtype = "float16" if use_fp16 else "float32"
# Initialize all weights with a truncated normal initializer; all biases
# are initialized to zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
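# With weight sharing enabled, the masked-LM output projection reuses (ties) the word
# embedding matrix and only a vocabulary-sized output bias is created; otherwise a
# separate output FC layer is used.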
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
mean_next_sent_loss = fluid.layers.mean(next_sent_loss)
loss = mean_next_sent_loss + mean_mask_lm_loss
return next_sent_acc, mean_mask_lm_loss, loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing the softmax activation to mask certain selected positions so that
they will not be considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention sublayer followed by
a position-wise feed-forward sublayer; both components are wrapped
with post_process_layer to add the residual connection, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_ernie import ErnieModel, ErnieForSequenceClassification
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="ernie",
version="2.0.0",
summary=
"Baidu's ERNIE, Enhanced Representation through kNowledge IntEgration, max_seq_len=512 when pretrained. The module is executed as paddle.dygraph.",
author="paddlepaddle",
author_email="",
type="nlp/semantic_model")
class Ernie(nn.Layer):
"""
Ernie model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(Ernie, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = ErnieForSequenceClassification.from_pretrained(pretrained_model_name_or_path='ernie')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = ErnieModel.from_pretrained(pretrained_model_name_or_path='ernie')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary path.
"""
save_path = os.path.join(DATA_HOME, 'ernie', 'vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddlenlp.bj.bcebos.com/models/transformers/ernie/vocab.txt"
download(url, os.path.join(DATA_HOME, 'ernie'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
""" """
create neural network. One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(str)`): The processed data whose each element is the raw text.
max_seq_len (:obj:`int`, `optional`, defaults to :int:`128`):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The batch size.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the prediction labels.
"""
# TODO(zhangxuefei): add task token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
# Separate the data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install ernie_tiny==2.0.0
```
## 在线体验
<a class="ant-btn large" href="https://aistudio.baidu.com/aistudio/projectDetail/79380" target="_blank">AI Studio 快速体验</a>
<p align="center"> <p align="center">
<img src="https://paddlehub.bj.bcebos.com/paddlehub-img%2Fernie_tiny_framework.PNG" hspace='10'/> <br /> <img src="https://bj.bcebos.com/paddlehub/paddlehub-img/ernie_network_1.png" hspace='10'/> <br />
</p> </p>
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/ernie_network_2.png" hspace='10'/> <br />
</p>
更多详情请参考[ERNIE论文](https://arxiv.org/abs/1904.09223)
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
创建Module对象(动态图组网版本)。
**参数**
* `task`: 任务名称,可为`sequence_classification`。
* `load_checkpoint`:使用PaddleHub Fine-tune api训练保存的模型参数文件路径。
* `label_map`:预测时的类别映射表。
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**参数**
* `data`: 待预测数据,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,
每个样例可以包含text\_a与text\_b。每个样例文本数量(1个或者2个)需和训练时保持一致。
* `max_seq_len`:模型处理文本的最大长度
* `batch_size`:模型批处理大小
* `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
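* `results`:list类型,每个元素为对应输入样例的预测标签。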
```python
def get_embedding(
texts,
use_gpu=False
)
```
**参数**
* `texts`:输入文本列表,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,每个样例可以包含text\_a与text\_b。
* `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
* `results`:list类型,格式为\[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\],其中每个元素都是对应样例的特征输出,每个样例都有句子粒度特征pooled\_feature与字粒度特征seq\_feature。
**代码示例**
```python
import paddlehub as hub
data = [
'这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
'怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
'作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='ernie_tiny',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
print('Data: {} \t Label: {}'.format(text, results[idx]))
```
详情可参考PaddleHub文本分类示例:https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
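若只需获取文本的向量表示,可在创建Module时不指定task(即task=None),再调用get_embedding。下面是一段本地调用的简要示例(示例文本与输出处理方式仅供参考,特征顺序以上文module代码为准):
```python
import paddlehub as hub

# task 为 None 时,模型返回字粒度特征与句子粒度特征
model = hub.Module(name='ernie_tiny', version='2.0.0')
results = model.get_embedding(texts=[['今天是个好日子'], ['这个宾馆比较陈旧了', '总体来说一般']], use_gpu=False)
for seq_feature, pooled_feature in results:
    # 按本模块代码,每个元素为 (字粒度特征, 句子粒度特征)
    print(len(seq_feature), len(pooled_feature))
```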
## 服务部署
PaddleHub Serving可以部署一个在线获取预训练词向量的服务。
### Step1: 启动PaddleHub Serving
运行启动命令:
```shell
$ hub serving start -m ernie_tiny
```
这样就完成了一个获取预训练词向量服务化API的部署,默认端口号为8866。
**NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
### Step2: 发送预测请求
配置好服务端后,以下几行代码即可发送预测请求,获取预测结果:
```python
import requests
import json
# 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# 以key的方式指定text传入预测方法的参数,此例中为"texts"
# 对应本地部署,则为module.get_embedding(texts=text)
data = {"texts": text}
# 发送post请求,content-type类型应指定json方式
url = "http://10.12.121.132:8866/predict/ernie_tiny"
# 指定post请求的headers为application/json方式
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## 查看代码
https://github.com/PaddlePaddle/ERNIE
## 依赖
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## 更新历史
* 1.1.0
支持get_embedding与get_params_layer
* 2.0.0
全面升级动态图版本,接口有所变化
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Ernie model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import json
import six
import paddle.fluid as fluid
from io import open
from paddlehub.common.logger import logger
from ernie_tiny.model.transformer_encoder import encoder, pre_process_layer
class ErnieConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path, 'r', encoding='utf8') as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing Ernie model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict.get(key, None)
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
logger.info('%s: %s' % (arg, value))
logger.info('------------------------------------------------')
class ErnieModel(object):
def __init__(self,
src_ids,
position_ids,
sentence_ids,
task_ids,
input_mask,
config,
weight_sharing=True,
use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
if config['sent_type_vocab_size']:  # ErnieConfig.__getitem__ returns None when the key is absent
self._sent_types = config['sent_type_vocab_size']
else:
self._sent_types = config['type_vocab_size']
self._use_task_id = config['use_task_id']
if self._use_task_id:
self._task_types = config['task_type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._task_emb_name = "task_embedding"
self._dtype = "float16" if use_fp16 else "float32"
self._emb_dtype = "float32"
# Initialize all weights by truncated normal initializer, and all biases
# will be initialized by constant zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, task_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, task_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
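# Some ERNIE checkpoints (e.g. ERNIE 2.0-style models) additionally carry a task embedding;
# it is added on top of the word, position and sentence embeddings when use_task_id is set.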
if self._use_task_id:
task_emb_out = fluid.layers.embedding(task_ids,
size=[self._task_types, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._task_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + task_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
emb_out = fluid.layers.cast(x=emb_out, dtype=self._dtype)
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
if self._dtype == "float16":
self._enc_out = fluid.layers.cast(x=self._enc_out, dtype=self._emb_dtype)
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_lm_output(self, mask_label, mask_pos):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
self.next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = fluid.layers.layer_norm(mask_trans_feat,
begin_norm_axis=len(mask_trans_feat.shape) - 1,
param_attr=fluid.ParamAttr(
name='mask_lm_trans_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_layer_norm_bias',
initializer=fluid.initializer.Constant(1.)))
# transform: layer norm
#mask_trans_feat = pre_process_layer(
# mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._emb_dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
return mean_mask_lm_loss
def get_task_output(self, task, task_labels):
task_fc_out = fluid.layers.fc(input=self.next_sent_feat,
size=task["num_labels"],
param_attr=fluid.ParamAttr(name=task["task_name"] + "_fc.w_0",
initializer=self._param_initializer),
bias_attr=task["task_name"] + "_fc.b_0")
task_loss, task_softmax = fluid.layers.softmax_with_cross_entropy(logits=task_fc_out,
label=task_labels,
return_softmax=True)
task_acc = fluid.layers.accuracy(input=task_softmax, label=task_labels)
mean_task_loss = fluid.layers.mean(task_loss)
return mean_task_loss, task_acc
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing the softmax activation to mask certain selected positions so that
they will not be considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention sublayer followed by
a position-wise feed-forward sublayer; both components are wrapped
with post_process_layer to add the residual connection, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import ErnieTinyTokenizer
from paddlehub.module.modeling_ernie import ErnieModel, ErnieForSequenceClassification
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="ernie_tiny",
version="2.0.0",
summary="Baidu's ERNIE-tiny, Enhanced Representation through kNowledge IntEgration, tiny version, max_seq_len=512",
author="paddlepaddle",
author_email="",
type="nlp/semantic_model")
class ErnieTiny(nn.Layer):
"""
Ernie model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(ErnieTiny, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = ErnieForSequenceClassification.from_pretrained(pretrained_model_name_or_path='ernie_tiny')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = ErnieModel.from_pretrained(pretrained_model_name_or_path='ernie_tiny')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary path.
"""
save_path = os.path.join(DATA_HOME, 'ernie_tiny', 'vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/vocab.txt"
download(url, os.path.join(DATA_HOME, 'ernie_tiny'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:ErnieTinyTokenizer) : The tokenizer which was customized for this module.
"""
spm_path = os.path.join(DATA_HOME, 'ernie_tiny', 'spm_cased_simp_sampled.model')
if not os.path.exists(spm_path) or not os.path.isfile(spm_path):
url = "https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/spm_cased_simp_sampled.model"
download(url, os.path.join(DATA_HOME, 'ernie_tiny'))
word_dict_path = os.path.join(DATA_HOME, 'ernie_tiny', 'dict.wordseg.pickle')
if not os.path.exists(word_dict_path) or not os.path.isfile(word_dict_path):
url = "https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/dict.wordseg.pickle"
download(url, os.path.join(DATA_HOME, 'ernie_tiny'))
return ErnieTinyTokenizer(self.get_vocab_path(), spm_path, word_dict_path)
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
""" """
create neural network. predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(str)`): The processed data whose each element is the raw text.
max_seq_len (:obj:`int`, `optional`, defaults to :int:`128`):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The batch size.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the prediction labels.
"""
# TODO(zhangxuefei): add task token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
# Separate the data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install ernie_v2_eng_base==2.0.0
```
<p align="center"> <p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/ernie2.0_arch.png" hspace='10'/> <br /> <img src="https://bj.bcebos.com/paddlehub/paddlehub-img/ernie2.0_arch.png" hspace='10'/> <br />
</p> </p>
更多详情请参考[ERNIE论文](https://arxiv.org/abs/1907.12412)
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
创建Module对象(动态图组网版本)。
**参数**
* `task`: 任务名称,可为`sequence_classification`。
* `load_checkpoint`:使用PaddleHub Fine-tune api训练保存的模型参数文件路径。
* `label_map`:预测时的类别映射表。
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**参数**
* `data`: 待预测数据,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,
每个样例可以包含text\_a与text\_b。每个样例文本数量(1个或者2个)需和训练时保持一致。
* `max_seq_len`:模型处理文本的最大长度
* `batch_size`:模型批处理大小
* `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
```python
def get_embedding(
texts,
use_gpu=False
)
```
@@ -52,70 +59,86 @@
**参数**
* `texts`:输入文本列表,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,每个样例可以包含text\_a与text\_b。
* `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
* `results`:list类型,格式为\[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\],其中每个元素都是对应样例的特征输出,每个样例都有句子粒度特征pooled\_feature与字粒度特征seq\_feature。
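A minimal sketch of calling `get_embedding` (assumptions: the module has been installed via `hub install ernie_v2_eng_base==2.0.0`, and `task=None` is passed so the raw pre-trained features are returned):

```python
import paddlehub as hub

# task=None loads the bare encoder instead of a classification head.
model = hub.Module(name='ernie_v2_eng_base', version='2.0.0', task=None)
results = model.get_embedding(
    texts=[["Sample1_text_a"], ["Sample2_text_a", "Sample2_text_b"]],
    use_gpu=False)
# results[i] holds the sentence-level (pooled) feature and the token-level
# feature of the i-th sample, as described above.
print(len(results))
```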
**代码示例**
```python
import paddlehub as hub
data = [
'这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
'怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
'作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='ernie_v2_eng_base',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
print('Data: {} \t Label: {}'.format(text, results[idx]))
```
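Each sample may also carry a `text_a`/`text_b` pair, as long as the number of texts per sample matches what was used during fine-tuning. A hedged sketch (the checkpoint path is hypothetical):

```python
import paddlehub as hub

label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
    name='ernie_v2_eng_base',
    version='2.0.0',
    task='sequence_classification',
    load_checkpoint='/path/to/parameters',  # hypothetical path to fine-tuned parameters
    label_map=label_map)
# Sentence-pair input: each sample is [text_a, text_b].
pair_data = [
    ['the hotel is rather old', 'the rooms are just so-so'],
]
print(model.predict(pair_data, max_seq_len=50, batch_size=1, use_gpu=False))
```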
参考PaddleHub 文本分类示例。https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## 服务部署
PaddleHub Serving可以部署一个在线获取预训练词向量的服务。
### Step1: 启动PaddleHub Serving
运行启动命令:
```shell
$ hub serving start -m ernie_v2_eng_base
```
这样就完成了一个获取预训练词向量服务化API的部署,默认端口号为8866。
**NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
### Step2: 发送预测请求
配置好服务端后,使用以下几行代码即可发送预测请求并获取预测结果
```python
import requests
import json
# 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# 以key的方式指定text传入预测方法的时的参数,此例中为"texts"
# 对应本地部署,则为module.get_embedding(texts=text)
data = {"texts": text}
# 发送post请求,content-type类型应指定json方式
url = "http://10.12.121.132:8866/predict/ernie_v2_eng_base"
# 指定post请求的headers为application/json方式
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## 查看代码
https://github.com/PaddlePaddle/ERNIE
## 依赖
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## 更新历史
@@ -130,3 +153,7 @@
* 1.1.0
支持get_embedding与get_params_layer
* 2.0.0
全面升级动态图版本,接口有所变化
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Ernie model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from __future__ import absolute_import
import json
import six
import paddle.fluid as fluid
from io import open
from paddlehub.common.logger import logger
from ernie_v2_eng_base.model.transformer_encoder import encoder, pre_process_layer
class ErnieConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path, 'r', encoding='utf8') as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing Ernie model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict.get(key, None)
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
logger.info('%s: %s' % (arg, value))
logger.info('------------------------------------------------')
class ErnieModel(object):
def __init__(self,
src_ids,
position_ids,
sentence_ids,
task_ids,
input_mask,
config,
weight_sharing=True,
use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
if config['sent_type_vocab_size']:
self._sent_types = config['sent_type_vocab_size']
else:
self._sent_types = config['type_vocab_size']
self._use_task_id = config['use_task_id']
if self._use_task_id:
self._task_types = config['task_type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._task_emb_name = "task_embedding"
self._dtype = "float16" if use_fp16 else "float32"
self._emb_dtype = "float32"
# Initialize all weights by truncated normal initializer, and all biases
# will be initialized by constant zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, task_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, task_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
if self._use_task_id:
task_emb_out = fluid.layers.embedding(task_ids,
size=[self._task_types, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._task_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + task_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
emb_out = fluid.layers.cast(x=emb_out, dtype=self._dtype)
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
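# The two lines above turn the 0/1 padding mask into an additive attention bias:
# matmul yields a [batch, seq, seq] mask of valid token pairs, and scale computes
# 10000 * (mask - 1), i.e. 0 for real positions and -10000 for padded ones, so the
# softmax gives padded positions (almost) zero attention weight.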
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
if self._dtype == "float16":
next_sent_feat = fluid.layers.cast(x=next_sent_feat, dtype=self._emb_dtype)
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_lm_output(self, mask_label, mask_pos):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
self.next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
if self._dtype == "float16":
mask_feat = fluid.layers.cast(x=mask_feat, dtype=self._emb_dtype)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = fluid.layers.layer_norm(mask_trans_feat,
begin_norm_axis=len(mask_trans_feat.shape) - 1,
param_attr=fluid.ParamAttr(
name='mask_lm_trans_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_layer_norm_bias',
initializer=fluid.initializer.Constant(1.)))
# transform: layer norm
#mask_trans_feat = pre_process_layer(
# mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._emb_dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
return mean_mask_lm_loss
def get_task_output(self, task, task_labels):
task_fc_out = fluid.layers.fc(input=self.next_sent_feat,
size=task["num_labels"],
param_attr=fluid.ParamAttr(name=task["task_name"] + "_fc.w_0",
initializer=self._param_initializer),
bias_attr=task["task_name"] + "_fc.b_0")
task_loss, task_softmax = fluid.layers.softmax_with_cross_entropy(logits=task_fc_out,
label=task_labels,
return_softmax=True)
task_acc = fluid.layers.accuracy(input=task_softmax, label=task_labels)
mean_task_loss = fluid.layers.mean(task_loss)
return mean_task_loss, task_acc
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing the softmax activation to mask certain selected positions so that
they will not be considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention followed by
position-wise feed-forward networks, and both components are wrapped
with the post_process_layer to add residual connection, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
@@ -12,64 +11,211 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_ernie import ErnieModel, ErnieForSequenceClassification
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="ernie_v2_eng_base",
version="2.0.0",
summary=
"Baidu's ERNIE 2.0, Enhanced Representation through kNowledge IntEgration, max_seq_len=512 when pretrained. The module is executed as paddle.dygraph.",
author="paddlepaddle",
author_email="",
type="nlp/semantic_model")
class ErnieV2(nn.Layer):
"""
Ernie model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(ErnieV2, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = ErnieForSequenceClassification.from_pretrained(
pretrained_model_name_or_path='ernie_v2_eng_base')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = ErnieModel.from_pretrained(pretrained_model_name_or_path='ernie_v2_eng_base')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
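# With a task head (self.task is not None) the backbone returns classification logits;
# with task=None it returns the raw (sequence_output, pooled_output) encoder features.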
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary path.
"""
save_path = os.path.join(DATA_HOME, 'ernie', 'vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddlenlp.bj.bcebos.com/models/transformers/ernie_v2_eng_base/vocab.txt"
download(url, os.path.join(DATA_HOME, 'ernie'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
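# Illustrative (hypothetical) local usage -- this is the same tokenizer predict() uses:
#   tokenizer = module.get_tokenizer()
#   encoded = tokenizer.encode('some text', text_pair=None, max_seq_len=128)
#   input_ids, segment_ids = encoded['input_ids'], encoded['segment_ids']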
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
""" """
create neural network. One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(str)`): The processed data whose each element is the raw text.
max_seq_len (:obj:`int`, `optional`, defaults to :int:`None`):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The number of batch.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the predictions labels.
"""
# TODO(zhangxuefei): add task token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
# Separates the data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
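# Switch to evaluation mode so that dropout behaves deterministically during prediction.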
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install ernie_v2_eng_large==2.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/ernie2.0_arch.png" hspace='10'/> <br />
</p>
@@ -12,40 +14,44 @@
更多详情请参考[ERNIE论文](https://arxiv.org/abs/1907.12412)
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
创建Module对象(动态图组网版本)。
**参数**
* `task`: 任务名称,可为`sequence_classification`。
* `load_checkpoint`:使用PaddleHub Fine-tune api训练保存的模型参数文件路径。
* `label_map`:预测时的类别映射表。
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**参数**
* `data`: 待预测数据,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,
每个样例可以包含text\_a与text\_b。每个样例文本数量(1个或者2个)需和训练时保持一致。
* `max_seq_len`:模型处理文本的最大长度
* `batch_size`:模型批处理大小
* `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
```python
def get_embedding(
texts,
use_gpu=False
)
```
@@ -53,73 +59,86 @@
**参数**
* `texts`:输入文本列表,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,每个样例可以包含text\_a与text\_b。
* `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
**返回**
* `results`:list类型,格式为\[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\],其中每个元素都是对应样例的特征输出,每个样例都有句子粒度特征pooled\_feature与字粒度特征seq\_feature。
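A minimal sketch of calling `get_embedding` (assumptions: the module has been installed via `hub install ernie_v2_eng_large==2.0.0`, and `task=None` is passed so the raw pre-trained features are returned):

```python
import paddlehub as hub

# task=None loads the bare encoder instead of a classification head.
model = hub.Module(name='ernie_v2_eng_large', version='2.0.0', task=None)
results = model.get_embedding(
    texts=[["Sample1_text_a"], ["Sample2_text_a", "Sample2_text_b"]],
    use_gpu=False)
# results[i] holds the sentence-level (pooled) feature and the token-level
# feature of the i-th sample, as described above.
print(len(results))
```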
**代码示例**
```python
import paddlehub as hub
data = [
'这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
'怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
'作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='ernie_v2_eng_large',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
print('Data: {} \t Label: {}'.format(text, results[idx]))
```
参考PaddleHub 文本分类示例。https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## 服务部署
PaddleHub Serving可以部署一个在线获取预训练词向量的服务。
### Step1: 启动PaddleHub Serving
运行启动命令:
```shell
$ hub serving start -m ernie_v2_eng_large
```
这样就完成了一个获取预训练词向量服务化API的部署,默认端口号为8866。
**NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
### Step2: 发送预测请求
配置好服务端后,使用以下几行代码即可发送预测请求并获取预测结果
```python
import requests
import json
# 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# 以key的方式指定text传入预测方法的时的参数,此例中为"texts"
# 对应本地部署,则为module.get_embedding(texts=text)
data = {"texts": text}
# 发送post请求,content-type类型应指定json方式
url = "http://10.12.121.132:8866/predict/ernie_v2_eng_large"
# 指定post请求的headers为application/json方式
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## 查看代码
https://github.com/PaddlePaddle/ERNIE
## 依赖
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## 更新历史
@@ -134,3 +153,7 @@
* 1.1.0
支持get_embedding与get_params_layer
* 2.0.0
全面升级动态图版本,接口有所变化
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Ernie model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from __future__ import absolute_import
import json
import six
import paddle.fluid as fluid
from io import open
from paddlehub.common.logger import logger
from ernie_v2_eng_large.model.transformer_encoder import encoder, pre_process_layer
class ErnieConfig(object):
def __init__(self, config_path):
self._config_dict = self._parse(config_path)
def _parse(self, config_path):
try:
with open(config_path, 'r', encoding='utf8') as json_file:
config_dict = json.load(json_file)
except Exception:
raise IOError("Error in parsing Ernie model config file '%s'" % config_path)
else:
return config_dict
def __getitem__(self, key):
return self._config_dict.get(key, None)
def print_config(self):
for arg, value in sorted(six.iteritems(self._config_dict)):
logger.info('%s: %s' % (arg, value))
logger.info('------------------------------------------------')
class ErnieModel(object):
def __init__(self,
src_ids,
position_ids,
sentence_ids,
task_ids,
input_mask,
config,
weight_sharing=True,
use_fp16=False):
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
self._n_head = config['num_attention_heads']
self._voc_size = config['vocab_size']
self._max_position_seq_len = config['max_position_embeddings']
if config['sent_type_vocab_size']:
self._sent_types = config['sent_type_vocab_size']
else:
self._sent_types = config['type_vocab_size']
self._use_task_id = config['use_task_id']
if self._use_task_id:
self._task_types = config['task_type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
self._weight_sharing = weight_sharing
self._word_emb_name = "word_embedding"
self._pos_emb_name = "pos_embedding"
self._sent_emb_name = "sent_embedding"
self._task_emb_name = "task_embedding"
self._dtype = "float16" if use_fp16 else "float32"
self._emb_dtype = "float32"
# Initialize all weights by truncated normal initializer, and all biases
# will be initialized by constant zero by default.
self._param_initializer = fluid.initializer.TruncatedNormal(scale=config['initializer_range'])
self._build_model(src_ids, position_ids, sentence_ids, task_ids, input_mask)
def _build_model(self, src_ids, position_ids, sentence_ids, task_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
if self._use_task_id:
task_emb_out = fluid.layers.embedding(task_ids,
size=[self._task_types, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(name=self._task_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + task_emb_out
emb_out = pre_process_layer(emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if self._dtype == "float16":
emb_out = fluid.layers.cast(x=emb_out, dtype=self._dtype)
input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
self_attn_mask = fluid.layers.matmul(x=input_mask, y=input_mask, transpose_y=True)
self_attn_mask = fluid.layers.scale(x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
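# The two lines above turn the 0/1 padding mask into an additive attention bias:
# matmul yields a [batch, seq, seq] mask of valid token pairs, and scale computes
# 10000 * (mask - 1), i.e. 0 for real positions and -10000 for padded ones, so the
# softmax gives padded positions (almost) zero attention weight.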
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
def get_pooled_output(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
if self._dtype == "float16":
next_sent_feat = fluid.layers.cast(x=next_sent_feat, dtype=self._emb_dtype)
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_lm_output(self, mask_label, mask_pos):
"""Get the loss & accuracy for pretraining"""
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
# extract the first token feature in each sentence
self.next_sent_feat = self.get_pooled_output()
reshaped_emb_out = fluid.layers.reshape(x=self._enc_out, shape=[-1, self._emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
if self._dtype == "float16":
mask_feat = fluid.layers.cast(x=mask_feat, dtype=self._emb_dtype)
# transform: fc
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = fluid.layers.layer_norm(mask_trans_feat,
begin_norm_axis=len(mask_trans_feat.shape) - 1,
param_attr=fluid.ParamAttr(
name='mask_lm_trans_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_layer_norm_bias',
initializer=fluid.initializer.Constant(1.)))
# transform: layer norm
#mask_trans_feat = pre_process_layer(
# mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._emb_dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
return mean_mask_lm_loss
def get_task_output(self, task, task_labels):
task_fc_out = fluid.layers.fc(input=self.next_sent_feat,
size=task["num_labels"],
param_attr=fluid.ParamAttr(name=task["task_name"] + "_fc.w_0",
initializer=self._param_initializer),
bias_attr=task["task_name"] + "_fc.b_0")
task_loss, task_softmax = fluid.layers.softmax_with_cross_entropy(logits=task_fc_out,
label=task_labels,
return_softmax=True)
task_acc = fluid.layers.accuracy(input=task_softmax, label=task_labels)
mean_task_loss = fluid.layers.mean(task_loss)
return mean_task_loss, task_acc
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing the softmax activation to mask certain selected positions so that
they will not be considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention followed by
position-wise feed-forward networks, and both components are wrapped
with the post_process_layer to add residual connection, layer normalization
and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
@@ -12,64 +11,211 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_ernie import ErnieModel, ErnieForSequenceClassification
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="ernie_v2_eng_large",
version="2.0.0",
summary=
"Baidu's ERNIE 2.0, Enhanced Representation through kNowledge IntEgration, max_seq_len=512 when pretrained. The module is executed as paddle.dygraph.",
author="paddlepaddle",
author_email="",
type="nlp/semantic_model")
class ErnieV2(nn.Layer):
"""
Ernie model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(ErnieV2, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = ErnieForSequenceClassification.from_pretrained(
pretrained_model_name_or_path='ernie_v2_eng_large')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = ErnieModel.from_pretrained(pretrained_model_name_or_path='ernie_v2_eng_large')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary path.
"""
save_path = os.path.join(DATA_HOME, 'ernie', 'vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddlenlp.bj.bcebos.com/models/transformers/ernie_v2_eng_large/vocab.txt"
download(url, os.path.join(DATA_HOME, 'ernie'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
""" """
create neural network. One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
        Args:
            data (obj:`List(str)`): The processed data whose each element is the raw text.
            max_seq_len (:obj:`int`, `optional`, defaults to :int:`None`):
                If set to a number, will limit the total sequence returned so that it has a maximum length.
            batch_size(obj:`int`, defaults to 1): The number of batch.
            use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
        Returns:
            results(obj:`list`): All the predictions labels.
        """
        # TODO(zhangxuefei): add task token_classification task predict.
        if self.task not in ['sequence_classification']:
            raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
        paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
        tokenizer = self.get_tokenizer()
        examples = []
        for text in data:
            if len(text) == 1:
                encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
            elif len(text) == 2:
                encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
            else:
                raise RuntimeError(
                    'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
        # Separates the data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
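            # Every example was encoded with the same max_seq_len (tokenizer.encode pads by default),
            # so the per-batch id lists can be stacked into dense tensors below.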
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install roberta-wwm-ext-large==2.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [RoBERTa paper](https://arxiv.org/abs/1907.11692) and the [Chinese-BERT-wwm technical report](https://arxiv.org/abs/1906.08101).
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
Creates a Module object (dynamic graph version).
**Parameters**
* `task`: task name; currently `sequence_classification` is supported.
* `load_checkpoint`: path to the model parameter file saved with the PaddleHub Fine-tune API.
* `label_map`: label mapping table used at prediction time.
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**Parameters**
* `data`: data to be predicted, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample,
and each sample may contain text\_a and text\_b. The number of texts per sample (1 or 2) must match the setting used during training.
* `max_seq_len`: maximum text length the model processes.
* `batch_size`: batch size for inference.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
**Returns**
* `results`: list of prediction labels, one per input sample.
```python
def get_embedding(
texts,
use_gpu=False
)
```
Gets the sentence-level and token-level features of the input texts.
**Parameters**
* `texts`: list of input texts, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample and each sample may contain text\_a and text\_b.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
**Returns**
* `results`: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\], where each element is the feature output of the corresponding sample; each sample has a sentence-level feature pooled\_feature and a token-level feature seq\_feature.
**Code example**
```python
import paddlehub as hub
data = [
'这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
'怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
'作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='roberta-wwm-ext-large',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
    print('Data: {} \t Label: {}'.format(text, results[idx]))
```
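For feature extraction (loading the module with `task=None`), a minimal `get_embedding` sketch; the loading arguments shown here are illustrative:

```python
import paddlehub as hub

# Load without a task so that forward returns raw sequence-level and token-level features.
model = hub.Module(name='roberta-wwm-ext-large', version='2.0.0', task=None)

results = model.get_embedding(texts=[['今天是个好日子']], use_gpu=False)
features = results[0]  # the sample's sentence-level (pooled) and token-level (sequence) features
```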
See the PaddleHub text classification demo: https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## Serving Deployment
PaddleHub Serving can deploy an online service for obtaining pretrained embeddings.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m roberta-wwm-ext-large
```
This deploys a serving API for obtaining pretrained embeddings; the default port is 8866.
**NOTE:** For GPU prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.
### Step 2: Send a prediction request
With the server configured, the following few lines of code send a prediction request and retrieve the result.
```python
import requests
import json
# Specify the texts to predict and build the dict {"texts": [text_1, text_2, ... ]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# Pass the texts to the prediction method by key; here the key is "texts"
# For local deployment this corresponds to module.get_embedding(texts=text)
data = {"texts": text}
# Send a POST request; the content type should be JSON
url = "http://10.12.121.132:8866/predict/roberta-wwm-ext-large"
# Set the request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## Code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## Release History
* 1.0.0
  Initial release
* 2.0.0
  Fully upgraded to dynamic graph mode; the interfaces have changed.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_roberta import RobertaModel, RobertaForSequenceClassification
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="roberta-wwm-ext-large",
version="2.0.0",
summary=
"chinese-roberta-wwm-ext-large, 24-layer, 1024-hidden, 16-heads, 340M parameters. The module is executed as paddle.dygraph.",
author="ymcui",
author_email="ymcui@ir.hit.edu.cn",
type="nlp/semantic_model",
)
class Roberta(nn.Layer):
"""
RoBERTa model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(Roberta, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = RobertaForSequenceClassification.from_pretrained(
pretrained_model_name_or_path='roberta-wwm-ext-large')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = RobertaModel.from_pretrained(pretrained_model_name_or_path='roberta-wwm-ext-large')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary path.
"""
save_path = os.path.join(DATA_HOME, 'roberta-wwm-ext-large', 'vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddlenlp.bj.bcebos.com/models/transformers/roberta_large/vocab.txt"
download(url, os.path.join(DATA_HOME, 'roberta-wwm-ext-large'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(str)`): The processed data whose each element is the raw text.
max_seq_len (:obj:`int`, `optional`, defaults to :int:`None`):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The number of batch.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the predictions labels.
"""
# TODO(zhangxuefei): add task token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
        # Separates the data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
```shell
$ hub install roberta-wwm-ext==2.0.0
```
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/paddlehub-img/bert_network.png" hspace='10'/> <br />
</p>
For more details, please refer to the [RoBERTa paper](https://arxiv.org/abs/1907.11692) and the [Chinese-BERT-wwm technical report](https://arxiv.org/abs/1906.08101).
## API
```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None)
```
Creates a Module object (dynamic graph version).
**Parameters**
* `task`: task name; currently `sequence_classification` is supported.
* `load_checkpoint`: path to the model parameter file saved with the PaddleHub Fine-tune API.
* `label_map`: label mapping table used at prediction time.
```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False)
```
**Parameters**
* `data`: data to be predicted, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample,
and each sample may contain text\_a and text\_b. The number of texts per sample (1 or 2) must match the setting used during training.
* `max_seq_len`: maximum text length the model processes.
* `batch_size`: batch size for inference.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
**Returns**
* `results`: list of prediction labels, one per input sample.
```python
def get_embedding(
texts,
use_gpu=False
)
```
Gets the sentence-level and token-level features of the input texts.
**Parameters**
* `texts`: list of input texts, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample and each sample may contain text\_a and text\_b.
* `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable use_gpu.
**Returns**
* `results`: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\], where each element is the feature output of the corresponding sample; each sample has a sentence-level feature pooled\_feature and a token-level feature seq\_feature.
**Code example**
```python
import paddlehub as hub
data = [
'这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
'怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
'作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='roberta-wwm-ext',
version='2.0.0',
task='sequence_classification',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
    print('Data: {} \t Label: {}'.format(text, results[idx]))
```
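Similarly, for feature extraction with this module (loading it with `task=None`), a minimal `get_embedding` sketch; the loading arguments shown here are illustrative:

```python
import paddlehub as hub

# Load without a task so that forward returns raw sequence-level and token-level features.
model = hub.Module(name='roberta-wwm-ext', version='2.0.0', task=None)

results = model.get_embedding(texts=[['今天是个好日子']], use_gpu=False)
features = results[0]  # the sample's sentence-level (pooled) and token-level (sequence) features
```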
See the PaddleHub text classification demo: https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0.0-beta/demo/text_classifcation
## Serving Deployment
PaddleHub Serving can deploy an online service for obtaining pretrained embeddings.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m roberta-wwm-ext
```
This deploys a serving API for obtaining pretrained embeddings; the default port is 8866.
**NOTE:** For GPU prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.
### Step 2: Send a prediction request
With the server configured, the following few lines of code send a prediction request and retrieve the result.
```python
import requests
import json
# Specify the texts to predict and build the dict {"texts": [text_1, text_2, ... ]}
text = [["今天是个好日子", "天气预报说今天要下雨"], ["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# Pass the texts to the prediction method by key; here the key is "texts"
# For local deployment this corresponds to module.get_embedding(texts=text)
data = {"texts": text}
# Send a POST request; the content type should be JSON
url = "http://10.12.121.132:8866/predict/roberta-wwm-ext"
# Set the request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## Code
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT
## Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
## Release History
* 1.0.0
  Initial release
* 2.0.0
  Fully upgraded to dynamic graph mode; the interfaces have changed.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddle.dataset.common import DATA_HOME
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub import BertTokenizer
from paddlehub.module.modeling_roberta import RobertaModel, RobertaForSequenceClassification
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
@moduleinfo(
name="roberta-wwm-ext",
version="2.0.0",
summary=
"chinese-roberta-wwm-ext, 12-layer, 768-hidden, 12-heads, 110M parameters. The module is executed as paddle.dygraph.",
author="ymcui",
author_email="ymcui@ir.hit.edu.cn",
type="nlp/semantic_model",
)
class Roberta(nn.Layer):
"""
RoBERTa model
"""
def __init__(
self,
task=None,
load_checkpoint=None,
label_map=None,
):
super(Roberta, self).__init__()
# TODO(zhangxuefei): add token_classification task
if task == 'sequence_classification':
self.model = RobertaForSequenceClassification.from_pretrained(
pretrained_model_name_or_path='roberta-wwm-ext')
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy(name='acc_accumulation')
elif task is None:
self.model = RobertaModel.from_pretrained(pretrained_model_name_or_path='roberta-wwm-ext')
else:
raise RuntimeError("Unknown task %s, task should be sequence_classification" % task)
self.task = task
self.label_map = label_map
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None, labels=None):
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
if self.task is not None:
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, acc
return probs
else:
sequence_output, pooled_output = result
return sequence_output, pooled_output
def get_vocab_path(self):
"""
Gets the path of the module vocabulary path.
"""
save_path = os.path.join(DATA_HOME, 'roberta-wwm-ext', 'vocab.txt')
if not os.path.exists(save_path) or not os.path.isfile(save_path):
url = "https://paddlenlp.bj.bcebos.com/models/transformers/roberta_base/vocab.txt"
download(url, os.path.join(DATA_HOME, 'roberta-wwm-ext'))
return save_path
def get_tokenizer(self, tokenize_chinese_chars=True):
"""
Gets the tokenizer that is customized for this module.
Args:
tokenize_chinese_chars (:obj: bool , defaults to :obj: True):
Whether to tokenize chinese characters or not.
Returns:
tokenizer (:obj:BertTokenizer) : The tokenizer which was customized for this module.
"""
return BertTokenizer(tokenize_chinese_chars=tokenize_chinese_chars, vocab_file=self.get_vocab_path())
def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for training, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as loss and metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'loss': avg_loss, 'metrics': {'acc': acc}}
def validation_step(self, batch: List[paddle.Tensor], batch_idx: int):
"""
One step for validation, which should be called as forward computation.
Args:
batch(:obj:List[paddle.Tensor]): The one batch data, which contains the model needed,
such as input_ids, sent_ids, pos_ids, input_mask and labels.
batch_idx(int): The index of batch.
Returns:
results(:obj: Dict) : The model outputs, such as metrics.
"""
predictions, avg_loss, acc = self(input_ids=batch[0], token_type_ids=batch[1], labels=batch[2])
return {'metrics': {'acc': acc}}
def predict(self, data, max_seq_len=128, batch_size=1, use_gpu=False):
"""
Predicts the data labels.
Args:
data (obj:`List(Union(str))`): The processed data (the one sequence or sequence pair) whose each element is the raw text.
max_seq_len (:obj:`int`, `optional`, defaults to :int:`None`):
If set to a number, will limit the total sequence returned so that it has a maximum length.
batch_size(obj:`int`, defaults to 1): The number of batch.
use_gpu(obj:`bool`, defaults to `False`): Whether to use gpu to run or not.
Returns:
results(obj:`list`): All the predictions labels.
"""
# TODO(zhangxuefei): add task token_classification task predict.
if self.task not in ['sequence_classification']:
raise RuntimeError("The predict method is for sequence_classification task, but got task %s." % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
examples = []
for text in data:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, max_seq_len=max_seq_len)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], max_seq_len=max_seq_len)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
examples.append((encoded_inputs['input_ids'], encoded_inputs['segment_ids']))
def _batchify_fn(batch):
input_ids = [entry[0] for entry in batch]
segment_ids = [entry[1] for entry in batch]
return input_ids, segment_ids
        # Separates the data into batches.
batches = []
one_batch = []
for example in examples:
one_batch.append(example)
if len(one_batch) == batch_size:
batches.append(one_batch)
one_batch = []
if one_batch:
# The last batch whose size is less than the config batch_size setting.
batches.append(one_batch)
results = []
self.eval()
for batch in batches:
input_ids, segment_ids = _batchify_fn(batch)
input_ids = paddle.to_tensor(input_ids)
segment_ids = paddle.to_tensor(segment_ids)
# TODO(zhangxuefei): add task token_classification postprocess after prediction.
if self.task == 'sequence_classification':
probs = self(input_ids, segment_ids)
idx = paddle.argmax(probs, axis=1).numpy()
idx = idx.tolist()
labels = [self.label_map[i] for i in idx]
results.extend(labels)
return results
@serving
def get_embedding(self, texts, use_gpu=False):
if self.task is not None:
raise RuntimeError("The get_embedding method is only valid when task is None, but got task %s" % self.task)
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
tokenizer = self.get_tokenizer()
results = []
for text in texts:
if len(text) == 1:
encoded_inputs = tokenizer.encode(text[0], text_pair=None, pad_to_max_seq_len=False)
elif len(text) == 2:
encoded_inputs = tokenizer.encode(text[0], text_pair=text[1], pad_to_max_seq_len=False)
else:
raise RuntimeError(
'The input text must have one or two sequence, but got %d. Please check your inputs.' % len(text))
input_ids = paddle.to_tensor(encoded_inputs['input_ids']).unsqueeze(0)
segment_ids = paddle.to_tensor(encoded_inputs['segment_ids']).unsqueeze(0)
sequence_output, pooled_output = self(input_ids, segment_ids)
sequence_output = sequence_output.squeeze(0)
pooled_output = pooled_output.squeeze(0)
results.append((sequence_output.numpy().tolist(), pooled_output.numpy().tolist()))
return results
@@ -21,13 +21,16 @@ __version__ = '2.0.0-beta0'
from paddlehub import env
from paddlehub.config import config
from paddlehub import datasets
from paddlehub.finetune import Trainer
from paddlehub.utils import log, parser, utils
from paddlehub.utils import download as _download
from paddlehub.utils.paddlex import download, ResourceNotFoundError
from paddlehub.server import server_check
from paddlehub.server.server_source import ServerConnectionError
from paddlehub.module import Module
from paddlehub.text.bert_tokenizer import BertTokenizer, ErnieTinyTokenizer
from paddlehub.text.tokenizer import CustomTokenizer
# In order to maintain the compatibility of the old version, we put the relevant
# compatible code in the paddlehub.compat package, and mapped some modules referenced
...
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
@@ -16,3 +15,4 @@
from paddlehub.datasets.canvas import Canvas
from paddlehub.datasets.flowers import Flowers
from paddlehub.datasets.minicoco import MiniCOCO
from paddlehub.datasets.chnsenticorp import ChnSentiCorp
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import csv
import io
import os
import numpy as np
import paddle.fluid as fluid
from paddlehub.env import DATA_HOME
from paddlehub.text.bert_tokenizer import BertTokenizer
from paddlehub.text.tokenizer import CustomTokenizer
from paddlehub.utils.log import logger
from paddlehub.utils.utils import download
from paddlehub.utils.xarfile import is_xarfile, unarchive
class InputExample(object):
"""
The input data structure of Transformer modules (BERT, ERNIE and so on).
"""
def __init__(self, guid: int, text_a: str, text_b: Optional[str] = None, label: Optional[str] = None):
"""
The input data structure.
Args:
guid (:obj:`int`):
Unique id for the input data.
text_a (:obj:`str`, `optional`, defaults to :obj:`None`):
The first sequence. For single sequence tasks, only this sequence must be specified.
text_b (:obj:`str`, `optional`, defaults to :obj:`None`):
The second sequence if sentence-pair.
label (:obj:`str`, `optional`, defaults to :obj:`None`):
The label of the example.
Examples:
.. code-block:: python
from paddlehub.datasets.base_nlp_dataset import InputExample
example = InputExample(guid=0,
text_a='15.4寸笔记本的键盘确实爽,基本跟台式机差不多了',
text_b='蛮喜欢数字小键盘,输数字特方便,样子也很美观,做工也相当不错',
label='1')
"""
self.guid = guid
self.text_a = text_a
self.text_b = text_b
self.label = label
def __str__(self):
if self.text_b is None:
return "text={}\tlabel={}".format(self.text_a, self.label)
else:
return "text_a={}\ttext_b={},label={}".format(self.text_a, self.text_b, self.label)
class BaseNLPDataset(object):
"""
    The virtual base class for NLP datasets, such as TextClassificationDataset, SeqLabelingDataset, and so on.
    Subclasses must inherit from this base class and re-implement the method _read_file.
"""
def __init__(self,
base_path: str,
tokenizer: Union[BertTokenizer, CustomTokenizer],
max_seq_len: Optional[int] = 128,
mode: Optional[str] = "train",
data_file: Optional[str] = None,
label_file: Optional[str] = None,
label_list: Optional[List[str]] = None):
"""
        Args:
base_path (:obj:`str`): The directory to the whole dataset.
tokenizer (:obj:`BertTokenizer` or :obj:`CustomTokenizer`):
It tokenizes the text and encodes the data as model needed.
max_seq_len (:obj:`int`, `optional`, defaults to :128):
If set to a number, will limit the total sequence returned so that it has a maximum length.
mode (:obj:`str`, `optional`, defaults to `train`):
It identifies the dataset mode (train, test or dev).
data_file(:obj:`str`, `optional`, defaults to :obj:`None`):
The data file name, which is relative to the base_path.
label_file(:obj:`str`, `optional`, defaults to :obj:`None`):
The label file name, which is relative to the base_path.
It is all labels of the dataset, one line one label.
label_list(:obj:`List[str]`, `optional`, defaults to :obj:`None`):
The list of all labels of the dataset
"""
self.data_file = os.path.join(base_path, data_file)
self.label_list = label_list
self.mode = mode
self.tokenizer = tokenizer
self.max_seq_len = max_seq_len
if label_file:
self.label_file = os.path.join(base_path, label_file)
if not self.label_list:
self.label_list = self._load_label_data()
else:
logger.warning("As label_list has been assigned, label_file is noneffective")
if self.label_list:
self.label_map = {item: index for index, item in enumerate(self.label_list)}
def _load_label_data(self):
"""
Loads labels from label file.
"""
if os.path.exists(self.label_file):
with open(self.label_file, "r", encoding="utf8") as f:
return f.read().strip().split("\n")
else:
raise RuntimeError("The file {} is not found.".format(self.label_file))
def _download_and_uncompress_dataset(self, destination: str, url: str):
"""
Downloads dataset and uncompresses it.
Args:
destination (:obj:`str`): The dataset cached directory.
url (:obj: str): The link to be downloaded a dataset.
"""
if not os.path.exists(destination):
dataset_package = download(url=url, path=DATA_HOME)
if is_xarfile(dataset_package):
unarchive(dataset_package, DATA_HOME)
else:
logger.info("Dataset {} already cached.".format(destination))
def _read_file(self, input_file: str, is_file_with_header: bool = False):
"""
Reads the files.
Args:
input_file (:obj:str) : The file to be read.
is_file_with_header(:obj:bool, `optional`, default to :obj: False) :
Whether or not the file is with the header introduction.
"""
raise NotImplementedError
def get_labels(self):
"""
Gets all labels.
"""
return self.label_list
class TextClassificationDataset(BaseNLPDataset, fluid.io.Dataset):
"""
    The dataset class which fits all text classification datasets.
"""
def __init__(self,
base_path: str,
tokenizer: Union[BertTokenizer, CustomTokenizer],
max_seq_len: int = 128,
mode: str = "train",
data_file: str = None,
label_file: str = None,
label_list: list = None,
is_file_with_header: bool = False):
"""
        Args:
base_path (:obj:`str`): The directory to the whole dataset.
tokenizer (:obj:`BertTokenizer` or :obj:`CustomTokenizer`):
It tokenizes the text and encodes the data as model needed.
max_seq_len (:obj:`int`, `optional`, defaults to :128):
If set to a number, will limit the total sequence returned so that it has a maximum length.
mode (:obj:`str`, `optional`, defaults to `train`):
It identifies the dataset mode (train, test or dev).
data_file(:obj:`str`, `optional`, defaults to :obj:`None`):
The data file name, which is relative to the base_path.
label_file(:obj:`str`, `optional`, defaults to :obj:`None`):
The label file name, which is relative to the base_path.
It is all labels of the dataset, one line one label.
label_list(:obj:`List[str]`, `optional`, defaults to :obj:`None`):
The list of all labels of the dataset
is_file_with_header(:obj:bool, `optional`, default to :obj: False) :
Whether or not the file is with the header introduction.
"""
super(TextClassificationDataset, self).__init__(
base_path=base_path,
tokenizer=tokenizer,
max_seq_len=max_seq_len,
mode=mode,
data_file=data_file,
label_file=label_file,
label_list=label_list)
self.examples = self._read_file(self.data_file, is_file_with_header)
self.records = self._convert_examples_to_records(self.examples)
def _read_file(self, input_file, is_file_with_header: bool = False) -> List[InputExample]:
"""
Reads a tab separated value file.
Args:
input_file (:obj:str) : The file to be read.
is_file_with_header(:obj:bool, `optional`, default to :obj: False) :
Whether or not the file is with the header introduction.
Returns:
examples (:obj:`List[InputExample]`): All the input data.
"""
if not os.path.exists(input_file):
raise RuntimeError("The file {} is not found.".format(input_file))
else:
with io.open(input_file, "r", encoding="UTF-8") as f:
reader = csv.reader(f, delimiter="\t", quotechar=None)
examples = []
seq_id = 0
header = next(reader) if is_file_with_header else None
for line in reader:
example = InputExample(guid=seq_id, label=line[0], text_a=line[1])
seq_id += 1
examples.append(example)
return examples
def _convert_examples_to_records(self, examples: List[InputExample]) -> List[dict]:
"""
Converts all examples to records which the model needs.
Args:
examples(obj:`List[InputExample]`): All data examples returned by _read_file.
Returns:
records(:obj:`List[dict]`): All records which the model needs.
"""
records = []
for example in examples:
record = self.tokenizer.encode(text=example.text_a, text_pair=example.text_b, max_seq_len=self.max_seq_len)
# CustomTokenizer will tokenize the text firstly and then lookup words in the vocab
# When all words are not found in the vocab, the text will be dropped.
if not record:
logger.info(
"The text %s has been dropped as it has no words in the vocab after tokenization." % example.text_a)
continue
if example.label:
record['label'] = self.label_map[example.label]
records.append(record)
return records
def __getitem__(self, idx):
record = self.records[idx]
if 'label' in record.keys():
if isinstance(self.tokenizer, BertTokenizer):
return np.array(record['input_ids']), np.array(record['segment_ids']), np.array(record['label'])
elif isinstance(self.tokenizer, CustomTokenizer):
return np.array(record['text']), np.array(record['seq_len']), np.array(record['label'])
else:
if isinstance(self.tokenizer, BertTokenizer):
return np.array(record['input_ids']), np.array(record['segment_ids'])
elif isinstance(self.tokenizer, CustomTokenizer):
return np.array(record['text']), np.array(record['seq_len'])
def __len__(self):
return len(self.records)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Optional, Union, Tuple
import os
from paddlehub.env import DATA_HOME
from paddlehub.utils.download import download_data
from paddlehub.datasets.base_nlp_dataset import TextClassificationDataset
from paddlehub.text.bert_tokenizer import BertTokenizer
from paddlehub.text.tokenizer import CustomTokenizer
@download_data(url="https://bj.bcebos.com/paddlehub-dataset/chnsenticorp.tar.gz")
class ChnSentiCorp(TextClassificationDataset):
"""
ChnSentiCorp is a dataset for chinese sentiment classification,
which was published by Tan Songbo at ICT of Chinese Academy of Sciences.
"""
# TODO(zhangxuefei): simplify datatset load, such as
# train_ds, dev_ds, test_ds = hub.datasets.ChnSentiCorp(tokenizer=xxx, max_seq_len=128, select='train', 'test', 'valid')
def __init__(self, tokenizer: Union[BertTokenizer, CustomTokenizer], max_seq_len: int = 128, mode: str = 'train'):
"""
Args:
tokenizer (:obj:`BertTokenizer` or `CustomTokenizer`):
It tokenizes the text and encodes the data as model needed.
max_seq_len (:obj:`int`, `optional`, defaults to :128):
The maximum length (in number of tokens) for the inputs to the selected module,
such as ernie, bert and so on.
mode (:obj:`str`, `optional`, defaults to `train`):
It identifies the dataset mode (train, test or dev).
Examples:
.. code-block:: python
import paddlehub as hub
tokenizer = hub.BertTokenizer(vocab_file='./vocab.txt')
train_dataset = hub.datasets.ChnSentiCorp(tokenizer=tokenizer, max_seq_len=120, mode='train')
dev_dataset = hub.datasets.ChnSentiCorp(tokenizer=tokenizer, max_seq_len=120, mode='dev')
test_dataset = hub.datasets.ChnSentiCorp(tokenizer=tokenizer, max_seq_len=120, mode='test')
"""
base_path = os.path.join(DATA_HOME, "chnsenticorp")
if mode == 'train':
data_file = 'train.tsv'
elif mode == 'test':
data_file = 'test.tsv'
else:
data_file = 'dev.tsv'
super(ChnSentiCorp, self).__init__(
base_path=base_path,
tokenizer=tokenizer,
max_seq_len=max_seq_len,
mode=mode,
data_file=data_file,
label_list=["0", "1"],
is_file_with_header=True)
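A small sketch of consuming the dataset directly (the vocab path is illustrative); with a `BertTokenizer`, `__getitem__` above returns numpy arrays of input ids, segment ids and the label:

```python
import paddlehub as hub

tokenizer = hub.BertTokenizer(vocab_file='./vocab.txt')  # illustrative local vocab path
train_ds = hub.datasets.ChnSentiCorp(tokenizer=tokenizer, max_seq_len=128, mode='train')

input_ids, segment_ids, label = train_ds[0]  # numpy arrays, as produced by __getitem__
print(len(train_ds), input_ids.shape, segment_ids.shape, label)
```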
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
...
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddlehub.finetune.trainer import Trainer
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
@@ -33,6 +32,7 @@ class Trainer(object):
    Args:
        model(paddle.nn.Layer) : Model to train or evaluate.
        optimizer(paddle.optimizer.Optimizer) : Optimizer for loss.
        use_gpu(bool) : Whether to use gpu to run.
        use_vdl(bool) : Whether to use visualdl to record training data.
        checkpoint_dir(str) : Directory where the checkpoint is saved, and the trainer will restore the
            state and model parameters from the checkpoint.
@@ -52,9 +52,12 @@ class Trainer(object):
    def __init__(self,
                 model: paddle.nn.Layer,
                 optimizer: paddle.optimizer.Optimizer,
                 use_gpu: bool = False,
                 use_vdl: bool = True,
                 checkpoint_dir: str = None,
                 compare_metrics: Callable = None,
                 **kwargs):
        paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
        self.nranks = paddle.distributed.get_world_size()
        self.local_rank = paddle.distributed.get_rank()
        self.model = model
@@ -154,7 +157,8 @@ class Trainer(object):
              num_workers: int = 0,
              eval_dataset: paddle.io.Dataset = None,
              log_interval: int = 10,
              save_interval: int = 10,
              collate_fn: Callable = None):
        '''
        Train a model with specific config.
@@ -167,6 +171,8 @@
                execute evaluate function every `save_interval` epochs.
            log_interval(int) : Log the train information every `log_interval` steps.
            save_interval(int) : Save the checkpoint every `save_interval` epochs.
            collate_fn(callable): function to generate mini-batch data by merging the sample list.
                None for only stack each fields of sample in axis 0 (same as :attr:`np.stack(..., axis=0)`). Default None
        '''
        batch_sampler = paddle.io.DistributedBatchSampler(
            train_dataset, batch_size=batch_size, shuffle=True, drop_last=False)
@@ -175,7 +181,8 @@
            batch_sampler=batch_sampler,
            num_workers=num_workers,
            return_list=True,
            use_buffer_reader=True,
            collate_fn=collate_fn)
        steps_per_epoch = len(batch_sampler)
        timer = Timer(steps_per_epoch * epochs)
@@ -195,7 +202,9 @@
                    # calculate metrics and loss
                    avg_loss += loss.numpy()[0]
                    for metric, value in metrics.items():
                        if isinstance(value, paddle.Tensor):
                            value = value.numpy()
                        avg_metrics[metric] += value
                    timer.count()
@@ -225,7 +234,7 @@
                if self.current_epoch % save_interval == 0 and batch_idx + 1 == steps_per_epoch and self.local_rank == 0:
                    if eval_dataset:
                        result = self.evaluate(eval_dataset, batch_size, num_workers, collate_fn=collate_fn)
                        eval_loss = result.get('loss', None)
                        eval_metrics = result.get('metrics', {})
                        if self.use_vdl:
@@ -250,7 +259,11 @@
            self._save_checkpoint()

    def evaluate(self,
                 eval_dataset: paddle.io.Dataset,
                 batch_size: int = 1,
                 num_workers: int = 0,
                 collate_fn: Callable = None):
        '''
        Run evaluation and returns metrics.
@@ -258,47 +271,53 @@
            eval_dataset(paddle.io.Dataset) : The validation dataset
            batch_size(int) : Batch size of per step, default is 1.
            num_workers(int) : Number of subprocess to load data, default is 0.
            collate_fn(callable): function to generate mini-batch data by merging the sample list.
                None for only stack each fields of sample in axis 0 (same as :attr:`np.stack(..., axis=0)`). Default None
        '''
        if self.local_rank == 0:
            batch_sampler = paddle.io.BatchSampler(eval_dataset, batch_size=batch_size, shuffle=False, drop_last=False)
            loader = paddle.io.DataLoader(
                eval_dataset,
                batch_sampler=batch_sampler,
                num_workers=num_workers,
                return_list=True,
                collate_fn=collate_fn)

            self.model.eval()
            avg_loss = num_samples = 0
            sum_metrics = defaultdict(int)
            avg_metrics = defaultdict(int)

            with logger.processing('Evaluation on validation dataset'):
                for batch_idx, batch in enumerate(loader):
                    result = self.validation_step(batch, batch_idx)
                    loss = result.get('loss', None)
                    metrics = result.get('metrics', {})
                    bs = batch[0].shape[0]
                    num_samples += bs
                    if loss:
                        avg_loss += loss.numpy()[0] * bs
                    for metric, value in metrics.items():
                        sum_metrics[metric] += value * bs

            # print avg metrics and loss
            print_msg = '[Evaluation result]'
            if loss:
                avg_loss /= num_samples
                print_msg += ' avg_loss={:.4f}'.format(avg_loss)
            for metric, value in sum_metrics.items():
                avg_metrics[metric] = value / num_samples
                print_msg += ' avg_{}={:.4f}'.format(metric, avg_metrics[metric])
            logger.eval(print_msg)

            if loss:
                return {'loss': avg_loss, 'metrics': avg_metrics}
            return {'metrics': avg_metrics}

    def training_step(self, batch: List[paddle.Tensor], batch_idx: int):
        '''
@@ -318,7 +337,7 @@
            raise RuntimeError('The return value of `trainning_step` in {} is not a dict'.format(self.model.__class__))

        loss = result.get('loss', None)
        if loss is None:
            raise RuntimeError('Cannot find loss attribute in the return value of `trainning_step` of {}'.format(
                self.model.__class__))
...
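A minimal usage sketch of the new `Trainer` arguments introduced here (`use_gpu`, `collate_fn`); the module name, dataset and checkpoint directory below are illustrative:

```python
import paddle
import paddlehub as hub

# Illustrative choices: any sequence_classification module from this change works the same way.
model = hub.Module(name='roberta-wwm-ext', version='2.0.0', task='sequence_classification')
train_dataset = hub.datasets.ChnSentiCorp(tokenizer=model.get_tokenizer(), max_seq_len=128, mode='train')

optimizer = paddle.optimizer.Adam(learning_rate=5e-5, parameters=model.parameters())
# use_gpu selects the device inside the Trainer; collate_fn is forwarded to paddle.io.DataLoader.
trainer = hub.Trainer(model, optimizer, checkpoint_dir='ckpt_roberta_text_cls', use_gpu=False)
trainer.train(train_dataset, epochs=1, batch_size=16, save_interval=1, collate_fn=None)
```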
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# FIXME(zhangxuefei): remove this file after paddlenlp is released.
import paddle
import paddle.nn as nn
from paddlehub.module.nlp_module import PretrainedModel, register_base_model
class BertEmbeddings(nn.Layer):
"""
Include embeddings from word, position and token_type embeddings
"""
def __init__(self,
vocab_size,
hidden_size=768,
hidden_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=16):
super(BertEmbeddings, self).__init__()
self.word_embeddings = nn.Embedding(vocab_size, hidden_size, padding_idx=0)
self.position_embeddings = nn.Embedding(max_position_embeddings, hidden_size)
self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)
self.layer_norm = nn.LayerNorm(hidden_size)
self.dropout = nn.Dropout(hidden_dropout_prob)
def forward(self, input_ids, token_type_ids=None, position_ids=None):
if position_ids is None:
# maybe need use shape op to unify static graph and dynamic graph
seq_length = input_ids.shape[1]
position_ids = paddle.arange(0, seq_length, dtype="int64")
if token_type_ids is None:
token_type_ids = paddle.zeros_like(input_ids, dtype="int64")
input_embedings = self.word_embeddings(input_ids)
position_embeddings = self.position_embeddings(position_ids)
token_type_embeddings = self.token_type_embeddings(token_type_ids)
embeddings = input_embedings + position_embeddings + token_type_embeddings
embeddings = self.layer_norm(embeddings)
embeddings = self.dropout(embeddings)
return embeddings
class BertPooler(nn.Layer):
"""
"""
def __init__(self, hidden_size):
super(BertPooler, self).__init__()
self.dense = nn.Linear(hidden_size, hidden_size)
self.activation = nn.Tanh()
def forward(self, hidden_states):
# We "pool" the model by simply taking the hidden state corresponding
# to the first token.
first_token_tensor = hidden_states[:, 0]
pooled_output = self.dense(first_token_tensor)
pooled_output = self.activation(pooled_output)
return pooled_output
class BertPretrainedModel(PretrainedModel):
"""
An abstract class for pretrained BERT models. It provides BERT related
`model_config_file`, `resource_files_names`, `pretrained_resource_files_map`,
`pretrained_init_configuration`, `base_model_prefix` for downloading and
loading pretrained models. See `PretrainedModel` for more details.
"""
model_config_file = "model_config.json"
pretrained_init_configuration = {
"bert-base-uncased": {
"vocab_size": 30522,
"hidden_size": 768,
"num_hidden_layers": 12,
"num_attention_heads": 12,
"intermediate_size": 3072,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"attention_probs_dropout_prob": 0.1,
"max_position_embeddings": 512,
"type_vocab_size": 2,
"initializer_range": 0.02,
"pad_token_id": 0,
},
"bert-large-uncased": {
"vocab_size": 30522,
"hidden_size": 1024,
"num_hidden_layers": 24,
"num_attention_heads": 16,
"intermediate_size": 4096,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"attention_probs_dropout_prob": 0.1,
"max_position_embeddings": 512,
"type_vocab_size": 2,
"initializer_range": 0.02,
"pad_token_id": 0,
},
"bert-base-multilingual-uncased": {
"vocab_size": 105879,
"hidden_size": 768,
"num_hidden_layers": 12,
"num_attention_heads": 12,
"intermediate_size": 3072,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"attention_probs_dropout_prob": 0.1,
"max_position_embeddings": 512,
"type_vocab_size": 2,
"initializer_range": 0.02,
"pad_token_id": 0,
},
"bert-base-cased": {
"vocab_size": 30522,
"hidden_size": 768,
"num_hidden_layers": 12,
"num_attention_heads": 12,
"intermediate_size": 3072,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"attention_probs_dropout_prob": 0.1,
"max_position_embeddings": 512,
"type_vocab_size": 2,
"initializer_range": 0.02,
"pad_token_id": 0,
},
"bert-base-chinese": {
"vocab_size": 21128,
"hidden_size": 768,
"num_hidden_layers": 12,
"num_attention_heads": 12,
"intermediate_size": 3072,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"attention_probs_dropout_prob": 0.1,
"max_position_embeddings": 512,
"type_vocab_size": 2,
"initializer_range": 0.02,
"pad_token_id": 0,
},
"bert-base-multilingual-cased": {
"vocab_size": 119547,
"hidden_size": 768,
"num_hidden_layers": 12,
"num_attention_heads": 12,
"intermediate_size": 3072,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"attention_probs_dropout_prob": 0.1,
"max_position_embeddings": 512,
"type_vocab_size": 2,
"initializer_range": 0.02,
"pad_token_id": 0,
},
"bert-large-cased": {
"vocab_size": 28996,
"hidden_size": 1024,
"num_hidden_layers": 24,
"num_attention_heads": 16,
"intermediate_size": 4096,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"attention_probs_dropout_prob": 0.1,
"max_position_embeddings": 512,
"type_vocab_size": 2,
"initializer_range": 0.02,
"pad_token_id": 0,
},
}
resource_files_names = {"model_state": "model_state.pdparams"}
pretrained_resource_files_map = {
"model_state": {
"bert-base-uncased": "https://paddlenlp.bj.bcebos.com/models/transformers/bert-base-uncased.pdparams",
"bert-large-uncased": "https://paddlenlp.bj.bcebos.com/models/transformers/bert-large-uncased.pdparams",
"bert-base-multilingual-uncased":
"http://paddlenlp.bj.bcebos.com/models/transformers/bert-base-multilingual-uncased.pdparams",
"bert-base-cased": "http://paddlenlp.bj.bcebos.com/models/transformers/bert/bert-base-cased.pdparams",
"bert-base-chinese": "http://paddlenlp.bj.bcebos.com/models/transformers/bert/bert-base-chinese.pdparams",
"bert-base-multilingual-cased":
"http://paddlenlp.bj.bcebos.com/models/transformers/bert/bert-base-multilingual-cased.pdparamss",
"bert-large-cased": "http://paddlenlp.bj.bcebos.com/models/transformers/bert/bert-large-cased.pdparams"
}
}
base_model_prefix = "bert"
def init_weights(self, layer):
""" Initialization hook """
if isinstance(layer, (nn.Linear, nn.Embedding)):
# only support dygraph, use truncated_normal and make it inplace
# and configurable later
layer.weight.set_value(
paddle.tensor.normal(
mean=0.0,
std=self.initializer_range
if hasattr(self, "initializer_range") else self.bert.config["initializer_range"],
shape=layer.weight.shape))
elif isinstance(layer, nn.LayerNorm):
layer._epsilon = 1e-12
@register_base_model
class BertModel(BertPretrainedModel):
"""
"""
def __init__(self,
vocab_size,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=16,
initializer_range=0.02,
pad_token_id=0):
super(BertModel, self).__init__()
self.pad_token_id = pad_token_id
self.initializer_range = initializer_range
self.embeddings = BertEmbeddings(vocab_size, hidden_size, hidden_dropout_prob, max_position_embeddings,
type_vocab_size)
encoder_layer = nn.TransformerEncoderLayer(
hidden_size,
num_attention_heads,
intermediate_size,
dropout=hidden_dropout_prob,
activation=hidden_act,
attn_dropout=attention_probs_dropout_prob,
act_dropout=0)
self.encoder = nn.TransformerEncoder(encoder_layer, num_hidden_layers)
self.pooler = BertPooler(hidden_size)
self.apply(self.init_weights)
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None):
if attention_mask is None:
attention_mask = paddle.unsqueeze(
(input_ids == self.pad_token_id).astype(self.pooler.dense.weight.dtype) * -1e9, axis=[1, 2])
embedding_output = self.embeddings(
input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids)
encoder_outputs = self.encoder(embedding_output, attention_mask)
sequence_output = encoder_outputs
pooled_output = self.pooler(sequence_output)
return sequence_output, pooled_output
class BertForSequenceClassification(BertPretrainedModel):
"""
Model for sentence (pair) classification task with BERT.
Args:
bert (BertModel): An instance of BertModel.
num_classes (int, optional): The number of classes. Default 2
dropout (float, optional): The dropout probability for output of BERT.
If None, use the same value as `hidden_dropout_prob` of `BertModel`
instance `bert`. Default None
"""
def __init__(self, bert, num_classes=2, dropout=None):
super(BertForSequenceClassification, self).__init__()
self.num_classes = num_classes
self.bert = bert # allow bert to be config
self.dropout = nn.Dropout(dropout if dropout is not None else self.bert.config["hidden_dropout_prob"])
self.classifier = nn.Linear(self.bert.config["hidden_size"], num_classes)
self.apply(self.init_weights)
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None):
_, pooled_output = self.bert(
input_ids, token_type_ids=token_type_ids, position_ids=position_ids, attention_mask=attention_mask)
pooled_output = self.dropout(pooled_output)
logits = self.classifier(pooled_output)
return logits
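To make the wiring above concrete, here is a minimal, hedged usage sketch. It relies on the `from_pretrained` helper provided by the `PretrainedModel` base class later in this commit and on the `bert-base-chinese` entry in `pretrained_resource_files_map`; the token ids are dummy values for illustration (in practice they would come from `BertTokenizer`).

```python
import paddle

# Build the classification head on top of the pretrained encoder; the weights
# are fetched from the `pretrained_resource_files_map` URL on first use.
model = BertForSequenceClassification.from_pretrained('bert-base-chinese', num_classes=2)
model.eval()

# Dummy batch of already-tokenized ids with shape [batch_size, seq_len];
# 101/102 stand in for the [CLS]/[SEP] ids of the Chinese BERT vocab.
input_ids = paddle.to_tensor([[101, 2769, 1599, 3614, 102]])
logits = model(input_ids)                               # shape: [1, num_classes]
probs = paddle.nn.functional.softmax(logits, axis=-1)   # class probabilities
```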
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# FIXME(zhangxuefei): remove this file after paddlenlp is released.
import paddle
import paddle.nn as nn
from paddlehub.module.nlp_module import PretrainedModel, register_base_model
class ErnieEmbeddings(nn.Layer):
"""
Include embeddings from word, position and token_type embeddings
"""
def __init__(self,
vocab_size,
hidden_size=768,
hidden_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=2,
pad_token_id=0):
super(ErnieEmbeddings, self).__init__()
self.word_embeddings = nn.Embedding(vocab_size, hidden_size, padding_idx=pad_token_id)
self.position_embeddings = nn.Embedding(max_position_embeddings, hidden_size)
self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)
self.layer_norm = nn.LayerNorm(hidden_size)
self.dropout = nn.Dropout(hidden_dropout_prob)
def forward(self, input_ids, token_type_ids=None, position_ids=None):
if position_ids is None:
# maybe need use shape op to unify static graph and dynamic graph
seq_length = input_ids.shape[1]
position_ids = paddle.arange(0, seq_length, dtype="int64")
if token_type_ids is None:
token_type_ids = paddle.zeros_like(input_ids, dtype="int64")
input_embedings = self.word_embeddings(input_ids)
position_embeddings = self.position_embeddings(position_ids)
token_type_embeddings = self.token_type_embeddings(token_type_ids)
embeddings = input_embedings + position_embeddings + token_type_embeddings
embeddings = self.layer_norm(embeddings)
embeddings = self.dropout(embeddings)
return embeddings
class ErniePooler(nn.Layer):
"""
"""
def __init__(self, hidden_size):
super(ErniePooler, self).__init__()
self.dense = nn.Linear(hidden_size, hidden_size)
self.activation = nn.Tanh()
def forward(self, hidden_states):
# We "pool" the model by simply taking the hidden state corresponding
# to the first token.
first_token_tensor = hidden_states[:, 0]
pooled_output = self.dense(first_token_tensor)
pooled_output = self.activation(pooled_output)
return pooled_output
class ErniePretrainedModel(PretrainedModel):
"""
An abstract class for pretrained ERNIE models. It provides ERNIE related
`model_config_file`, `resource_files_names`, `pretrained_resource_files_map`,
`pretrained_init_configuration`, `base_model_prefix` for downloading and
loading pretrained models. See `PretrainedModel` for more details.
"""
model_config_file = "model_config.json"
pretrained_init_configuration = {
"ernie": {
"attention_probs_dropout_prob": 0.1,
"hidden_act": "relu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"max_position_embeddings": 513,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"type_vocab_size": 2,
"vocab_size": 18000,
"pad_token_id": 0,
},
"ernie_tiny": {
"attention_probs_dropout_prob": 0.1,
"hidden_act": "relu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"max_position_embeddings": 600,
"num_attention_heads": 16,
"num_hidden_layers": 3,
"type_vocab_size": 2,
"vocab_size": 50006,
"pad_token_id": 0,
},
"ernie_v2_eng_base": {
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"type_vocab_size": 4,
"vocab_size": 30522,
"pad_token_id": 0,
},
"ernie_v2_eng_large": {
"attention_probs_dropout_prob": 0.1,
"intermediate_size": 4096,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"max_position_embeddings": 512,
"num_attention_heads": 16,
"num_hidden_layers": 24,
"type_vocab_size": 4,
"vocab_size": 30522,
"pad_token_id": 0,
},
}
resource_files_names = {"model_state": "model_state.pdparams"}
pretrained_resource_files_map = {
"model_state": {
"ernie":
"https://paddlenlp.bj.bcebos.com/models/transformers/ernie/ernie_v1_chn_base.pdparams",
"ernie_tiny":
"https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/ernie_tiny.pdparams",
"ernie_v2_eng_base":
"https://paddlenlp.bj.bcebos.com/models/transformers/ernie_v2_base/ernie_v2_eng_base.pdparams",
"ernie_v2_eng_large":
"https://paddlenlp.bj.bcebos.com/models/transformers/ernie_v2_large/ernie_v2_eng_large.pdparams",
}
}
base_model_prefix = "ernie"
def init_weights(self, layer):
""" Initialization hook """
if isinstance(layer, (nn.Linear, nn.Embedding)):
# only support dygraph, use truncated_normal and make it inplace
# and configurable later
layer.weight.set_value(
paddle.tensor.normal(
mean=0.0,
std=self.initializer_range
if hasattr(self, "initializer_range") else self.ernie.config["initializer_range"],
shape=layer.weight.shape))
@register_base_model
class ErnieModel(ErniePretrainedModel):
"""
"""
def __init__(self,
vocab_size,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=16,
initializer_range=0.02,
pad_token_id=0):
super(ErnieModel, self).__init__()
self.pad_token_id = pad_token_id
self.initializer_range = initializer_range
self.embeddings = ErnieEmbeddings(vocab_size, hidden_size, hidden_dropout_prob, max_position_embeddings,
type_vocab_size, pad_token_id)
encoder_layer = nn.TransformerEncoderLayer(
hidden_size,
num_attention_heads,
intermediate_size,
dropout=hidden_dropout_prob,
activation=hidden_act,
attn_dropout=attention_probs_dropout_prob,
act_dropout=0)
self.encoder = nn.TransformerEncoder(encoder_layer, num_hidden_layers)
self.pooler = ErniePooler(hidden_size)
self.apply(self.init_weights)
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None):
if attention_mask is None:
attention_mask = paddle.unsqueeze(
(input_ids == self.pad_token_id).astype(self.pooler.dense.weight.dtype) * -1e9, axis=[1, 2])
embedding_output = self.embeddings(
input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids)
encoder_outputs = self.encoder(embedding_output, attention_mask)
sequence_output = encoder_outputs
pooled_output = self.pooler(sequence_output)
return sequence_output, pooled_output
class ErnieForSequenceClassification(ErniePretrainedModel):
"""
Model for sentence (pair) classification task with ERNIE.
Args:
ernie (ErnieModel): An instance of `ErnieModel`.
num_classes (int, optional): The number of classes. Default 2
dropout (float, optional): The dropout probability for output of ERNIE.
If None, use the same value as `hidden_dropout_prob` of `ErnieModel`
instance `Ernie`. Default None
"""
def __init__(self, ernie, num_classes=2, dropout=None):
super(ErnieForSequenceClassification, self).__init__()
self.num_classes = num_classes
self.ernie = ernie # allow ernie to be config
self.dropout = nn.Dropout(dropout if dropout is not None else self.ernie.config["hidden_dropout_prob"])
self.classifier = nn.Linear(self.ernie.config["hidden_size"], num_classes)
self.apply(self.init_weights)
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None):
_, pooled_output = self.ernie(
input_ids, token_type_ids=token_type_ids, position_ids=position_ids, attention_mask=attention_mask)
pooled_output = self.dropout(pooled_output)
logits = self.classifier(pooled_output)
return logits
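The default attention mask built in `ErnieModel.forward` above is an additive bias rather than a 0/1 mask. The toy snippet below (arbitrary token ids) simply re-runs that expression to show the resulting shape and values:

```python
import paddle

# Padding positions (token id == pad_token_id) receive a large negative bias,
# so softmax attention effectively ignores them; real tokens get 0.
pad_token_id = 0
input_ids = paddle.to_tensor([[1, 57, 404, 0, 0]])
attention_mask = paddle.unsqueeze(
    (input_ids == pad_token_id).astype('float32') * -1e9, axis=[1, 2])
print(attention_mask.shape)    # [1, 1, 1, 5]
print(attention_mask.numpy())  # 0.0 for the three real tokens, -1e9 for the two pads
```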
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# FIXME(zhangxuefei): remove this file after paddlenlp is released.
import paddle
import paddle.nn as nn
from paddlehub.module.nlp_module import PretrainedModel, register_base_model
class RobertaEmbeddings(nn.Layer):
"""
Include embeddings from word, position and token_type embeddings
"""
def __init__(self,
vocab_size,
hidden_size=768,
hidden_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=16,
pad_token_id=0):
super(RobertaEmbeddings, self).__init__()
self.word_embeddings = nn.Embedding(vocab_size, hidden_size, padding_idx=pad_token_id)
self.position_embeddings = nn.Embedding(max_position_embeddings, hidden_size)
self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)
self.layer_norm = nn.LayerNorm(hidden_size)
self.dropout = nn.Dropout(hidden_dropout_prob)
def forward(self, input_ids, token_type_ids=None, position_ids=None):
if position_ids is None:
# maybe need use shape op to unify static graph and dynamic graph
seq_length = input_ids.shape[1]
position_ids = paddle.arange(0, seq_length, dtype="int64")
if token_type_ids is None:
token_type_ids = paddle.zeros_like(input_ids, dtype="int64")
input_embedings = self.word_embeddings(input_ids)
position_embeddings = self.position_embeddings(position_ids)
token_type_embeddings = self.token_type_embeddings(token_type_ids)
embeddings = input_embedings + position_embeddings + token_type_embeddings
embeddings = self.layer_norm(embeddings)
embeddings = self.dropout(embeddings)
return embeddings
class RobertaPooler(nn.Layer):
"""
"""
def __init__(self, hidden_size):
super(RobertaPooler, self).__init__()
self.dense = nn.Linear(hidden_size, hidden_size)
self.activation = nn.Tanh()
def forward(self, hidden_states):
# We "pool" the model by simply taking the hidden state corresponding
# to the first token.
first_token_tensor = hidden_states[:, 0]
pooled_output = self.dense(first_token_tensor)
pooled_output = self.activation(pooled_output)
return pooled_output
class RobertaPretrainedModel(PretrainedModel):
"""
An abstract class for pretrained RoBERTa models. It provides RoBERTa related
`model_config_file`, `resource_files_names`, `pretrained_resource_files_map`,
`pretrained_init_configuration`, `base_model_prefix` for downloading and
loading pretrained models. See `PretrainedModel` for more details.
"""
model_config_file = "model_config.json"
pretrained_init_configuration = {
"roberta-wwm-ext": {
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"type_vocab_size": 2,
"vocab_size": 21128,
"pad_token_id": 0
},
"roberta-wwm-ext-large": {
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"max_position_embeddings": 512,
"num_attention_heads": 16,
"num_hidden_layers": 24,
"type_vocab_size": 2,
"vocab_size": 21128,
"pad_token_id": 0
}
}
resource_files_names = {"model_state": "model_state.pdparams"}
pretrained_resource_files_map = {
"model_state": {
"roberta-wwm-ext":
"https://paddlenlp.bj.bcebos.com/models/transformers/roberta_base/roberta_chn_base.pdparams",
"roberta-wwm-ext-large":
"https://paddlenlp.bj.bcebos.com/models/transformers/roberta_large/roberta_chn_large.pdparams",
}
}
base_model_prefix = "roberta"
def init_weights(self, layer):
""" Initialization hook """
if isinstance(layer, (nn.Linear, nn.Embedding)):
# only support dygraph, use truncated_normal and make it inplace
# and configurable later
layer.weight.set_value(
paddle.tensor.normal(
mean=0.0,
std=self.initializer_range
if hasattr(self, "initializer_range") else self.roberta.config["initializer_range"],
shape=layer.weight.shape))
elif isinstance(layer, nn.LayerNorm):
layer._epsilon = 1e-12
@register_base_model
class RobertaModel(RobertaPretrainedModel):
"""
"""
def __init__(self,
vocab_size,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=16,
initializer_range=0.02,
pad_token_id=0):
super(RobertaModel, self).__init__()
self.pad_token_id = pad_token_id
self.initializer_range = initializer_range
self.embeddings = RobertaEmbeddings(vocab_size, hidden_size, hidden_dropout_prob, max_position_embeddings,
type_vocab_size, pad_token_id)
encoder_layer = nn.TransformerEncoderLayer(
hidden_size,
num_attention_heads,
intermediate_size,
dropout=hidden_dropout_prob,
activation=hidden_act,
attn_dropout=attention_probs_dropout_prob,
act_dropout=0)
self.encoder = nn.TransformerEncoder(encoder_layer, num_hidden_layers)
self.pooler = RobertaPooler(hidden_size)
self.apply(self.init_weights)
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None):
if attention_mask is None:
attention_mask = paddle.unsqueeze(
(input_ids == self.pad_token_id).astype(self.pooler.dense.weight.dtype) * -1e9, axis=[1, 2])
embedding_output = self.embeddings(
input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids)
encoder_outputs = self.encoder(embedding_output, attention_mask)
sequence_output = encoder_outputs
pooled_output = self.pooler(sequence_output)
return sequence_output, pooled_output
class RobertaForSequenceClassification(RobertaPretrainedModel):
"""
Model for sentence (pair) classification task with RoBERTa.
Args:
roberta (RobertaModel): An instance of `RobertaModel`.
num_classes (int, optional): The number of classes. Default 2
dropout (float, optional): The dropout probability for output of RoBERTa.
If None, use the same value as `hidden_dropout_prob` of `RobertaModel`
instance `Roberta`. Default None
"""
def __init__(self, roberta, num_classes=2, dropout=None):
super(RobertaForSequenceClassification, self).__init__()
self.num_classes = num_classes
self.roberta = roberta # allow roberta to be config
self.dropout = nn.Dropout(dropout if dropout is not None else self.roberta.config["hidden_dropout_prob"])
self.classifier = nn.Linear(self.roberta.config["hidden_size"], num_classes)
self.apply(self.init_weights)
def forward(self, input_ids, token_type_ids=None, position_ids=None, attention_mask=None):
_, pooled_output = self.roberta(
input_ids, token_type_ids=token_type_ids, position_ids=position_ids, attention_mask=attention_mask)
pooled_output = self.dropout(pooled_output)
logits = self.classifier(pooled_output)
return logits
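All three poolers in this commit share the same first-token ("[CLS]") pooling scheme. A small, self-contained shape check with random inputs (no pretrained weights involved):

```python
import paddle

# RobertaPooler slices hidden_states[:, 0] and applies Linear + Tanh, so the
# pooled output drops the sequence dimension.
pooler = RobertaPooler(hidden_size=768)
hidden_states = paddle.randn([2, 16, 768])   # [batch_size, seq_len, hidden_size]
pooled = pooler(hidden_states)
print(pooled.shape)                          # [2, 768]
```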
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
@@ -13,7 +12,333 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# FIXME(zhangxuefei): remove this file after paddlenlp is released.
import copy
import functools
import inspect
import io
import json
import os
import six
import paddle
import paddle.nn as nn
from paddle.dataset.common import DATA_HOME
from paddle.utils.download import get_path_from_url
from paddlehub.utils.log import logger
__all__ = [
'PretrainedModel',
'register_base_model',
]
def fn_args_to_dict(func, *args, **kwargs):
"""
Inspect function `func` and its arguments for running, and extract a
dict mapping between argument names and keys.
"""
if hasattr(inspect, 'getfullargspec'):
(spec_args, spec_varargs, spec_varkw, spec_defaults, _, _, _) = inspect.getfullargspec(func)
else:
(spec_args, spec_varargs, spec_varkw, spec_defaults) = inspect.getargspec(func)
# add positional argument values
init_dict = dict(zip(spec_args, args))
# add default argument values
kwargs_dict = dict(zip(spec_args[-len(spec_defaults):], spec_defaults)) if spec_defaults else {}
kwargs_dict.update(kwargs)
init_dict.update(kwargs_dict)
return init_dict
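For reference, a toy call against a hypothetical `make_embedding` function (not part of this module) shows how positional values, declared defaults, and keyword overrides are merged:

```python
def make_embedding(vocab_size, hidden_size=768, dropout=0.1):
    ...

# Positional args bind to the leading parameter names, defaults fill in the
# remaining ones, and explicit keyword arguments override both.
print(fn_args_to_dict(make_embedding, 21128, dropout=0.2))
# {'vocab_size': 21128, 'hidden_size': 768, 'dropout': 0.2}
```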
class InitTrackerMeta(type(nn.Layer)):
"""
This metaclass wraps the `__init__` method of a class to add `init_config`
attribute for instances of that class, and `init_config` use a dict to track
the initial configuration. If the class has `_wrap_init` method, it would be
hooked after `__init__` and called as `_wrap_init(self, init_fn, init_args)`.
Since InitTrackerMeta would be used as metaclass for pretrained model classes,
which always are Layer and `type(nn.Layer)` is not `type`, thus use `type(nn.Layer)`
rather than `type` as base class for it to avoid inheritance metaclass
conflicts.
"""
def __init__(cls, name, bases, attrs):
init_func = cls.__init__
# If attrs has `__init__`, wrap it using accessable `_wrap_init`.
# Otherwise, no need to wrap again since the super cls has been wraped.
# TODO: remove reduplicated tracker if using super cls `__init__`
help_func = getattr(cls, '_wrap_init', None) if '__init__' in attrs else None
cls.__init__ = InitTrackerMeta.init_and_track_conf(init_func, help_func)
super(InitTrackerMeta, cls).__init__(name, bases, attrs)
@staticmethod
def init_and_track_conf(init_func, help_func=None):
"""
wraps `init_func` which is `__init__` method of a class to add `init_config`
attribute for instances of that class.
Args:
init_func (callable): It should be the `__init__` method of a class.
help_func (callable, optional): If provided, it would be hooked after
`init_func` and called as `_wrap_init(self, init_func, *init_args, **init_args)`.
Default None.
Returns:
function: the wrapped function
"""
@functools.wraps(init_func)
def __impl__(self, *args, **kwargs):
# keep full configuration
init_func(self, *args, **kwargs)
# registed helper by `_wrap_init`
if help_func:
help_func(self, init_func, *args, **kwargs)
self.init_config = kwargs
if args:
kwargs['init_args'] = args
kwargs['init_class'] = self.__class__.__name__
return __impl__
def register_base_model(cls):
"""
Add a `base_model_class` attribute for the base class of decorated class,
representing the base model class in derived classes of the same architecture.
Args:
cls (class): the name of the model
"""
base_cls = cls.__bases__[0]
assert issubclass(base_cls,
PretrainedModel), "`register_base_model` should be used on subclasses of PretrainedModel."
base_cls.base_model_class = cls
return cls
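A hedged illustration of what these two pieces do, using the `ErnieModel` defined earlier in this commit: the metaclass records the constructor arguments of every instance, and `@register_base_model` makes the base architecture reachable from any derived head class.

```python
# Constructor arguments are captured both as `init_config` (with the class name)
# and, via `_wrap_init`, as the flattened `config` dict read by the head classes.
ernie = ErnieModel(vocab_size=18000, num_hidden_layers=3)
print(ernie.init_config['init_class'])      # 'ErnieModel'
print(ernie.config['num_hidden_layers'])    # 3

# @register_base_model stored ErnieModel on ErniePretrainedModel, so every
# subclass can locate its base architecture.
print(ErnieForSequenceClassification.base_model_class is ErnieModel)  # True
```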
@six.add_metaclass(InitTrackerMeta)
class PretrainedModel(nn.Layer):
"""
The base class for all pretrained models. It provides some attributes and
common methods for all pretrained models, including attributes `init_config`,
`config` for initialized arguments and methods for saving, loading.
It also includes some class attributes (should be set by derived classes):
- `model_config_file` (str): represents the file name for saving and loading
model configuration, it's value is `model_config.json`.
- `resource_files_names` (dict): use this to map resources to specific file
names for saving and loading.
- `pretrained_resource_files_map` (dict): The dict has the same keys as
`resource_files_names`, the values are also dict mapping specific pretrained
model name to URL linking to pretrained model.
- `pretrained_init_configuration` (dict): The dict has pretrained model names
as keys, and the values are also dict preserving corresponding configuration
for model initialization.
- `base_model_prefix` (str): represents the the attribute associated to the
base model in derived classes of the same architecture adding layers on
top of the base model.
"""
model_config_file = "model_config.json"
pretrained_init_configuration = {}
# TODO: more flexible resource handle, namedtuple with fileds as:
# resource_name, saved_file, handle_name_for_load(None for used as __init__
# arguments), handle_name_for_save
resource_files_names = {"model_state": "model_state.pdparams"}
pretrained_resource_files_map = {}
base_model_prefix = ""
def _wrap_init(self, original_init, *args, **kwargs):
"""
It would be hooked after `__init__` to add a dict including arguments of
`__init__` as a attribute named `config` of the prtrained model instance.
"""
init_dict = fn_args_to_dict(original_init, *args, **kwargs)
self.config = init_dict
@property
def base_model(self):
return getattr(self, self.base_model_prefix, self)
@property
def model_name_list(self):
return list(self.pretrained_init_configuration.keys())
def get_input_embeddings(self):
base_model = getattr(self, self.base_model_prefix, self)
if base_model is not self:
return base_model.get_input_embeddings()
else:
raise NotImplementedError
def get_output_embeddings(self):
return None # Overwrite for models with output embeddings
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
"""
Instantiate an instance of `PretrainedModel` from a predefined
model specified by name or path.
Args:
pretrained_model_name_or_path (str): A name of or a file path to a
pretrained model.
*args (tuple): position arguments for `__init__`. If provide, use
this as position argument values for model initialization.
**kwargs (dict): keyword arguments for `__init__`. If provide, use
this to update pre-defined keyword argument values for model
initialization.
Returns:
PretrainedModel: An instance of PretrainedModel.
"""
pretrained_models = list(cls.pretrained_init_configuration.keys())
resource_files = {}
init_configuration = {}
if pretrained_model_name_or_path in pretrained_models:
for file_id, map_list in cls.pretrained_resource_files_map.items():
resource_files[file_id] = map_list[pretrained_model_name_or_path]
init_configuration = copy.deepcopy(cls.pretrained_init_configuration[pretrained_model_name_or_path])
else:
if os.path.isdir(pretrained_model_name_or_path):
for file_id, file_name in cls.resource_files_names.items():
full_file_name = os.path.join(pretrained_model_name_or_path, file_name)
resource_files[file_id] = full_file_name
resource_files["model_config_file"] = os.path.join(pretrained_model_name_or_path, cls.model_config_file)
else:
raise ValueError("Calling {}.from_pretrained() with a model identifier or the "
"path to a directory instead. The supported model "
"identifiers are as follows: {}".format(cls.__name__,
cls.pretrained_init_configuration.keys()))
# FIXME(chenzeyu01): We should use another data path for storing model
default_root = os.path.join(DATA_HOME, pretrained_model_name_or_path)
resolved_resource_files = {}
for file_id, file_path in resource_files.items():
path = os.path.join(default_root, file_path.split('/')[-1])
if file_path is None or os.path.isfile(file_path):
resolved_resource_files[file_id] = file_path
elif os.path.exists(path):
logger.info("Already cached %s" % path)
resolved_resource_files[file_id] = path
else:
logger.info("Downloading %s and saved to %s" % (file_path, default_root))
resolved_resource_files[file_id] = get_path_from_url(file_path, default_root)
# Prepare model initialization kwargs
# Did we saved some inputs and kwargs to reload ?
model_config_file = resolved_resource_files.pop("model_config_file", None)
if model_config_file is not None:
with io.open(model_config_file, encoding="utf-8") as f:
init_kwargs = json.load(f)
else:
init_kwargs = init_configuration
# position args are stored in kwargs, maybe better not include
init_args = init_kwargs.pop("init_args", ())
# class name corresponds to this configuration
init_class = init_kwargs.pop("init_class", cls.base_model_class.__name__)
# Check if the loaded config matches the current model class's __init__
# arguments. If not match, the loaded config is for the base model class.
if init_class == cls.base_model_class.__name__:
base_args = init_args
base_kwargs = init_kwargs
derived_args = ()
derived_kwargs = {}
base_arg_index = None
else: # extract config for base model
derived_args = list(init_args)
derived_kwargs = init_kwargs
for i, arg in enumerate(init_args):
if isinstance(arg, dict) and "init_class" in arg:
assert arg.pop("init_class") == cls.base_model_class.__name__, (
"pretrained base model should be {}").format(cls.base_model_class.__name__)
base_arg_index = i
break
for arg_name, arg in init_kwargs.items():
if isinstance(arg, dict) and "init_class" in arg:
assert arg.pop("init_class") == cls.base_model_class.__name__, (
"pretrained base model should be {}").format(cls.base_model_class.__name__)
base_arg_index = arg_name
break
base_args = arg.pop("init_args", ())
base_kwargs = arg
if cls == cls.base_model_class:
# Update with newly provided args and kwargs for base model
base_args = base_args if not args else args
base_kwargs.update(kwargs)
model = cls(*base_args, **base_kwargs)
else:
# Update with newly provided args and kwargs for derived model
base_model = cls.base_model_class(*base_args, **base_kwargs)
if base_arg_index is not None:
derived_args[base_arg_index] = base_model
else:
derived_args = (base_model, ) # assume at the first position
derived_args = derived_args if not args else args
derived_kwargs.update(kwargs)
model = cls(*derived_args, **derived_kwargs)
# Maybe need more ways to load resources.
weight_path = list(resolved_resource_files.values())[0]
assert weight_path.endswith(".pdparams"), "suffix of weight must be .pdparams"
state_dict = paddle.load(weight_path)
# Make sure we are able to load base models as well as derived models
# (with heads)
start_prefix = ""
model_to_load = model
state_to_load = state_dict
unexpected_keys = []
missing_keys = []
if not hasattr(model, cls.base_model_prefix) and any(
s.startswith(cls.base_model_prefix) for s in state_dict.keys()):
# base model
state_to_load = {}
start_prefix = cls.base_model_prefix + "."
for k, v in state_dict.items():
if k.startswith(cls.base_model_prefix):
state_to_load[k[len(start_prefix):]] = v
else:
unexpected_keys.append(k)
if hasattr(model,
cls.base_model_prefix) and not any(s.startswith(cls.base_model_prefix) for s in state_dict.keys()):
# derived model (base model with heads)
model_to_load = getattr(model, cls.base_model_prefix)
for k in model.state_dict().keys():
if not k.startswith(cls.base_model_prefix):
missing_keys.append(k)
if len(missing_keys) > 0:
logger.info("Weights of {} not initialized from pretrained model: {}".format(
model.__class__.__name__, missing_keys))
if len(unexpected_keys) > 0:
logger.info("Weights from pretrained model not used in {}: {}".format(model.__class__.__name__,
unexpected_keys))
model_to_load.set_state_dict(state_to_load)
if paddle.in_dynamic_mode():
return model
return model, state_to_load
def save_pretrained(self, save_directory):
"""
Save model configuration and related resources (model state) to files
under `save_directory`.
Args:
save_directory (str): Directory to save files into.
"""
assert os.path.isdir(save_directory), "Saving directory ({}) should be a directory".format(save_directory)
# save model config
model_config_file = os.path.join(save_directory, self.model_config_file)
model_config = self.init_config
# If init_config contains a Layer, use the layer's init_config to save
for key, value in model_config.items():
if key == "init_args":
args = []
for arg in value:
args.append(arg.init_config if isinstance(arg, PretrainedModel) else arg)
model_config[key] = tuple(args)
elif isinstance(value, PretrainedModel):
model_config[key] = value.init_config
with io.open(model_config_file, "w", encoding="utf-8") as f:
f.write(json.dumps(model_config, ensure_ascii=False))
# save model
file_name = os.path.join(save_directory, list(self.resource_files_names.values())[0])
paddle.save(self.state_dict(), file_name)
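As a rough sketch of the round trip these two methods provide (toy configuration, randomly initialized weights, no download involved): `save_pretrained` writes `model_config.json` plus `model_state.pdparams`, and `from_pretrained` can rebuild the model from either a registered name or that directory.

```python
import os

# Toy ERNIE configuration; weights are random, this only exercises the I/O path.
model = ErnieModel(vocab_size=18000, num_hidden_layers=3)

save_dir = './tiny_ernie_ckpt'
os.makedirs(save_dir, exist_ok=True)
model.save_pretrained(save_dir)     # writes model_config.json + model_state.pdparams

# Rebuild the model from the directory: the saved init_config supplies the
# constructor arguments and the state dict restores the parameters.
restored = ErnieModel.from_pretrained(save_dir)
```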
@@ -20,7 +20,7 @@ import pickle
import unicodedata
from typing import Dict, List, Optional, Union, Tuple
from paddle.utils import try_import
from paddlehub.text.utils import load_vocab, is_whitespace, is_control, is_punctuation, whitespace_tokenize, is_chinese_char
@@ -509,9 +509,9 @@ class BertTokenizer(object):
                 max_seq_len: Optional[int] = None,
                 pad_to_max_seq_len: bool = True,
                 truncation_strategy: str = 'longest_first',
                 return_position_ids: bool = False,
                 return_segment_ids: bool = True,
                 return_input_mask: bool = False,
                 return_length: bool = True,
                 return_overflowing_tokens: bool = False,
                 return_special_tokens_mask: bool = False):
@@ -541,11 +541,11 @@ class BertTokenizer(object):
                - 'only_first': Only truncate the first sequence
                - 'only_second': Only truncate the second sequence
                - 'do_not_truncate': Does not truncate (raise an error if the input sequence is longer than max_seq_len)
            return_position_ids (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Set to True to return tokens position ids (default False).
            return_segment_ids (:obj:`bool`, `optional`, defaults to :obj:`True`):
                Whether to return token type IDs.
            return_input_mask (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether to return the attention mask.
            return_length (:obj:`int`, defaults to :obj:`True`):
                If set the resulting dictionary will include the length of each encoded inputs
@@ -612,7 +612,6 @@ class BertTokenizer(object):
            encoded_inputs['num_truncated_tokens'] = total_len - max_seq_len
        # Add special tokens
        sequence = self.build_inputs_with_special_tokens(ids, pair_ids)
        segment_ids = self.create_segment_ids_from_sequences(ids, pair_ids)
@@ -704,6 +703,7 @@ class ErnieTinyTokenizer(BertTokenizer):
                 cls_token: str = '[CLS]',
                 mask_token: str = '[MASK]',
                 ):
        mod = try_import('sentencepiece')
        self.unk_token = unk_token
        self.sep_token = sep_token
        self.pad_token = pad_token
@@ -719,7 +719,7 @@ class ErnieTinyTokenizer(BertTokenizer):
        # Here is the difference with BertTokenizer.
        self.dict = pickle.load(open(word_dict_path, 'rb'))
        self.sp_model = mod.SentencePieceProcessor()
        self.window_size = 5
        self.sp_model.Load(spm_path)
...
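The switch from a module-level `import sentencepiece as spm` to `paddle.utils.try_import` defers the dependency until an `ErnieTinyTokenizer` is actually constructed. A short sketch of the pattern (assuming `sentencepiece` is installed):

```python
from paddle.utils import try_import

# The import error is raised only when the optional dependency is really
# needed, not when the whole tokenizer module is first imported.
spm = try_import('sentencepiece')
sp_model = spm.SentencePieceProcessor()
```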