提交 edbe41ba 编写于 作者: G grasswolfs

test=pre-develop, test=documents_fix

上级 c093331e
......@@ -4,7 +4,7 @@
![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)
## 简介
- PaddleHub旨在为开发者提供丰富的、高质量的、直接可用的预训练模型,**【无需深度学习背景、无需数据与训练过程】,**也可快速使用AI模型。
- PaddleHub旨在为开发者提供丰富的、高质量的、直接可用的预训练模型,**【无需深度学习背景、无需数据与训练过程】**也可快速使用AI模型。
- 涵盖CV、NLP、Audio、Video主流四大品类,支持**一键预测****一键服务化部署****快速迁移学习**
- 全部模型开源下载,**离线可运行**
......@@ -15,10 +15,10 @@
- **2020.09.27**,新增文本生成模型6个,图像分割模型1个,预训练模型总量到达 **【154】** 个。
- **2020.08.13**,发布v1.8.1,新增人像分割模型Humanseg,支持EMNLP2019-Sentence-BERT作为文本匹配任务网络,预训练模型总量到达 **【147】** 个。
- **2020.07.29**,发布v1.8.0,新增AI对联和AI写诗、jieba切词,文本数据LDA、语义相似度计算,新增目标检测,短视频分类模型,超轻量中英文OCR,新增行人检测、车辆检测、动物识别等工业级模型,支持VisualDL可视化训练,预训练模型总量到达 **【135】** 个。
- [More]()
- [More](./docs/release.md)
## 特性
## [特性](./docs/figures.md)
- **【丰富的预训练模型】**:涵盖CV、NLP、Audio、Video主流四大品类的 180+ 预训练模型,全部开源下载,离线可运行。
- **【一键模型快速预测】**:通过一行命令行或者极简的Python API实现模型调用,可快速体验模型效果。
- **【一键模型转服务化】**:一行命令,搭建深度学习模型API服务化部署能力。
......@@ -60,27 +60,27 @@
- [在线运行体验demo【Official】](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.8/demo)
- [生态趣味项目demo【ThirdPary】](./docs/quick_experience/more_demos.md)
- 丰富的预训练模型 182 个
- [精品特色模型](./docs/pretrained_models.md)
- [精品特色模型](./docs/figure.md)
- 计算机视觉 126 个
- [图像分类 64 个](./modules/image/classification)
- [目标检测 13 个](./modules/image/object_detection)
- [人脸检测 7 个](./modules/image/face_detection)
- [关键点检测 3 个](./modules/image/keypoint_detection)
- [图像分割 7 个](./modules/image/semantic_segmentation)
- [文本识别 8 个](./modules/image/text_recognition)
- [图像生成 17 个](./modules/image/gan)
- [图像编辑 7 个](./modules/image/style_transfer)
- [图像分类 64 个](./modules/image/classification/README.md)
- [目标检测 13 个](./modules/image/object_detection/README.md)
- [人脸检测 7 个](./modules/image/face_detection/README.md)
- [关键点检测 3 个](./modules/image/keypoint_detection/README.md)
- [图像分割 7 个](./modules/image/semantic_segmentation/README.md)
- [文本识别 8 个](./modules/image/text_recognition/README.md)
- [图像生成 17 个](./modules/image/Image_gan/README.md)
- [图像编辑 7 个](./modules/image/Image_editing/README.md)
- 自然语言处理 48 个
- [词法分析 2 个](./modules/text/lexical_analysis)
- [句法分析 1 个](./modules/text/syntactic_analysis)
- [情感分析 7 个](./modules/text/semantic_model)
- [文本审核 3 个](./modules/text/text_review)
- [文本生成 9 个](./modules/text/text_generation)
- [语义模型 26 个](./modules/text/semantic_model)
- [词法分析 2 个](./modules/text/lexical_analysis/README.md)
- [句法分析 1 个](./modules/text/syntactic_analysis/README.md)
- [情感分析 7 个](./modules/text/sentiment_analysis/README.md)
- [文本审核 3 个](./modules/text/text_review/README.md)
- [文本生成 9 个](./modules/text/text_generation/README.md)
- [语义模型 26 个](./modules/text/language_model/README.md)
- 语音 3 个
- [语音合成 3 个](./modules/audio)
- [语音合成 3 个](./modules/audio/README.md)
- 视频5个
- [视频分类 5 个](./modules/video)
- [视频分类 5 个](./modules/video/README.md)
- 部署
- [一行代码服务化部署](./docs/tutorial/serving.md)
- C++ Inference 部署(建议加群沟通)
......
## 特性详解
<a name="丰富的预训练模型"></a>
### 1、丰富的预训练模型
- 1.1、图像
| | **精品模型举例** |
| ---------- | :----------------------------------------------------------- |
| 图像分类 | [菜品识别](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_dishes&en_category=ImageClassification)[动物识别](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_animals&en_category=ImageClassification)[动物识别](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_animals&en_category=ImageClassification)[-->More](../modules/image/classification/README.md) |
| 目标检测 | [通用检测](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_coco2017&en_category=ObjectDetection)[行人检测](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_pedestrian&en_category=ObjectDetection)[车辆检测](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_vehicles&en_category=ObjectDetection)[-->More](../modules/image/object_detection/README.md) |
| 人脸检测 | [人脸检测](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server&en_category=FaceDetection)[口罩检测](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server_mask&en_category=FaceDetection)[-->More](../modules/image/face_detection/README.md) |
| 图像分割 | [人像分割](https://www.paddlepaddle.org.cn/hubdetail?name=deeplabv3p_xception65_humanseg&en_category=ImageSegmentation)[人体解析](https://www.paddlepaddle.org.cn/hubdetail?name=ace2p&en_category=ImageSegmentation)[肺炎CT影像分析](https://www.paddlepaddle.org.cn/hubdetail?name=Pneumonia_CT_LKM_PP&en_category=ImageSegmentation)[-->More](../modules/image/semantic_segmentation/README.md) |
| 关键点检测 | [人体关键点](https://www.paddlepaddle.org.cn/hubdetail?name=human_pose_estimation_resnet50_mpii&en_category=KeyPointDetection)[人脸关键点](https://www.paddlepaddle.org.cn/hubdetail?name=face_landmark_localization&en_category=KeyPointDetection)[手部关键点](https://www.paddlepaddle.org.cn/hubdetail?name=hand_pose_localization&en_category=KeyPointDetection)[-->More](./modules/image/keypoint_detection/README.md) |
| 文本识别 | [超轻量中英文OCR文字识别](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_crnn_mobile&en_category=TextRecognition)[-->More](../modules/image/text_recognition/README.md) |
| 图像生成 | [风格迁移](https://www.paddlepaddle.org.cn/hubdetail?name=stylepro_artistic&en_category=GANs)[街景动漫画]()、[-->More](../modules/image/Image_gan/README.md) |
| 图像编辑 | [超分辨率](https://www.paddlepaddle.org.cn/hubdetail?name=realsr&en_category=ImageEditing)[黑白上色](https://www.paddlepaddle.org.cn/hubdetail?name=deoldify&en_category=ImageEditing)[-->More](../modules/image/Image_editing/README.md) |
- 1.2、文本
| | **精品模型举例** |
| ---------- | :----------------------------------------------------------- |
| 词句分析 | [词法分析 ](https://www.paddlepaddle.org.cn/hubdetail?name=lac&en_category=LexicalAnalysis)[句法分析](https://www.paddlepaddle.org.cn/hubdetail?name=ddparser&en_category=SyntacticAnalysis)[-->More](../modules/text/lexical_analysis/README.md) |
| 情感分析 | [情感判断](https://www.paddlepaddle.org.cn/hubdetail?name=lac&en_category=LexicalAnalysis)[情绪分析](https://www.paddlepaddle.org.cn/hubdetail?name=emotion_detection_textcnn&en_category=SentimentAnalysis)[-->More](../modules/text/sentiment_analysis/README.md)|
| 文本审核 | [色情审核](https://www.paddlepaddle.org.cn/hubdetail?name=porn_detection_gru&en_category=TextCensorship)[-->More](../modules/text/text_review/README.md) |
| 文本生成 | [对联生成]()、[情话生成]()、[藏图诗生成]()、[土味情话]()[-->More](../modules/text/text_generation/README.md)|
| 语义模型 | [ERNIE](https://www.paddlepaddle.org.cn/hubdetail?name=ERNIE&en_category=SemanticModel)[文本相似度](https://www.paddlepaddle.org.cn/hubdetail?name=simnet_bow&en_category=SemanticModel)[-->More](../modules/text/language_model/README.md) |
- 1.3、语音
| | **精品模型举例** |
| ---------- | :----------------------------------------------------------- |
| 语音合成 | [语音合成]()[-->More](../modules/audio/README.md) |
- 1.4、视频
| | **精品模型举例** |
| ---------- | :----------------------------------------------------------- |
| 视频分类 | [视频分类]()、[-->More](../modules/video/README.md) |
<a name="一键模型预测"></a>
### 2、一键模型预测
* 举例,假如考虑使用文字识别轻量级中文OCR模型chinese_ocr_db_crnn_mobile即可一键快速识别图片中的文字。
```shell
$ pip install paddlehub
$ wget https://paddlehub.bj.bcebos.com/model/image/ocr/test_ocr.jpg
$ hub run chinese_ocr_db_crnn_mobile --input_path test_ocr.jpg --visualization=True
```
* 预测结果图片保存在当前运行路径下ocr_result文件夹中,如下图所示。
<p align="center">
<img src="./imgs/ocr_res.jpg" width='70%' align="middle"
</p>
* 使用词法分析模型LAC进行分词
```shell
$ hub run lac --input_text "现在,慕尼黑再保险公司不仅是此类行动的倡议者,更是将其大量气候数据整合进保险产品中,并与公众共享大量天气信息,参与到新能源领域的保障中。"
[{
'word': ['现在', ',', '慕尼黑再保险公司', '不仅', '是', '此类', '行动', '的', '倡议者', ',', '更是', '将', '其', '大量', '气候', '数据', '整合', '进', '保险', '产品', '中', ',', '并', '与', '公众', '共享', '大量', '天气', '信息', ',', '参与', '到', '新能源', '领域', '的', '保障', '中', '。'],
'tag': ['TIME', 'w', 'ORG', 'c', 'v', 'r', 'n', 'u', 'n', 'w', 'd', 'p', 'r', 'a', 'n', 'n', 'v', 'v', 'n', 'n', 'f', 'w', 'c', 'p', 'n', 'v', 'a', 'n', 'n', 'w', 'v', 'v', 'n', 'n', 'u', 'vn', 'f', 'w']
}]
```
除了一行代码预测之外,PaddleHub也支持使用API调用模型的方式,可以参考每个模型的详细文档。
<a name="一键模型转服务"></a>
### 3、一键模型转服务
PaddleHub提供便捷的模型转服务的能力,只需简单一行命令即可完成模型的HTTP服务部署。通过以下命令即可快速启动LAC词法分析服务:
```shell
$ hub serving start -m chinese_ocr_db_crnn_mobile
```
更多关于模型服务化使用说明参见[PaddleHub模型一键服务化部署](./tutorial/serving.md)
<a name="十行代码迁移学习"></a>
### 4、十行代码迁移学习
通过Fine-tune API,只需要少量代码即可完成深度学习模型在计算机视觉场景下的迁移学习。
* [Demo示例](../demo)提供丰富的Fine-tune API的使用代码,包括[图像分类](../demo/image_classification)[图像着色](../demo/colorization)[风格迁移](../demo/style_transfer)、等场景的模型迁移示例。
<p align="center">
<img src="./imgs/paddlehub_finetune.gif" align="middle"
</p>
<p align='center'>
十行代码完成工业级文本分类
</p>
* 如需在线快速体验,请点击[PaddleHub教程合集](https://aistudio.baidu.com/aistudio/projectdetail/231146),可使用AI Studio平台提供的GPU算力进行快速尝试。
<a name="许可证书"></a>
## 许可证书
本项目的发布受<a href="./LICENSE">Apache 2.0 license</a>许可认证。
<a name="致谢"></a>
## 致谢
我们非常欢迎您为PaddleHub贡献代码,也十分感谢您的反馈。
* 非常感谢[Austendeng](https://github.com/Austendeng)贡献了修复SequenceLabelReader的pr
* 非常感谢[cclauss](https://github.com/cclauss)贡献了优化travis-ci检查的pr
* 非常感谢[奇想天外](http://www.cheerthink.com/)贡献了口罩检测的demo
* 非常感谢[mhlwsk](https://github.com/mhlwsk)贡献了修复序列标注预测demo的pr
* 非常感谢[zbp-xxxp](https://github.com/zbp-xxxp)贡献了看图作诗的module
* 非常感谢[zbp-xxxp](https://github.com/zbp-xxxp)[七年期限](https://github.com/1084667371)联合贡献了看图写诗中秋特别版module
* 非常感谢[livingbody](https://github.com/livingbody)贡献了基于PaddleHub能力的风格迁移和中秋看图写诗微信小程序
## **更好用户体验,建议参考WEB端官方文档 -> [【语音合成】](https://www.paddlepaddle.org.cn/hubdetail)**
### 文字识别
语音合成(TTS)任务可以实现讲文字转化为语音,已经广泛应用于各种语音交互设备中。
- 推荐模型
| 模型名称 | 模型简介 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [语音合成transformer_tts_ljspeech](https://www.paddlepaddle.org.cn/hubdetail?name=transformer_tts_ljspeech&en_category=TextToSpeech) | TansformerTTS 对 Transformer 和 Tacotron2 进行了融合,取得了令人满意的效果,英文TTS模型,仅支持预测。 |
| [语音合成fastspeech_ljspeech](https://www.paddlepaddle.org.cn/hubdetail?name=fastspeech_ljspeech&en_category=TextToSpeech) | FastSpeech是基于encoder-decoder结构的teacher model中提取attention对角线来做发音持续时间预测,英文TTS模型,仅支持预测。 |
| [语音合成deepvoice3_ljspeech](https://www.paddlepaddle.org.cn/hubdetail?name=deepvoice3_ljspeech&en_category=TextToSpeech) | Deep Voice 3是百度研究院2017年发布的端到端的TTS模型(论文录用于ICLR 2018)。它是一个基于卷积神经网络和注意力机制的seq2seq模型,英文TTS模型,仅支持预测。|
## **更好用户体验,建议参考WEB端官方文档 -> [【图像编辑】](https://www.paddlepaddle.org.cn/hubdetail)**
### 图像编辑
图像编辑是指在输入图像的基础上,对图像的像素点进行进一步的编辑和调整,输出新的目标图像,具体的应用场景有:超分辨率、黑白片上色,老照片修复等。
- 精选推荐模型
| 模型名称 | 模型简介 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [超分辨率](https://www.paddlepaddle.org.cn/hubdetail?name=realsr&en_category=ImageEditing) | 可用于图像和视频超分模型,它能够将输入的图片和视频超分四倍。 |
| [黑白图像上色](https://www.paddlepaddle.org.cn/hubdetail?name=deoldify&en_category=ImageEditing) | deoldify是用于图像和视频的着色渲染模型,该模型能够实现给黑白照片和视频恢复原彩。 |
| [老照片修复](https://www.paddlepaddle.org.cn/hubdetail?name=photo_restoration&en_category=ImageEditing) | 针对老照片修复的模型。它主要由两个部分组成:着色和超分。|
......@@ -130,4 +130,4 @@ class ColorizePreprocess:
data['real_B_enc'] = paddle.to_tensor(data['real_B_enc'].astype(np.int64))
data['hint_B'] = paddle.to_tensor(data['hint_B'].astype(np.float32))
data['mask_B'] = paddle.to_tensor(data['mask_B'].astype(np.float32))
return data
\ No newline at end of file
return data
......@@ -40,7 +40,6 @@ class UserGuidedColorization(nn.Layer):
load_checkpoint (str): Pretrained checkpoint path.
"""
def __init__(self, use_tanh: bool = True, load_checkpoint: str = None):
super(UserGuidedColorization, self).__init__()
self.input_nc = 4
......@@ -119,8 +118,8 @@ class UserGuidedColorization(nn.Layer):
)
# Conv8
model8up = (Conv2DTranspose(512, 256, kernel_size=4, stride=2, padding=1),)
model3short8 = (Conv2D(256, 256, 3, 1, 1),)
model8up = (Conv2DTranspose(512, 256, kernel_size=4, stride=2, padding=1), )
model3short8 = (Conv2D(256, 256, 3, 1, 1), )
model8 = (
nn.ReLU(),
Conv2D(256, 256, 3, 1, 1),
......@@ -131,20 +130,26 @@ class UserGuidedColorization(nn.Layer):
)
# Conv9
model9up = (Conv2DTranspose(256, 128, kernel_size=4, stride=2, padding=1),)
model2short9 = (Conv2D(128, 128, 3, 1, 1,),)
model9up = (Conv2DTranspose(256, 128, kernel_size=4, stride=2, padding=1), )
model2short9 = (Conv2D(
128,
128,
3,
1,
1,
), )
model9 = (nn.ReLU(), Conv2D(128, 128, 3, 1, 1), nn.ReLU(), nn.BatchNorm(128))
# Conv10
model10up = (Conv2DTranspose(128, 128, kernel_size=4, stride=2, padding=1),)
model1short10 = (Conv2D(64, 128, 3, 1, 1),)
model10up = (Conv2DTranspose(128, 128, kernel_size=4, stride=2, padding=1), )
model1short10 = (Conv2D(64, 128, 3, 1, 1), )
model10 = (nn.ReLU(), Conv2D(128, 128, 3, 1, 1), nn.LeakyReLU(negative_slope=0.2))
model_class = (Conv2D(256, 529, 1),)
model_class = (Conv2D(256, 529, 1), )
if use_tanh:
model_out = (Conv2D(128, 2, 1, 1, 0, 1), nn.Tanh())
else:
model_out = (Conv2D(128, 2, 1, 1, 0, 1),)
model_out = (Conv2D(128, 2, 1, 1, 0, 1), )
self.model1 = nn.Sequential(*model1)
self.model2 = nn.Sequential(*model2)
......@@ -178,10 +183,10 @@ class UserGuidedColorization(nn.Layer):
print("load pretrained checkpoint success")
def transforms(self, images: str) -> callable:
transform = T.Compose([T.Resize((256, 256), interpolation='NEAREST'), T.RGB2LAB()], to_rgb=True)
return transform(images)
def set_config(self, classification: bool = True, prob: float = 1., num_point: int = None):
self.classification = classification
self.pre_func = ColorizePreprocess(ab_thresh=0., p=prob, points=num_point)
......@@ -221,4 +226,4 @@ class UserGuidedColorization(nn.Layer):
conv10_2 = self.model10(conv10_up)
out_reg = self.model_out(conv10_2)
return out_class, out_reg
\ No newline at end of file
return out_class, out_reg
## **更好用户体验,建议参考WEB端官方文档 -> [【图像生成】](https://www.paddlepaddle.org.cn/hubdetail)**
### 图像生成
图像生成是指根据输入向量,生成目标图像。这里的输入向量可以是随机的噪声或用户指定的条件向量。具体的应用场景有:风格迁移、图像动漫画等。
- 精选推荐模型
| 模型名称 | 模型简介 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [艺术风格迁移](https://www.paddlepaddle.org.cn/hubdetail?name=stylepro_artistic&en_category=GANs) | 将给定的图像转换为任意的艺术风格。确保模型高保真还原内容图片的语义细节信息与风格图片的风格信息。 |
| [图像动漫化-新海诚](https://www.paddlepaddle.org.cn/hubdetail?name=animegan_v2_shinkai_53&en_category=GANs) | AnimeGAN V2 图像风格转换模型, 模型可将输入的图像转换成新海诚动漫风格 |
| [图像动漫化-宫崎骏](https://www.paddlepaddle.org.cn/hubdetail?name=animegan_v2_hayao_64&en_category=GANs) | AnimeGAN V2 图像风格转换模型, 模型可将输入的图像转换成宫崎骏动漫风格|
| [图像动漫化-今敏红辣椒](https://www.paddlepaddle.org.cn/hubdetail?name=animegan_v2_paprika_97&en_category=GANs) | AnimeGAN V2 图像风格转换模型, 模型可将输入的图像转换成今敏红辣椒动漫风格。|
......@@ -14,7 +14,6 @@ from paddlehub.module.cv_module import StyleTransferModule
class GramMatrix(nn.Layer):
"""Calculate gram matrix"""
def forward(self, y):
(b, ch, h, w) = y.shape
features = y.reshape((b, ch, w * h))
......@@ -25,7 +24,6 @@ class GramMatrix(nn.Layer):
class ConvLayer(nn.Layer):
"""Basic conv layer with reflection padding layer"""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, stride: int):
super(ConvLayer, self).__init__()
pad = int(np.floor(kernel_size / 2))
......@@ -53,7 +51,6 @@ class UpsampleConvLayer(nn.Layer):
Return:
img(paddle.Tensor): UpsampleConvLayer output.
"""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, stride: int, upsample=None):
super(UpsampleConvLayer, self).__init__()
self.upsample = upsample
......@@ -88,7 +85,6 @@ class Bottleneck(nn.Layer):
Return:
img(paddle.Tensor): Bottleneck output.
"""
def __init__(self,
inplanes: int,
planes: int,
......@@ -102,8 +98,8 @@ class Bottleneck(nn.Layer):
self.residual_layer = nn.Conv2D(inplanes, planes * self.expansion, kernel_size=1, stride=stride)
conv_block = (norm_layer(inplanes), nn.ReLU(), nn.Conv2D(inplanes, planes, kernel_size=1, stride=1),
norm_layer(planes), nn.ReLU(), ConvLayer(planes, planes, kernel_size=3, stride=stride),
norm_layer(planes), nn.ReLU(), nn.Conv2D(
planes, planes * self.expansion, kernel_size=1, stride=1))
norm_layer(planes), nn.ReLU(), nn.Conv2D(planes, planes * self.expansion, kernel_size=1,
stride=1))
self.conv_block = nn.Sequential(*conv_block)
def forward(self, x: paddle.Tensor):
......@@ -129,12 +125,14 @@ class UpBottleneck(nn.Layer):
Return:
img(paddle.Tensor): UpBottleneck output.
"""
def __init__(self, inplanes: int, planes: int, stride: int = 2, norm_layer: nn.Layer = nn.BatchNorm2D):
super(UpBottleneck, self).__init__()
self.expansion = 4
self.residual_layer = UpsampleConvLayer(
inplanes, planes * self.expansion, kernel_size=1, stride=1, upsample=stride)
self.residual_layer = UpsampleConvLayer(inplanes,
planes * self.expansion,
kernel_size=1,
stride=1,
upsample=stride)
conv_block = []
conv_block += [norm_layer(inplanes), nn.ReLU(), nn.Conv2D(inplanes, planes, kernel_size=1, stride=1)]
conv_block += [
......@@ -165,7 +163,6 @@ class Inspiration(nn.Layer):
Return:
img(paddle.Tensor): UpBottleneck output.
"""
def __init__(self, C: int, B: int = 1):
super(Inspiration, self).__init__()
......@@ -182,8 +179,8 @@ class Inspiration(nn.Layer):
self.P = paddle.bmm(self.weight.expand_as(self.G), self.G)
x = paddle.bmm(
self.P.transpose((0, 2, 1)).expand((X.shape[0], self.C, self.C)), X.reshape((X.shape[0], X.shape[1],
-1))).reshape(X.shape)
self.P.transpose((0, 2, 1)).expand((X.shape[0], self.C, self.C)), X.reshape(
(X.shape[0], X.shape[1], -1))).reshape(X.shape)
return x
def __repr__(self):
......@@ -193,7 +190,6 @@ class Inspiration(nn.Layer):
class Vgg16(nn.Layer):
""" First four layers from Vgg16."""
def __init__(self):
super(Vgg16, self).__init__()
self.conv1_1 = nn.Conv2D(3, 64, kernel_size=3, stride=1, padding=1)
......@@ -268,8 +264,12 @@ class MSGNet(nn.Layer):
Return:
img(paddle.Tensor): MSGNet output.
"""
def __init__(self, input_nc=3, output_nc=3, ngf=128, n_blocks=6, norm_layer=nn.InstanceNorm2D,
def __init__(self,
input_nc=3,
output_nc=3,
ngf=128,
n_blocks=6,
norm_layer=nn.InstanceNorm2D,
load_checkpoint=None):
super(MSGNet, self).__init__()
self.gram = GramMatrix()
......@@ -341,4 +341,4 @@ class MSGNet(nn.Layer):
return self._vgg(input)
def forward(self, input: paddle.Tensor):
return self.model(input)
\ No newline at end of file
return self.model(input)
# coding=utf-8
from paddle.fluid.initializer import Constant
from paddle.fluid.param_attr import ParamAttr
import paddle.fluid as fluid
def decoder_net():
x2paddle_22 = fluid.layers.create_parameter(dtype='float32',
shape=[4],
name='x2paddle_22',
attr='x2paddle_22',
default_initializer=Constant(0.0))
x2paddle_36 = fluid.layers.create_parameter(dtype='float32',
shape=[4],
name='x2paddle_36',
attr='x2paddle_36',
default_initializer=Constant(0.0))
x2paddle_44 = fluid.layers.create_parameter(dtype='float32',
shape=[4],
name='x2paddle_44',
attr='x2paddle_44',
default_initializer=Constant(0.0))
x2paddle_input_1 = fluid.layers.data(dtype='float32',
shape=[1, 512, 64, 64],
name='x2paddle_input_1',
append_batch_size=False)
x2paddle_19 = fluid.layers.pad2d(x2paddle_input_1,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_19')
x2paddle_20 = fluid.layers.conv2d(x2paddle_19,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_1',
name='x2paddle_20',
bias_attr='x2paddle_2')
x2paddle_21 = fluid.layers.relu(x2paddle_20, name='x2paddle_21')
x2paddle_23 = fluid.layers.resize_nearest(x2paddle_21, name='x2paddle_23', out_shape=[128, 128])
x2paddle_24 = fluid.layers.pad2d(x2paddle_23,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_24')
x2paddle_25 = fluid.layers.conv2d(x2paddle_24,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_3',
name='x2paddle_25',
bias_attr='x2paddle_4')
x2paddle_26 = fluid.layers.relu(x2paddle_25, name='x2paddle_26')
x2paddle_27 = fluid.layers.pad2d(x2paddle_26,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_27')
x2paddle_28 = fluid.layers.conv2d(x2paddle_27,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_5',
name='x2paddle_28',
bias_attr='x2paddle_6')
x2paddle_29 = fluid.layers.relu(x2paddle_28, name='x2paddle_29')
x2paddle_30 = fluid.layers.pad2d(x2paddle_29,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_30')
x2paddle_31 = fluid.layers.conv2d(x2paddle_30,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_7',
name='x2paddle_31',
bias_attr='x2paddle_8')
x2paddle_32 = fluid.layers.relu(x2paddle_31, name='x2paddle_32')
x2paddle_33 = fluid.layers.pad2d(x2paddle_32,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_33')
x2paddle_34 = fluid.layers.conv2d(x2paddle_33,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_9',
name='x2paddle_34',
bias_attr='x2paddle_10')
x2paddle_35 = fluid.layers.relu(x2paddle_34, name='x2paddle_35')
x2paddle_37 = fluid.layers.resize_nearest(x2paddle_35, name='x2paddle_37', out_shape=[256, 256])
x2paddle_38 = fluid.layers.pad2d(x2paddle_37,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_38')
x2paddle_39 = fluid.layers.conv2d(x2paddle_38,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_11',
name='x2paddle_39',
bias_attr='x2paddle_12')
x2paddle_40 = fluid.layers.relu(x2paddle_39, name='x2paddle_40')
x2paddle_41 = fluid.layers.pad2d(x2paddle_40,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_41')
x2paddle_42 = fluid.layers.conv2d(x2paddle_41,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_13',
name='x2paddle_42',
bias_attr='x2paddle_14')
x2paddle_43 = fluid.layers.relu(x2paddle_42, name='x2paddle_43')
x2paddle_45 = fluid.layers.resize_nearest(x2paddle_43, name='x2paddle_45', out_shape=[512, 512])
x2paddle_46 = fluid.layers.pad2d(x2paddle_45,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_46')
x2paddle_47 = fluid.layers.conv2d(x2paddle_46,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_15',
name='x2paddle_47',
bias_attr='x2paddle_16')
x2paddle_48 = fluid.layers.relu(x2paddle_47, name='x2paddle_48')
x2paddle_49 = fluid.layers.pad2d(x2paddle_48,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_49')
x2paddle_50 = fluid.layers.conv2d(x2paddle_49,
num_filters=3,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_17',
name='x2paddle_50',
bias_attr='x2paddle_18')
return x2paddle_input_1, x2paddle_50
# coding=utf-8
from paddle.fluid.initializer import Constant
from paddle.fluid.param_attr import ParamAttr
import paddle.fluid as fluid
def encoder_net():
x2paddle_0 = fluid.layers.data(dtype='float32', shape=[1, 3, 512, 512], name='x2paddle_0', append_batch_size=False)
x2paddle_21 = fluid.layers.conv2d(x2paddle_0,
num_filters=3,
filter_size=[1, 1],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_1',
name='x2paddle_21',
bias_attr='x2paddle_2')
x2paddle_22 = fluid.layers.pad2d(x2paddle_21,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_22')
x2paddle_23 = fluid.layers.conv2d(x2paddle_22,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_3',
name='x2paddle_23',
bias_attr='x2paddle_4')
x2paddle_24 = fluid.layers.relu(x2paddle_23, name='x2paddle_24')
x2paddle_25 = fluid.layers.pad2d(x2paddle_24,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_25')
x2paddle_26 = fluid.layers.conv2d(x2paddle_25,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_5',
name='x2paddle_26',
bias_attr='x2paddle_6')
x2paddle_27 = fluid.layers.relu(x2paddle_26, name='x2paddle_27')
x2paddle_28 = fluid.layers.pool2d(x2paddle_27,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_28',
exclusive=False)
x2paddle_29 = fluid.layers.pad2d(x2paddle_28,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_29')
x2paddle_30 = fluid.layers.conv2d(x2paddle_29,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_7',
name='x2paddle_30',
bias_attr='x2paddle_8')
x2paddle_31 = fluid.layers.relu(x2paddle_30, name='x2paddle_31')
x2paddle_32 = fluid.layers.pad2d(x2paddle_31,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_32')
x2paddle_33 = fluid.layers.conv2d(x2paddle_32,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_9',
name='x2paddle_33',
bias_attr='x2paddle_10')
x2paddle_34 = fluid.layers.relu(x2paddle_33, name='x2paddle_34')
x2paddle_35 = fluid.layers.pool2d(x2paddle_34,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_35',
exclusive=False)
x2paddle_36 = fluid.layers.pad2d(x2paddle_35,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_36')
x2paddle_37 = fluid.layers.conv2d(x2paddle_36,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_11',
name='x2paddle_37',
bias_attr='x2paddle_12')
x2paddle_38 = fluid.layers.relu(x2paddle_37, name='x2paddle_38')
x2paddle_39 = fluid.layers.pad2d(x2paddle_38,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_39')
x2paddle_40 = fluid.layers.conv2d(x2paddle_39,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_13',
name='x2paddle_40',
bias_attr='x2paddle_14')
x2paddle_41 = fluid.layers.relu(x2paddle_40, name='x2paddle_41')
x2paddle_42 = fluid.layers.pad2d(x2paddle_41,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_42')
x2paddle_43 = fluid.layers.conv2d(x2paddle_42,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_15',
name='x2paddle_43',
bias_attr='x2paddle_16')
x2paddle_44 = fluid.layers.relu(x2paddle_43, name='x2paddle_44')
x2paddle_45 = fluid.layers.pad2d(x2paddle_44,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_45')
x2paddle_46 = fluid.layers.conv2d(x2paddle_45,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_17',
name='x2paddle_46',
bias_attr='x2paddle_18')
x2paddle_47 = fluid.layers.relu(x2paddle_46, name='x2paddle_47')
x2paddle_48 = fluid.layers.pool2d(x2paddle_47,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_48',
exclusive=False)
x2paddle_49 = fluid.layers.pad2d(x2paddle_48,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_49')
x2paddle_50 = fluid.layers.conv2d(x2paddle_49,
num_filters=512,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_19',
name='x2paddle_50',
bias_attr='x2paddle_20')
x2paddle_51 = fluid.layers.relu(x2paddle_50, name='x2paddle_51')
return x2paddle_0, x2paddle_51
......@@ -140,14 +140,13 @@ class StyleProjection(hub.Module):
encode_program, encode_feeded_var_names, encode_target_vars = fluid.io.load_inference_model(
dirname=self.pretrained_encoder_net, executor=exe)
fluid.io.save_inference_model(
dirname=dirname,
main_program=encode_program,
executor=exe,
feeded_var_names=encode_feeded_var_names,
target_vars=encode_target_vars,
model_filename=model_filename,
params_filename=params_filename)
fluid.io.save_inference_model(dirname=dirname,
main_program=encode_program,
executor=exe,
feeded_var_names=encode_feeded_var_names,
target_vars=encode_target_vars,
model_filename=model_filename,
params_filename=params_filename)
def _save_decode_model(self, dirname, model_filename=None, params_filename=None, combined=True):
if combined:
......@@ -159,14 +158,13 @@ class StyleProjection(hub.Module):
decode_program, decode_feeded_var_names, decode_target_vars = fluid.io.load_inference_model(
dirname=self.pretrained_decoder_net, executor=exe)
fluid.io.save_inference_model(
dirname=dirname,
main_program=decode_program,
executor=exe,
feeded_var_names=decode_feeded_var_names,
target_vars=decode_target_vars,
model_filename=model_filename,
params_filename=params_filename)
fluid.io.save_inference_model(dirname=dirname,
main_program=decode_program,
executor=exe,
feeded_var_names=decode_feeded_var_names,
target_vars=decode_target_vars,
model_filename=model_filename,
params_filename=params_filename)
@serving
def serving_method(self, images, **kwargs):
......@@ -186,11 +184,10 @@ class StyleProjection(hub.Module):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
......@@ -202,20 +199,29 @@ class StyleProjection(hub.Module):
paths = [{'content': args.content, 'styles': args.styles.split(',')}]
else:
paths = [{'content': args.content, 'styles': args.styles.split(','), 'weights': list(args.weights)}]
results = self.style_transfer(
paths=paths, alpha=args.alpha, use_gpu=args.use_gpu, output_dir=args.output_dir, visualization=True)
results = self.style_transfer(paths=paths,
alpha=args.alpha,
use_gpu=args.use_gpu,
output_dir=args.output_dir,
visualization=True)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--use_gpu', type=ast.literal_eval, default=False, help="whether use GPU or not")
self.arg_config_group.add_argument(
'--output_dir', type=str, default='transfer_result', help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization', type=ast.literal_eval, default=True, help="whether to save output as images.")
self.arg_config_group.add_argument('--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU or not")
self.arg_config_group.add_argument('--output_dir',
type=str,
default='transfer_result',
help="The directory to save output images.")
self.arg_config_group.add_argument('--visualization',
type=ast.literal_eval,
default=True,
help="whether to save output as images.")
def add_module_input_arg(self):
"""
......@@ -223,7 +229,11 @@ class StyleProjection(hub.Module):
"""
self.arg_input_group.add_argument('--content', type=str, help="path to content.")
self.arg_input_group.add_argument('--styles', type=str, help="path to styles.")
self.arg_input_group.add_argument(
'--weights', type=ast.literal_eval, default=None, help="interpolation weights of styles.")
self.arg_config_group.add_argument(
'--alpha', type=ast.literal_eval, default=1, help="The parameter to control the tranform degree.")
self.arg_input_group.add_argument('--weights',
type=ast.literal_eval,
default=None,
help="interpolation weights of styles.")
self.arg_config_group.add_argument('--alpha',
type=ast.literal_eval,
default=1,
help="The parameter to control the tranform degree.")
## **更好用户体验,建议参考WEB端官方文档 -> [【图像分类】](https://www.paddlepaddle.org.cn/hubdetail)**
### 图像分类
图像分类是根据图像的语义信息对不同类别图像进行区分,是计算机视觉中重要的基础问题,是物体检测、图像分割、物体跟踪、行为分析、人脸识别等其他高层视觉任务的基础,在许多领域都有着广泛的应用。如:安防领域的人脸识别和智能视频分析等,交通领域的交通场景识别,互联网领域基于内容的图像检索和相册自动归类,医学领域的图像识别等。
**注:** **如果你是资深开发者,那可以随意按需使用****假如你是新手,服务器端优先选择Resnet50,移动端优先选择MobileNetV3**
- 精选模型推荐
| | **模型名称** | **模型特色** |
| ---------- | :----------------------------------------------------------- | ---------------------------------------------------------- |
| 图像分类 | [菜品识别](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_dishes&en_category=ImageClassification) | 私有数据集训练,支持8416种菜品的分类识别,适合进一步菜品方向微调 |
| 图像分类 | [动物识别](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_animals&en_category=ImageClassification) | 私有数据集训练,支持7978种动物的分类识别,适合进一步动物方向微调 |
| 图像分类 | [野生动物制品识别](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_wildanimals&en_category=ImageClassification) | 支持'象牙制品', '象牙', '大象', '虎皮', '老虎', '虎牙/虎爪/虎骨', '穿山甲甲片', '穿山甲', '穿山甲爪子', '其他' 这十个标签的识别。 |
- 更多模型
| **模型名称** | **模型简介** |
| - | - |
| [AlexNet](https://www.paddlepaddle.org.cn/hubdetail?name=alexnet_imagenet&en_category=ImageClassification) | 首次在 CNN 中成功的应用了 ReLU, Dropout 和 LRN,并使用 GPU 进行运算加速 |
| [VGG19](https://www.paddlepaddle.org.cn/hubdetail?name=vgg19_imagenet&en_category=ImageClassification) | 在 AlexNet 的基础上使用 3*3 小卷积核,增加网络深度,具有很好的泛化能力 |
| [GoogLeNet](https://github.com/PaddlePaddle/models/tree/release/1.7/PaddleCV/image_classification) | 在不增加计算负载的前提下增加了网络的深度和宽度,性能更加优越 |
| [ResNet50](https://www.paddlepaddle.org.cn/hubdetail?name=resnet_v2_50_imagenet&en_category=ImageClassification) | Residual Network,引入了新的残差结构,解决了随着网络加深,准确率下降的问题 |
| [Inceptionv4](https://www.paddlepaddle.org.cn/hubdetail?name=inception_v4_imagenet&en_category=ImageClassification) | 将 Inception 模块与 Residual Connection 进行结合,通过ResNet的结构极大地加速训练并获得性能的提升 |
| [MobileNetV2](https://www.paddlepaddle.org.cn/hubdetail?name=mobilenet_v2_imagenet&en_category=ImageClassification) | MobileNet结构的微调,直接在 thinner 的 bottleneck层上进行 skip learning 连接以及对 bottleneck layer 不进行 ReLu 非线性处理可取得更好的结果 |
| [se_resnext50](https://www.paddlepaddle.org.cn/hubdetail?name=se_resnext50_32x4d_imagenet&en_category=ImageClassification) | 在ResNeXt 基础、上加入了 SE(Sequeeze-and-Excitation) 模块,提高了识别准确率,在 ILSVRC 2017 的分类项目中取得了第一名 |
| [ShuffleNetV2](https://www.paddlepaddle.org.cn/hubdetail?name=shufflenet_v2_imagenet&en_category=ImageClassification) | ECCV2018,轻量级 CNN 网络,在速度和准确度之间做了很好地平衡。在同等复杂度下,比 ShuffleNet 和 MobileNetv2 更准确,更适合移动端以及无人车领域 |
| [efficientNetb7](https://www.paddlepaddle.org.cn/hubdetail?name=efficientnetb7_imagenet&en_category=ImageClassification) | 同时对模型的分辨率,通道数和深度进行缩放,用极少的参数就可以达到SOTA的精度。 |
| [xception71](https://www.paddlepaddle.org.cn/hubdetail?name=xception71_imagenet&en_category=ImageClassification) | 对inception-v3的改进,用深度可分离卷积代替普通卷积,降低参数量同时提高了精度。 |
| [dpn107](https://www.paddlepaddle.org.cn/hubdetail?name=dpn107_imagenet&en_category=ImageClassification) | 融合了densenet和resnext的特点。 |
| [DarkNet53](https://www.paddlepaddle.org.cn/hubdetail?name=darknet53_imagenet&en_category=ImageClassification) | 检测框架yolov3使用的backbone,在分类和检测任务上都有不错表现。 |
| [DenseNet161](https://www.paddlepaddle.org.cn/hubdetail?name=densenet161_imagenet&en_category=ImageClassification) | 提出了密集连接的网络结构,更加有利于信息流的传递。 |
| [ResNeXt152_vd](https://www.paddlepaddle.org.cn/hubdetail?name=resnext152_64x4d_imagenet&en_category=ImageClassification) | 提出了cardinatity的概念,用于作为模型复杂度的另外一个度量,有效地提升模型精度。 |
## **更好用户体验,建议参考WEB端官方文档 -> [【人脸检测】](https://www.paddlepaddle.org.cn/hubdetail)**
### 人脸检测
人脸检测属于目标检测的一个重要分支,由于近年来安防市场、人脸识别、人脸安全方面的原因,成为目标检测中最重要的任务之一。
- 推荐模型
| 模型名称 | 模型简介 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [人脸检测](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server&en_category=FaceDetection) | 百度自研,18年3月WIDER Face 数据集**冠军模型**, |
| [超轻量人脸检测](https://www.paddlepaddle.org.cn/hubdetail?name=ultra_light_fast_generic_face_detector_1mb_640&en_category=FaceDetection) | 针对边缘计算设备或低算力设备(如用ARM推理)设计的实时超轻量级通用人脸检测模型,可以在低算力设备中如用ARM进行实时的通用场景的人脸检测推理。 |
| [口罩人脸检测与识别](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server_mask&en_category=FaceDetection) | 业界**首个开源口罩人脸检测与识别模型**,引起广泛关注。 |
## **更好用户体验,建议参考WEB端官方文档 -> [【关键点检测】](https://www.paddlepaddle.org.cn/hubdetail)**
#### 关键点检测
人体骨骼关键点检测 (Pose Estimation) 主要检测人体的一些关键点,如关节,五官等,通过关键点描述人体骨骼信息。人体骨骼关键点检测对于描述人体姿态,预测人体行为至关重要。是诸多计算机视觉任务的基础,例如动作分类,异常行为检测,以及自动驾驶等等。
- 精选模型
| 模型名称 | 模型简介 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [单人--人体骨骼关键点检测](https://www.paddlepaddle.org.cn/hubdetail?name=human_pose_estimation_resnet50_mpii&en_category=KeyPointDetection) | 可用于行为识别、人物跟踪、步态识别等相关领域。具体应用主要集中在智能视频监控,病人监护系统,人机交互,虚拟现实,人体动画,智能家居,智能安防,运动员辅助训练等等。 |
| [多人-人体骨骼关键点检测](https://www.paddlepaddle.org.cn/hubdetail?name=openpose_body_estimation&en_category=KeyPointDetection) | 可用于行为识别、人物跟踪、步态识别等相关领域。具体应用主要集中在智能视频监控,病人监护系统,人机交互,虚拟现实,人体动画,智能家居,智能安防,运动员辅助训练等等。 |
| [面部关键点检测](https://www.paddlepaddle.org.cn/hubdetail?name=face_landmark_localization&en_category=KeyPointDetection) |可用于人脸识别、表情分析、三维人脸重建及三维动画等其它人脸相关问题,支持同一张图中的多个人脸检测 |
| [手部关键点检测](https://www.paddlepaddle.org.cn/hubdetail?name=hand_pose_localization&en_category=KeyPointDetection) |可用于手势识别,配合人体骨骼关键点,可用于异常行为检测等多种场景 |
## **更好用户体验,建议参考WEB端官方文档 -> [【目标检测】](https://www.paddlepaddle.org.cn/hubdetail)**
### 目标检测
目标检测任务的目标是给定一张图像或是一个视频帧,让计算机找出其中所有目标的位置,并给出每个目标的具体类别。对于计算机而言,能够“看到”的是图像被编码之后的数字,但很难解图像或是视频帧中出现了人或是物体这样的高层语义概念,也就更加难以定位目标出现在图像中哪个区域。
- 精选推荐模型
| 模型名称 | 模型简介 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [YOLOv3](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_coco2017&en_category=ObjectDetection) | 实现精度相比原作者**提高5.9 个绝对百分点**,性能极致优化。 |
| [行人检测](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_pedestrian&en_category=ObjectDetection) | 百度自研模型,海量私有数据集训练,可以应用于智能视频监控,人体行为分析,客流统计系统,智能交通等领域 |
| [车辆检测](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_vehicles&en_category=ObjectDetection) | 百度自研模型,支持car (汽车),truck (卡车),bus (公交车),motorbike (摩托车),tricycle (三轮车)等车型的识别 |
## **更好用户体验,建议参考WEB端官方文档 -> [【图像分割】](https://www.paddlepaddle.org.cn/hubdetail)**
### 图像分割
图像语义分割顾名思义是将图像像素按照表达的语义含义的不同进行分组/分割,图像语义是指对图像内容的理解,例如,能够描绘出什么物体在哪里做了什么事情等,分割是指对图片中的每个像素点进行标注,标注属于哪一类别。近年来用在无人车驾驶技术中分割街景来避让行人和车辆、医疗影像分析中辅助诊断等。
- 精选模型
| 模型名称 | 模型简介 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [人像分割](https://www.paddlepaddle.org.cn/hubdetail?name=deeplabv3p_xception65_humanseg&en_category=ImageSegmentation) | 百度**自建数据集**训练,人像分割效果卓越。 |
| [人体解析](https://www.paddlepaddle.org.cn/hubdetail?name=ace2p&en_category=ImageSegmentation) | CVPR2019 LIP挑战赛中**满贯三冠王**。人体解析任务必选。 |
| [肺炎CT影像分析](https://www.paddlepaddle.org.cn/hubdetail?name=Pneumonia_CT_LKM_PP&en_category=ImageSegmentation) | 助力连心医疗开源**业界首个**肺炎CT影像分析模型
# coding=utf-8
from paddle.fluid.initializer import Constant
from paddle.fluid.param_attr import ParamAttr
import paddle.fluid as fluid
def decoder_net():
x2paddle_22 = fluid.layers.create_parameter(
dtype='float32', shape=[4], name='x2paddle_22', attr='x2paddle_22', default_initializer=Constant(0.0))
x2paddle_36 = fluid.layers.create_parameter(
dtype='float32', shape=[4], name='x2paddle_36', attr='x2paddle_36', default_initializer=Constant(0.0))
x2paddle_44 = fluid.layers.create_parameter(
dtype='float32', shape=[4], name='x2paddle_44', attr='x2paddle_44', default_initializer=Constant(0.0))
x2paddle_input_1 = fluid.layers.data(
dtype='float32', shape=[1, 512, 64, 64], name='x2paddle_input_1', append_batch_size=False)
x2paddle_19 = fluid.layers.pad2d(
x2paddle_input_1, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_19')
x2paddle_20 = fluid.layers.conv2d(
x2paddle_19,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_1',
name='x2paddle_20',
bias_attr='x2paddle_2')
x2paddle_21 = fluid.layers.relu(x2paddle_20, name='x2paddle_21')
x2paddle_23 = fluid.layers.resize_nearest(x2paddle_21, name='x2paddle_23', out_shape=[128, 128])
x2paddle_24 = fluid.layers.pad2d(
x2paddle_23, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_24')
x2paddle_25 = fluid.layers.conv2d(
x2paddle_24,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_3',
name='x2paddle_25',
bias_attr='x2paddle_4')
x2paddle_26 = fluid.layers.relu(x2paddle_25, name='x2paddle_26')
x2paddle_27 = fluid.layers.pad2d(
x2paddle_26, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_27')
x2paddle_28 = fluid.layers.conv2d(
x2paddle_27,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_5',
name='x2paddle_28',
bias_attr='x2paddle_6')
x2paddle_29 = fluid.layers.relu(x2paddle_28, name='x2paddle_29')
x2paddle_30 = fluid.layers.pad2d(
x2paddle_29, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_30')
x2paddle_31 = fluid.layers.conv2d(
x2paddle_30,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_7',
name='x2paddle_31',
bias_attr='x2paddle_8')
x2paddle_32 = fluid.layers.relu(x2paddle_31, name='x2paddle_32')
x2paddle_33 = fluid.layers.pad2d(
x2paddle_32, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_33')
x2paddle_34 = fluid.layers.conv2d(
x2paddle_33,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_9',
name='x2paddle_34',
bias_attr='x2paddle_10')
x2paddle_35 = fluid.layers.relu(x2paddle_34, name='x2paddle_35')
x2paddle_37 = fluid.layers.resize_nearest(x2paddle_35, name='x2paddle_37', out_shape=[256, 256])
x2paddle_38 = fluid.layers.pad2d(
x2paddle_37, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_38')
x2paddle_39 = fluid.layers.conv2d(
x2paddle_38,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_11',
name='x2paddle_39',
bias_attr='x2paddle_12')
x2paddle_40 = fluid.layers.relu(x2paddle_39, name='x2paddle_40')
x2paddle_41 = fluid.layers.pad2d(
x2paddle_40, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_41')
x2paddle_42 = fluid.layers.conv2d(
x2paddle_41,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_13',
name='x2paddle_42',
bias_attr='x2paddle_14')
x2paddle_43 = fluid.layers.relu(x2paddle_42, name='x2paddle_43')
x2paddle_45 = fluid.layers.resize_nearest(x2paddle_43, name='x2paddle_45', out_shape=[512, 512])
x2paddle_46 = fluid.layers.pad2d(
x2paddle_45, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_46')
x2paddle_47 = fluid.layers.conv2d(
x2paddle_46,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_15',
name='x2paddle_47',
bias_attr='x2paddle_16')
x2paddle_48 = fluid.layers.relu(x2paddle_47, name='x2paddle_48')
x2paddle_49 = fluid.layers.pad2d(
x2paddle_48, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_49')
x2paddle_50 = fluid.layers.conv2d(
x2paddle_49,
num_filters=3,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_17',
name='x2paddle_50',
bias_attr='x2paddle_18')
return x2paddle_input_1, x2paddle_50
# coding=utf-8
from paddle.fluid.initializer import Constant
from paddle.fluid.param_attr import ParamAttr
import paddle.fluid as fluid
def encoder_net():
x2paddle_0 = fluid.layers.data(dtype='float32', shape=[1, 3, 512, 512], name='x2paddle_0', append_batch_size=False)
x2paddle_21 = fluid.layers.conv2d(
x2paddle_0,
num_filters=3,
filter_size=[1, 1],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_1',
name='x2paddle_21',
bias_attr='x2paddle_2')
x2paddle_22 = fluid.layers.pad2d(
x2paddle_21, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_22')
x2paddle_23 = fluid.layers.conv2d(
x2paddle_22,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_3',
name='x2paddle_23',
bias_attr='x2paddle_4')
x2paddle_24 = fluid.layers.relu(x2paddle_23, name='x2paddle_24')
x2paddle_25 = fluid.layers.pad2d(
x2paddle_24, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_25')
x2paddle_26 = fluid.layers.conv2d(
x2paddle_25,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_5',
name='x2paddle_26',
bias_attr='x2paddle_6')
x2paddle_27 = fluid.layers.relu(x2paddle_26, name='x2paddle_27')
x2paddle_28 = fluid.layers.pool2d(
x2paddle_27,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_28',
exclusive=False)
x2paddle_29 = fluid.layers.pad2d(
x2paddle_28, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_29')
x2paddle_30 = fluid.layers.conv2d(
x2paddle_29,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_7',
name='x2paddle_30',
bias_attr='x2paddle_8')
x2paddle_31 = fluid.layers.relu(x2paddle_30, name='x2paddle_31')
x2paddle_32 = fluid.layers.pad2d(
x2paddle_31, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_32')
x2paddle_33 = fluid.layers.conv2d(
x2paddle_32,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_9',
name='x2paddle_33',
bias_attr='x2paddle_10')
x2paddle_34 = fluid.layers.relu(x2paddle_33, name='x2paddle_34')
x2paddle_35 = fluid.layers.pool2d(
x2paddle_34,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_35',
exclusive=False)
x2paddle_36 = fluid.layers.pad2d(
x2paddle_35, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_36')
x2paddle_37 = fluid.layers.conv2d(
x2paddle_36,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_11',
name='x2paddle_37',
bias_attr='x2paddle_12')
x2paddle_38 = fluid.layers.relu(x2paddle_37, name='x2paddle_38')
x2paddle_39 = fluid.layers.pad2d(
x2paddle_38, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_39')
x2paddle_40 = fluid.layers.conv2d(
x2paddle_39,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_13',
name='x2paddle_40',
bias_attr='x2paddle_14')
x2paddle_41 = fluid.layers.relu(x2paddle_40, name='x2paddle_41')
x2paddle_42 = fluid.layers.pad2d(
x2paddle_41, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_42')
x2paddle_43 = fluid.layers.conv2d(
x2paddle_42,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_15',
name='x2paddle_43',
bias_attr='x2paddle_16')
x2paddle_44 = fluid.layers.relu(x2paddle_43, name='x2paddle_44')
x2paddle_45 = fluid.layers.pad2d(
x2paddle_44, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_45')
x2paddle_46 = fluid.layers.conv2d(
x2paddle_45,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_17',
name='x2paddle_46',
bias_attr='x2paddle_18')
x2paddle_47 = fluid.layers.relu(x2paddle_46, name='x2paddle_47')
x2paddle_48 = fluid.layers.pool2d(
x2paddle_47,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_48',
exclusive=False)
x2paddle_49 = fluid.layers.pad2d(
x2paddle_48, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_49')
x2paddle_50 = fluid.layers.conv2d(
x2paddle_49,
num_filters=512,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_19',
name='x2paddle_50',
bias_attr='x2paddle_20')
x2paddle_51 = fluid.layers.relu(x2paddle_50, name='x2paddle_51')
return x2paddle_0, x2paddle_51
## **更好用户体验,建议参考WEB端官方文档 -> [【文字识别】](https://www.paddlepaddle.org.cn/hubdetail)**
### 文字识别
文字识别(OCR)是计算机视觉重要任务之一,主要用于图像中文本信息的提取,具有重要的产业实践意义。
- 推荐模型
| 模型名称 | 模型简介 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [超轻量-中英文OCR文字识别](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_crnn_mobile&en_category=TextRecognition) | 业界开源最小,8.1M超轻量中英文识别模型。支持中英文识别;支持倾斜、竖排等多种方向文字识别,**强力推荐** |
| [高精度-中英文OCR文字识别](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_crnn_mobile&en_category=TextRecognition) | 业界开源效果最好,155M高精度中英文识别模型。支持中英文识别;支持倾斜、竖排等多种方向文字识别,**强力推荐** |
| [德语-超轻量OCR文字识别](https://www.paddlepaddle.org.cn/hubdetail?name=german_ocr_db_crnn_mobile&en_category=TextRecognition) | 德语OCR识别,超轻量|
| [法语-超轻量OCR文字识别](https://www.paddlepaddle.org.cn/hubdetail?name=french_ocr_db_crnn_mobile&en_category=TextRecognition) | 法语OCR识别,超轻量|
| [日语-超轻量OCR文字识别](https://www.paddlepaddle.org.cn/hubdetail?name=japan_ocr_db_crnn_mobile&en_category=TextRecognition) | 日语OCR识别,超轻量|
| [韩语-超轻量OCR文字识别](https://www.paddlepaddle.org.cn/hubdetail?name=korean_ocr_db_crnn_mobile&en_category=TextRecognition) | 韩语OCR识别,超轻量|
## **更好用户体验,建议参考WEB端官方文档 -> [【语言模型】](https://www.paddlepaddle.org.cn/hubdetail)**
### 语言模型
- 推荐模型
| 模型名称 | 模型简介 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [词嵌入模型](https://www.paddlepaddle.org.cn/hubdetail?name=word2vec_skipgram&en_category=SemanticModel) |在海量百度搜索数据集下预训练得到中文单词预训练词嵌入。其支持Fine-tune。Word2vec的预训练数据集的词汇表大小为1700249,word embedding维度为128。 |
| [文本相似度](https://www.paddlepaddle.org.cn/hubdetail?name=simnet_bow&en_category=SemanticModel) |根据用户输入的两个文本,计算出文本相似度得分。 |
| [ERNIE](https://www.paddlepaddle.org.cn/hubdetail?name=ERNIE&en_category=SemanticModel) |基于百科类、资讯类、论坛对话类数据等中文语料自研模型,其可用于文本分类、序列标注、阅读理解等任务。
.
......@@ -74,23 +74,23 @@ class BertModel(object):
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(
input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(
sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
......@@ -105,23 +105,22 @@ class BertModel(object):
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(
enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
......@@ -130,12 +129,12 @@ class BertModel(object):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(
input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
......@@ -150,43 +149,45 @@ class BertModel(object):
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(
input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(
name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(
x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(
shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(
input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(
input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
logits=next_sent_fc_out, label=labels, return_softmax=True)
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
......
......@@ -50,24 +50,21 @@ def multi_head_attention(queries,
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(
input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(
input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(
input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
......@@ -110,8 +107,10 @@ def multi_head_attention(queries,
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(
weights, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
......@@ -133,12 +132,11 @@ def multi_head_attention(queries,
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(
input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
......@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, p
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(
input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(
hidden, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
out = layers.fc(
input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
......@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name='')
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(
out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale', initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias', initializer=fluid.initializer.Constant(0.)))
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(
out, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
......@@ -220,28 +221,35 @@ def encoder_layer(enc_input,
with the post_process_layer to add residual connection, layer normalization
and droput.
"""
attn_output = multi_head_attention(
pre_process_layer(enc_input, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(
enc_input, attn_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_att')
ffd_output = positionwise_feed_forward(
pre_process_layer(attn_output, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
......@@ -266,22 +274,21 @@ def encoder(enc_input,
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(
enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
......
......@@ -58,13 +58,12 @@ class Bert(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
......
......@@ -74,23 +74,23 @@ class BertModel(object):
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(
input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(
sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
......@@ -105,23 +105,22 @@ class BertModel(object):
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(
enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
......@@ -130,12 +129,12 @@ class BertModel(object):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(
input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
......@@ -150,43 +149,45 @@ class BertModel(object):
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(
input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(
name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(
x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(
shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(
input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(
input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
logits=next_sent_fc_out, label=labels, return_softmax=True)
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
......
......@@ -50,24 +50,21 @@ def multi_head_attention(queries,
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(
input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(
input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(
input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
......@@ -110,8 +107,10 @@ def multi_head_attention(queries,
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(
weights, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
......@@ -133,12 +132,11 @@ def multi_head_attention(queries,
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(
input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
......@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, p
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(
input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(
hidden, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
out = layers.fc(
input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
......@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name='')
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(
out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale', initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias', initializer=fluid.initializer.Constant(0.)))
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(
out, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
......@@ -220,28 +221,35 @@ def encoder_layer(enc_input,
with the post_process_layer to add residual connection, layer normalization
and droput.
"""
attn_output = multi_head_attention(
pre_process_layer(enc_input, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(
enc_input, attn_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_att')
ffd_output = positionwise_feed_forward(
pre_process_layer(attn_output, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
......@@ -266,22 +274,21 @@ def encoder(enc_input,
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(
enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
......
......@@ -58,13 +58,12 @@ class Bert(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
......
......@@ -74,23 +74,23 @@ class BertModel(object):
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(
input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(
sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
......@@ -105,23 +105,22 @@ class BertModel(object):
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(
enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
......@@ -130,12 +129,12 @@ class BertModel(object):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(
input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
......@@ -150,43 +149,45 @@ class BertModel(object):
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(
input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(
name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(
x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(
shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(
input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(
input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
logits=next_sent_fc_out, label=labels, return_softmax=True)
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
......
......@@ -50,24 +50,21 @@ def multi_head_attention(queries,
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(
input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(
input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(
input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
......@@ -110,8 +107,10 @@ def multi_head_attention(queries,
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(
weights, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
......@@ -133,12 +132,11 @@ def multi_head_attention(queries,
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(
input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
......@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, p
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(
input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(
hidden, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
out = layers.fc(
input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
......@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name='')
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(
out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale', initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias', initializer=fluid.initializer.Constant(0.)))
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(
out, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
......@@ -220,28 +221,35 @@ def encoder_layer(enc_input,
with the post_process_layer to add residual connection, layer normalization
and droput.
"""
attn_output = multi_head_attention(
pre_process_layer(enc_input, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(
enc_input, attn_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_att')
ffd_output = positionwise_feed_forward(
pre_process_layer(attn_output, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
......@@ -266,22 +274,21 @@ def encoder(enc_input,
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(
enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
......
......@@ -58,13 +58,12 @@ class BertChinese(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
......
......@@ -74,23 +74,23 @@ class BertModel(object):
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(
input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(
sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
......@@ -105,23 +105,22 @@ class BertModel(object):
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(
enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
......@@ -130,12 +129,12 @@ class BertModel(object):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(
input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
......@@ -150,43 +149,45 @@ class BertModel(object):
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(
input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(
name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(
x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(
shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(
input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(
input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
logits=next_sent_fc_out, label=labels, return_softmax=True)
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
......
......@@ -50,24 +50,21 @@ def multi_head_attention(queries,
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(
input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(
input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(
input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
......@@ -110,8 +107,10 @@ def multi_head_attention(queries,
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(
weights, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
......@@ -133,12 +132,11 @@ def multi_head_attention(queries,
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(
input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
......@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, p
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(
input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(
hidden, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
out = layers.fc(
input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
......@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name='')
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(
out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale', initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias', initializer=fluid.initializer.Constant(0.)))
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(
out, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
......@@ -220,28 +221,35 @@ def encoder_layer(enc_input,
with the post_process_layer to add residual connection, layer normalization
and droput.
"""
attn_output = multi_head_attention(
pre_process_layer(enc_input, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(
enc_input, attn_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_att')
ffd_output = positionwise_feed_forward(
pre_process_layer(attn_output, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
......@@ -266,22 +274,21 @@ def encoder(enc_input,
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(
enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
......
......@@ -58,13 +58,12 @@ class Bert(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
......
......@@ -74,23 +74,23 @@ class BertModel(object):
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(
input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(
sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
......@@ -105,23 +105,22 @@ class BertModel(object):
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(
enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
......@@ -130,12 +129,12 @@ class BertModel(object):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(
input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
......@@ -150,43 +149,45 @@ class BertModel(object):
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(
input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(
name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(
x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(
shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(
input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(
input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
logits=next_sent_fc_out, label=labels, return_softmax=True)
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing softmax activiation to mask certain selected positions so that
they will not considered in attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of inpunt tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permuate the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of inpunt tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and droput to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consits of a multi-head (self) attention followed by
position-wise feed-forward networks and both the two components companied
with the post_process_layer to add residual connection, layer normalization
and droput.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
......@@ -58,13 +58,12 @@ class Bert(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
......
......@@ -74,23 +74,23 @@ class BertModel(object):
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(
input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(
sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
......@@ -105,23 +105,22 @@ class BertModel(object):
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(
enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
......@@ -130,12 +129,12 @@ class BertModel(object):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(
input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
......@@ -150,43 +149,45 @@ class BertModel(object):
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(
input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(
name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(
x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(
shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(
input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(
input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
logits=next_sent_fc_out, label=labels, return_softmax=True)
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing softmax activiation to mask certain selected positions so that
they will not considered in attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of inpunt tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permuate the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of inpunt tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and droput to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consits of a multi-head (self) attention followed by
position-wise feed-forward networks and both the two components companied
with the post_process_layer to add residual connection, layer normalization
and droput.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
......@@ -58,13 +58,12 @@ class Bert(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
......
......@@ -74,23 +74,23 @@ class BertModel(object):
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(
input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(
sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
......@@ -105,23 +105,22 @@ class BertModel(object):
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(
enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
......@@ -130,12 +129,12 @@ class BertModel(object):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(
input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
......@@ -150,43 +149,45 @@ class BertModel(object):
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(
input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(
name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(
x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(
shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(
input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(
input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
logits=next_sent_fc_out, label=labels, return_softmax=True)
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing softmax activiation to mask certain selected positions so that
they will not considered in attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of inpunt tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permuate the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of inpunt tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and droput to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consits of a multi-head (self) attention followed by
position-wise feed-forward networks and both the two components companied
with the post_process_layer to add residual connection, layer normalization
and droput.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
......@@ -58,13 +58,12 @@ class Bert(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
......
......@@ -74,23 +74,23 @@ class BertModel(object):
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(
input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(
sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
emb_out = fluid.layers.embedding(input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._word_emb_name,
initializer=self._param_initializer),
is_sparse=False)
position_emb_out = fluid.layers.embedding(input=position_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._pos_emb_name,
initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding(sentence_ids,
size=[self._sent_types, self._emb_size],
dtype=self._dtype,
param_attr=fluid.ParamAttr(name=self._sent_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
......@@ -105,23 +105,22 @@ class BertModel(object):
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder(
enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
self._enc_out = encoder(enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
n_head=self._n_head,
d_key=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head,
d_model=self._emb_size,
d_inner_hid=self._emb_size * 4,
prepostprocess_dropout=self._prepostprocess_dropout,
attention_dropout=self._attention_dropout,
relu_dropout=0,
hidden_act=self._hidden_act,
preprocess_cmd="",
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
def get_sequence_output(self):
return self._enc_out
......@@ -130,12 +129,12 @@ class BertModel(object):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc(
input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
next_sent_feat = fluid.layers.fc(input=next_sent_feat,
size=self._emb_size,
act="tanh",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels):
......@@ -150,43 +149,45 @@ class BertModel(object):
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc
mask_trans_feat = fluid.layers.fc(
input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
mask_trans_feat = fluid.layers.fc(input=mask_feat,
size=self._emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(
name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing:
fc_out = fluid.layers.matmul(
x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(
shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
fc_out = fluid.layers.matmul(x=mask_trans_feat,
y=fluid.default_main_program().global_block().var(self._word_emb_name),
transpose_y=True)
fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
dtype=self._dtype,
attr=mask_lm_out_bias_attr,
is_bias=True)
else:
fc_out = fluid.layers.fc(
input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
fc_out = fluid.layers.fc(input=mask_trans_feat,
size=self._voc_size,
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc(
input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
size=2,
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
logits=next_sent_fc_out, label=labels, return_softmax=True)
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing softmax activiation to mask certain selected positions so that
they will not considered in attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of inpunt tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permuate the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of inpunt tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and droput to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consits of a multi-head (self) attention followed by
position-wise feed-forward networks and both the two components companied
with the post_process_layer to add residual connection, layer normalization
and droput.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
......@@ -58,13 +58,12 @@ class BertWwm(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
......
......@@ -58,13 +58,12 @@ class BertWwm(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
......
......@@ -58,13 +58,12 @@ class Electra(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
electra = ElectraModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.electra_config,
use_fp16=False)
electra = ElectraModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.electra_config,
use_fp16=False)
pooled_output = electra.get_pooled_output()
sequence_output = electra.get_sequence_output()
return pooled_output, sequence_output
......
......@@ -58,13 +58,12 @@ class Electra(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
electra = ElectraModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.electra_config,
use_fp16=False)
electra = ElectraModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.electra_config,
use_fp16=False)
pooled_output = electra.get_pooled_output()
sequence_output = electra.get_sequence_output()
return pooled_output, sequence_output
......
......@@ -58,13 +58,12 @@ class BertWwm(TransformerModule):
pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task.
"""
bert = BertModel(
src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
bert = BertModel(src_ids=input_ids,
position_ids=position_ids,
sentence_ids=segment_ids,
input_mask=input_mask,
config=self.bert_config,
use_fp16=False)
pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册