Unverified commit ce2b33f2, authored by: Zeyu Chen, committed by: GitHub

Reorganize README

update_readme_1118
<p align="center">
<img src="./docs/imgs/paddlehub_logo.jpg" align="middle"
</p>
[![Build Status](https://travis-ci.org/PaddlePaddle/PaddleHub.svg?branch=release/v1.8)](https://travis-ci.org/PaddlePaddle/PaddleHub)
[![License](https://img.shields.io/badge/license-Apache%202-red.svg)](LICENSE)
[![Version](https://img.shields.io/github/release/PaddlePaddle/PaddleHub.svg)](https://github.com/PaddlePaddle/PaddleHub/releases)
![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)
## Introduction
- PaddleHub aims to provide developers with rich, high-quality, directly usable pre-trained models. **No deep-learning background, no data, and no training process required** to start using AI models quickly.
- Covers the four major categories of CV, NLP, Audio, and Video, with support for **one-line prediction**, **one-line serving deployment**, and **fast transfer learning**.
- All models are open source and downloadable, and **can run offline**.
## Recent Updates
- **2020.11.20**: Released version 2.0-beta, fully migrated to the dynamic-graph programming mode, with upgraded Serving deployment; added 1 hand keypoint detection model, 12 image-cartoonization models, 3 image-editing models, 3 text-to-speech models, and 1 syntactic analysis model, bringing the total number of pre-trained models to **182**.
- **2020.10.09**: Added 4 multilingual OCR models and 4 image-editing models, bringing the total number of pre-trained models to **162**.
- **2020.09.27**: Added 6 text-generation models and 1 image-segmentation model, bringing the total number of pre-trained models to **154**.
- **2020.08.13**: Released v1.8.1, adding the portrait-segmentation model Humanseg and support for EMNLP 2019 Sentence-BERT as a text-matching network, bringing the total number of pre-trained models to **147**.
- **2020.07.29**: Released v1.8.0, adding AI couplets and AI poetry, jieba word segmentation, LDA for text data, semantic similarity computation, new object-detection and short-video-classification models, ultra-lightweight Chinese/English OCR, and industrial-grade models such as pedestrian detection, vehicle detection, and animal recognition; supports VisualDL training visualization. Total pre-trained models: **135**.
- [More](./docs/release.md)
## [Features](./docs/figures.md)
- **Rich pre-trained models**: 180+ pre-trained models covering the four major categories of CV, NLP, Audio, and Video, all open source for download and runnable offline.
- **One-line model prediction**: call a model through a single command line or a minimal Python API to try out its results quickly.
- **One-line model serving**: deploy a deep learning model as an API service with one command.
- **Transfer learning in ten lines of code**: complete transfer-learning tasks for image classification and text classification in ten lines of code.
- **Easy PIP installation**: quick installation and use via PIP.
- **Cross-platform compatibility**: runs on Linux, Windows, and macOS.
## Featured Model Showcase

### Text Recognition
- Includes ultra-lightweight Chinese/English OCR models, plus high-accuracy Chinese/English and multilingual (German, French, Japanese, Korean) OCR.
<div align="center">
<img src="./docs/imgs/Readme_Related/Image_Ocr.gif" width = "800" height = "400" />
</div>
<a name="易用的迁移学习"></a>
### 2、易用的迁移学习
通过Fine-tune API,只需要少量代码即可完成深度学习模型在计算机视觉场景下的迁移学习。
* [Demo示例](./demo)提供丰富的Fine-tune API的使用代码,包括[图像分类](./demo/image_classification)[图像着色](./demo/colorization)[风格迁移](./demo/style_transfer)、等场景的模型迁移示例。
<p align="center">
<img src="./docs/imgs/paddlehub_finetune.gif" align="middle"
</p>
<p align='center'>
十行代码完成图像风格迁移
</p>
* 如需在线快速体验,请点击[PaddleHub教程合集](https://aistudio.baidu.com/aistudio/projectdetail/231146),可使用AI Studio平台提供的GPU算力进行快速尝试。
<a name="一键模型转服务"></a> ### 人脸检测
### 3、一键模型转服务 - 包含人脸检测,口罩人脸检测,多种算法可选。
<div align="center">
<img src="./docs/imgs/Readme_Related/Image_ObjectDetection_Face_Mask.gif" width = "588" height = "400" />
</div>
### Image Editing
- 4x super-resolution, with multiple super-resolution algorithms to choose from.
- Colorization of black-and-white photos, usable for restoring old photos.
<div align="center">
<table>
<thead>
</thead>
<tbody>
<tr>
<th>Image Super-Resolution </th>
<th>B&W Image Colorization </th>
</tr>
<tr>
<th>
<a>
<img src="./docs/imgs/Readme_Related/ImageEdit_SuperResolution.gif" width = "266" height = "400" /></a><br>
</th>
<th>
<a>
<img src="./docs/imgs/Readme_Related/ImageEdit_Restoration.gif" width = "300" height = "400" /></a><br>
</th>
</tr>
</tbody>
</table>
</div>
### Object Detection
- Includes pedestrian detection and vehicle detection, plus industrial-grade ultra-large-scale pre-trained models.
<div align="center">
<img src="./docs/imgs/Readme_Related/Image_ObjectDetection_Pedestrian_Vehicle.gif" width = "642" height = "400" />
</div>
### Keypoint Detection
- Includes single-person and multi-person body keypoint detection, facial keypoint detection, and hand keypoint detection.
<div align="center">
<img src="./docs/imgs/Readme_Related/Image_keypoint.gif" width = "458" height = "400" />
</div>
### Image Segmentation
- Includes an outstanding portrait-matting model and the ACE2P human-parsing world-champion model.
<div align="center">
<img src="./docs/imgs/Readme_Related/ImageSeg_Human.gif" width = "642" height = "400" />
</div>
### Image Cartoonization
- Style transfer in the styles of multiple manga artists, including Hayao Miyazaki and Makoto Shinkai, with multiple algorithms to choose from.
<div align="center">
<img src="./docs/imgs/Readme_Related/ImageGan_Anime.gif" width = "532" height = "400" />
</div>
### Image Classification
- Includes animal classification, dish classification, and wildlife-product classification, with multiple algorithms to choose from.
<div align="center">
<img src="./docs/imgs/Readme_Related/ImageClas_animal_dish_wild.gif" width = "530" height = "400" />
</div>
### Lexical Analysis
- High-quality models for Chinese word segmentation, part-of-speech tagging, and named entity recognition.
<div align="center">
<img src="./docs/imgs/Readme_Related/Text_Lexical Analysis.png" width = "640" height = "233" />
</div>
### Text Generation
- Includes AI poetry, AI couplets, AI love letters, and AI acrostic poems, with multiple algorithms to choose from.
<div align="center">
<img src="./docs/imgs/Readme_Related/Text_Textgen_poetry.gif" width = "850" height = "400" />
</div>
### Syntactic Analysis
- A Chinese syntactic analysis model with leading accuracy.
<div align="center">
<img src="./docs/imgs/Readme_Related/Text_SyntacticAnalysis.png" width = "640" height = "301" />
</div>
### Sentiment Analysis
- Sentiment analysis for Chinese comments and reviews.
<div align="center">
<img src="./docs/imgs/Readme_Related/Text_SentimentAnalysis.png" width = "640" height = "228" />
</div>
### Text Censorship
- Includes censorship of Chinese pornographic text, with multiple algorithms to choose from.
<div align="center">
<img src="./docs/imgs/Readme_Related/Text_Textreview.png" width = "640" height = "140" />
</div>
### Text-to-Speech
- TTS speech-synthesis models, with multiple algorithms to choose from.
- Input: `Life was like a box of chocolates, you never know what you're gonna get.`
- The synthesized results:
<div align="center">
<table>
<thead>
</thead>
<tbody>
<tr>
<th>deepvoice3 </th>
<th>fastspeech </th>
<th>transformer</th>
</tr>
<tr>
<th>
<a href="https://paddlehub.bj.bcebos.com/resources/deepvoice3_ljspeech-0.wav">
<img src="./docs/imgs/Readme_Related/audio_icon.png" width=250 /></a><br>
</th>
<th>
<a href="https://paddlehub.bj.bcebos.com/resources/fastspeech_ljspeech-0.wav">
<img src="./docs/imgs/Readme_Related/audio_icon.png" width=250 /></a><br>
</th>
<th>
<a href="https://paddlehub.bj.bcebos.com/resources/transformer_tts_ljspeech-0.wav">
<img src="./docs/imgs/Readme_Related/audio_icon.png" width=250 /></a><br>
</th>
</tr>
</tbody>
</table>
</div>
### Video Classification
- Includes short-video classification supporting 3000+ label categories with TOP-K label output, with multiple algorithms to choose from.
- `Example: for a short video of swimming, the algorithm outputs the label "swimming"`
<div align="center">
<img src="./docs/imgs/Readme_Related/Text_Video.gif" width = "400" height = "400" />
</div>
## ===Key Takeaways===
- All of the pre-trained models above are fully open source, and the number of models keeps growing; Star the repo to stay up to date.
<div align="center">
<a href="https://github.com/PaddlePaddle/PaddleHub/stargazers">
<img src="./docs/imgs/Readme_Related/star.png" width = "411" height = "100" /></a>
</div>
<a name="欢迎加入PaddleHub技术交流群"></a> <a name="欢迎加入PaddleHub技术交流群"></a>
## 微信扫描二维码,欢迎加入PaddleHub技术交流群 ## 欢迎加入PaddleHub技术交流群
- 在使用模型过程中有任何问题,可以加入官方微信群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
<div align="center"> <div align="center">
<img src="./docs/imgs/joinus.JPEG" width = "200" height = "200" /> <img src="./docs/imgs/joinus.PNG" width = "200" height = "200" />
</div> </div>
如扫码失败,请添加微信15711058002,并备注“Hub”,运营同学会邀请您入群。 如扫码失败,请添加微信15711058002,并备注“Hub”,运营同学会邀请您入群。
## Documentation and Tutorials [[readthedocs]](https://paddlehub.readthedocs.io/zh_CN/develop/index.html)
- [PIP Installation](./docs/installation.md)
- Quick Start
  - [Command-line invocation](./docs/quick_experience/cmd_quick_run.md)
  - [Python API invocation](./docs/quick_experience/python_use_hub.md)
  - [Online demos (Official)](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.8/demo)
  - [Community demos (ThirdParty)](./docs/quick_experience/more_demos.md)
- Rich pre-trained models: 182 in total
  - [Featured models](./docs/figure.md)
  - Computer Vision (126)
    - [Image Classification (64)](./modules/image/classification/README.md)
    - [Object Detection (13)](./modules/image/object_detection/README.md)
    - [Face Detection (7)](./modules/image/face_detection/README.md)
    - [Keypoint Detection (3)](./modules/image/keypoint_detection/README.md)
    - [Image Segmentation (7)](./modules/image/semantic_segmentation/README.md)
    - [Text Recognition (8)](./modules/image/text_recognition/README.md)
    - [Image Generation (17)](./modules/image/Image_gan/README.md)
    - [Image Editing (7)](./modules/image/Image_editing/README.md)
  - Natural Language Processing (48)
    - [Lexical Analysis (2)](./modules/text/lexical_analysis/README.md)
    - [Syntactic Analysis (1)](./modules/text/syntactic_analysis/README.md)
    - [Sentiment Analysis (7)](./modules/text/sentiment_analysis/README.md)
    - [Text Censorship (3)](./modules/text/text_review/README.md)
    - [Text Generation (9)](./modules/text/text_generation/README.md)
    - [Semantic Models (26)](./modules/text/language_model/README.md)
  - Audio (3)
    - [Text-to-Speech (3)](./modules/audio/README.md)
  - Video (5)
    - [Video Classification (5)](./modules/video/README.md)
- Deployment
  - [Local Inference deployment](./docs/quick_experience/python_use_hub.md)
  - [One-line serving deployment](./docs/tutorial/serving.md)
  - [Mobile deployment with Paddle Lite (Lite tutorial)](https://paddle-lite.readthedocs.io/zh/latest/quick_start/tutorial.html)
- Advanced Documentation
  - [Command-line tools in detail](./docs/tutorial/cmdintro.md)
  - [Transfer learning with custom data](./docs/tutorial/how_to_load_data.md)
  - [Converting a model into a Module](./docs/tutorial/contri_pretrained_model.md)
  - [Text Embedding tasks](./docs/tutorial/bert_service.md)
- Community
  - [Join the technical discussion group](#欢迎加入PaddleHub技术交流群)
  - [Contribute pre-trained models](./docs/contribution/contri_pretrained_model.md)
  - [Contribute code](./docs/contribution/contri_pr.md)
- [FAQ](./docs/faq.md)
- [Release Notes](./docs/release.md)
- [License](#许可证书)
- [Acknowledgements](#致谢)
<a name="许可证书"></a> <a name="许可证书"></a>
## 许可证书 ## 许可证书
本项目的发布受<a href="./LICENSE">Apache 2.0 license</a>许可认证。 本项目的发布受<a href="./LICENSE">Apache 2.0 license</a>许可认证。
......
## Features in Detail
<a name="丰富的预训练模型"></a>
### 1. Rich Pre-trained Models
#### 1.1 Image
|            | **Featured model examples** |
| ---------- | :----------------------------------------------------------- |
| Image Classification | [Dish Recognition](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_dishes&en_category=ImageClassification), [Animal Recognition](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_animals&en_category=ImageClassification), [-->More](../modules/image/classification/README.md) |
| Object Detection | [General Detection](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_coco2017&en_category=ObjectDetection), [Pedestrian Detection](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_pedestrian&en_category=ObjectDetection), [Vehicle Detection](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_vehicles&en_category=ObjectDetection), [-->More](../modules/image/object_detection/README.md) |
| Face Detection | [Face Detection](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server&en_category=FaceDetection), [Mask Detection](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server_mask&en_category=FaceDetection), [-->More](../modules/image/face_detection/README.md) |
| Image Segmentation | [Portrait Segmentation](https://www.paddlepaddle.org.cn/hubdetail?name=deeplabv3p_xception65_humanseg&en_category=ImageSegmentation), [Human Parsing](https://www.paddlepaddle.org.cn/hubdetail?name=ace2p&en_category=ImageSegmentation), [Pneumonia CT Image Analysis](https://www.paddlepaddle.org.cn/hubdetail?name=Pneumonia_CT_LKM_PP&en_category=ImageSegmentation), [-->More](../modules/image/semantic_segmentation/README.md) |
| Keypoint Detection | [Body Keypoints](https://www.paddlepaddle.org.cn/hubdetail?name=human_pose_estimation_resnet50_mpii&en_category=KeyPointDetection), [Facial Keypoints](https://www.paddlepaddle.org.cn/hubdetail?name=face_landmark_localization&en_category=KeyPointDetection), [Hand Keypoints](https://www.paddlepaddle.org.cn/hubdetail?name=hand_pose_localization&en_category=KeyPointDetection), [-->More](./modules/image/keypoint_detection/README.md) |
| Text Recognition | [Ultra-Lightweight Chinese/English OCR](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_crnn_mobile&en_category=TextRecognition), [-->More](../modules/image/text_recognition/README.md) |
| Image Generation | [Style Transfer](https://www.paddlepaddle.org.cn/hubdetail?name=stylepro_artistic&en_category=GANs), [Street-Scene Cartoonization](), [-->More](../modules/image/Image_gan/README.md) |
| Image Editing | [Super-Resolution](https://www.paddlepaddle.org.cn/hubdetail?name=realsr&en_category=ImageEditing), [B&W Colorization](https://www.paddlepaddle.org.cn/hubdetail?name=deoldify&en_category=ImageEditing), [-->More](../modules/image/Image_editing/README.md) |
#### 1.2 Text
|            | **Featured model examples** |
| ---------- | :----------------------------------------------------------- |
| Lexical & Syntactic Analysis | [Lexical Analysis](https://www.paddlepaddle.org.cn/hubdetail?name=lac&en_category=LexicalAnalysis), [Syntactic Analysis](https://www.paddlepaddle.org.cn/hubdetail?name=ddparser&en_category=SyntacticAnalysis), [-->More](../modules/text/lexical_analysis/README.md) |
| Sentiment Analysis | [Sentiment Classification](https://www.paddlepaddle.org.cn/hubdetail?name=lac&en_category=LexicalAnalysis), [Emotion Detection](https://www.paddlepaddle.org.cn/hubdetail?name=emotion_detection_textcnn&en_category=SentimentAnalysis), [-->More](../modules/text/sentiment_analysis/README.md) |
| Text Censorship | [Porn Detection](https://www.paddlepaddle.org.cn/hubdetail?name=porn_detection_gru&en_category=TextCensorship), [-->More](../modules/text/text_review/README.md) |
| Text Generation | [Couplet Generation](), [Love-Letter Generation](), [Acrostic Poem Generation](), [Cheesy Pick-up Lines](), [-->More](../modules/text/text_generation/README.md) |
| Semantic Models | [ERNIE](https://www.paddlepaddle.org.cn/hubdetail?name=ERNIE&en_category=SemanticModel), [Text Similarity](https://www.paddlepaddle.org.cn/hubdetail?name=simnet_bow&en_category=SemanticModel), [-->More](../modules/text/language_model/README.md) |
#### 1.3 Audio
|            | **Featured model examples** |
| ---------- | :----------------------------------------------------------- |
| Text-to-Speech | [Text-to-Speech](), [-->More](../modules/audio/README.md) |
#### 1.4 Video
|            | **Featured model examples** |
| ---------- | :----------------------------------------------------------- |
| Video Classification | [Video Classification](), [-->More](../modules/video/README.md) |
<a name="一键模型预测"></a>
### 2、一键模型预测
* 举例,假如考虑使用文字识别轻量级中文OCR模型chinese_ocr_db_crnn_mobile即可一键快速识别图片中的文字。
```shell
$ pip install paddlehub
$ wget https://paddlehub.bj.bcebos.com/model/image/ocr/test_ocr.jpg
$ hub run chinese_ocr_db_crnn_mobile --input_path test_ocr.jpg --visualization=True
```
* The resulting image is saved in the ocr_result folder under the current working directory, as shown below.
<p align="center">
<img src="./imgs/ocr_res.jpg" width='70%' align="middle"
</p>
* Use the lexical analysis model LAC for word segmentation:
```shell
$ hub run lac --input_text "现在,慕尼黑再保险公司不仅是此类行动的倡议者,更是将其大量气候数据整合进保险产品中,并与公众共享大量天气信息,参与到新能源领域的保障中。"
[{
'word': ['现在', ',', '慕尼黑再保险公司', '不仅', '是', '此类', '行动', '的', '倡议者', ',', '更是', '将', '其', '大量', '气候', '数据', '整合', '进', '保险', '产品', '中', ',', '并', '与', '公众', '共享', '大量', '天气', '信息', ',', '参与', '到', '新能源', '领域', '的', '保障', '中', '。'],
'tag': ['TIME', 'w', 'ORG', 'c', 'v', 'r', 'n', 'u', 'n', 'w', 'd', 'p', 'r', 'a', 'n', 'n', 'v', 'v', 'n', 'n', 'f', 'w', 'c', 'p', 'n', 'v', 'a', 'n', 'n', 'w', 'v', 'v', 'n', 'n', 'u', 'vn', 'f', 'w']
}]
```
In addition to one-line command prediction, PaddleHub also supports calling models through the Python API; see each model's documentation for details.
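For instance, a minimal sketch of the same LAC call through the Python API (the `lexical_analysis` method name follows the LAC module's own documentation; other modules expose differently named prediction methods):
```python
import paddlehub as hub

# Load the LAC lexical analysis module by name.
lac = hub.Module(name="lac")

# Batch prediction: each result carries the tokens and their tags.
results = lac.lexical_analysis(texts=["今天是个好日子"])
for result in results:
    print(result['word'])
    print(result['tag'])
```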
<a name="一键模型转服务"></a>
### 3、一键模型转服务
PaddleHub提供便捷的模型转服务的能力,只需简单一行命令即可完成模型的HTTP服务部署。通过以下命令即可快速启动LAC词法分析服务:
```shell
$ hub serving start -m chinese_ocr_db_crnn_mobile
```
For more on model serving, see [One-line serving deployment with PaddleHub](./tutorial/serving.md).
<a name="十行代码迁移学习"></a>
### 4、十行代码迁移学习
通过Fine-tune API,只需要少量代码即可完成深度学习模型在计算机视觉场景下的迁移学习。
* [Demo示例](../demo)提供丰富的Fine-tune API的使用代码,包括[图像分类](../demo/image_classification)[图像着色](../demo/colorization)[风格迁移](../demo/style_transfer)、等场景的模型迁移示例。
<p align="center">
<img src="./imgs/paddlehub_finetune.gif" align="middle" />
</p>
<p align='center'>
Industrial-grade text classification in ten lines of code
</p>
* For a quick online trial, open the [PaddleHub tutorial collection](https://aistudio.baidu.com/aistudio/projectdetail/231146) and use the free GPU compute provided by the AI Studio platform.
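As a rough sketch of what those ten lines can look like with the dynamic-graph Fine-tune API (the module name `resnet50_vd_imagenet_ssld` and the built-in `Flowers` dataset are illustrative choices; see the [demo directory](../demo) for the maintained version):
```python
import paddle
import paddlehub as hub
import paddlehub.vision.transforms as T
from paddlehub.datasets import Flowers
from paddlehub.finetune.trainer import Trainer

# Shared preprocessing pipeline for training and evaluation.
transforms = T.Compose(
    [T.Resize((256, 256)),
     T.CenterCrop(224),
     T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])],
    to_rgb=True)
flowers = Flowers(transforms)

# Fine-tune a pre-trained backbone on the five flower classes.
model = hub.Module(name='resnet50_vd_imagenet_ssld',
                   label_list=["roses", "tulips", "daisy", "sunflowers", "dandelion"])
optimizer = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='img_classification_ckpt')
trainer.train(flowers, epochs=10, batch_size=32, save_interval=1)
```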
# How to Write a PaddleHub Module
## Basic Module Information
We are going to write a PaddleHub Module with the following basic information:
```yaml
name="openpose_body_estimation",
type="CV/image_editing",
author="paddlepaddle",
author_email="",
summary="Openpose_body_estimation is a body pose estimation model based on Realtime Multi-Person 2D Pose \
Estimation using Part Affinity Fields.",
version="1.0.0"
```
The Module exposes a predict interface that receives an input image and produces the final output; it supports both Python calls and command-line calls.
```python
import paddlehub as hub
model = hub.Module(name="openpose_body_estimation")
result = model.predict("demo.jpg")
```
```cmd
hub run openpose_body_estimation --input_path demo.jpg
```
## Creating the Module
### step 1. Create the necessary directories and files
Create an openpose_body_estimation directory, and inside it create module.py and processor.py:
|File|Purpose|
|-|-|
|module.py|Main module, containing the Module implementation|
|processor.py|Helper module, providing the preprocessing classes and functions used by module.py|
```cmd
➜ tree openpose_body_estimation
openpose_body_estimation/
├── module.py
└── processor.py
```
### step 2. Implement the helper module processor
In processor.py, implement the classes and functions that module.py will call. For example, implement a ResizeScaling class in processor.py:
```python
import cv2
from typing import Callable


class ResizeScaling:
    """Resize images by scaling method.

    Args:
        target(int): Target image size.
        interpolation(Callable): Interpolation method.
    """

    def __init__(self, target: int = 368, interpolation: Callable = cv2.INTER_CUBIC):
        self.target = target
        self.interpolation = interpolation

    def __call__(self, img, scale_search: float = 0.5):
        # scale_search defaults to 0.5 here so predict() below can call it without arguments.
        # The output height becomes scale_search * target; the width scales proportionally.
        scale = scale_search * self.target / img.shape[0]
        resize_img = cv2.resize(img, (0, 0), fx=scale, fy=scale, interpolation=self.interpolation)
        return resize_img
```
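A quick usage sketch (the 0.5 scale factor is only an illustrative value):
```python
import cv2

resize = ResizeScaling(target=368)
img = cv2.imread("demo.jpg")
# Output height becomes scale_search * target = 184 pixels; width scales proportionally.
resized = resize(img, scale_search=0.5)
```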
### step 3. Write the Module code
module.py is the entry point of the Module; the prediction logic is implemented here.
#### step 3_1. Import the required packages
```python
import os
import time
import copy
import base64
import argparse
from typing import Union
from collections import OrderedDict
import cv2
import paddle
import paddle.nn as nn
import numpy as np
from paddlehub.module.module import moduleinfo, runnable, serving
import paddlehub.vision.transforms as T
import openpose_body_estimation.processor as P
```
**NOTE:** `paddlehub.vision.transforms` provides common image-processing methods that are convenient to call.
#### step 3_2. Define the BodyPoseModel class
module.py needs a class that inherits from nn.Layer; this class implements the prediction logic and declares its basic information with the moduleinfo decorator. When the Module is loaded with hub.Module(name="openpose_body_estimation"), PaddleHub automatically creates an openpose_body_estimation object and returns it.
```python
@moduleinfo(
name="openpose_body_estimation",
type="CV/image_editing",
author="paddlepaddle",
author_email="",
summary="Openpose_body_estimation is a body pose estimation model based on Realtime Multi-Person 2D Pose \
Estimation using Part Affinity Fields.",
version="1.0.0")
class BodyPoseModel(nn.Layer):
...
```
#### step 3_3. Initialize and build the model
The initialization mainly does three things: declares the helper classes to be used, declares the model structure, and loads the parameters.
```python
def __init__(self, load_checkpoint: str = None):
    super(BodyPoseModel, self).__init__()
    # Declare the helper classes to be used.
    self.resize_func = P.ResizeScaling()
    self.norm_func = T.Normalize(std=[1, 1, 1])
    # Declare the model structure.
    self.input_nc = 4
    self.output_nc = 2
    model1 = (
        nn.Conv2D(self.input_nc, 64, 3, 1, 1),
        nn.ReLU(),
        nn.Conv2D(64, 64, 3, 1, 1),
        nn.ReLU(),
        nn.BatchNorm(64),
    )
    self.model1 = nn.Sequential(*model1)
    # Load the parameters.
    if load_checkpoint is not None:
        self.model_dict = paddle.load(load_checkpoint)
        self.set_dict(self.model_dict)
        print("load custom checkpoint success")
    else:
        # self.directory is provided by PaddleHub and points at the installed Module.
        checkpoint = os.path.join(self.directory, 'model.pdparams')
        self.model_dict = paddle.load(checkpoint)
        self.set_dict(self.model_dict)
        print("load pretrained checkpoint success")
```
The model's forward computation is implemented in `forward`:
```python
def forward(self, input: paddle.Tensor) -> paddle.Tensor:
result = self.model1(input)
return result
```
#### step 3_4. Implement the prediction logic
```python
def predict(self, img: Union[np.ndarray, str], save_path: str = 'openpose_body', visualization: bool = True):
    # save_path and visualization mirror the command-line options added in step 3_5.
    self.eval()
    self.visualization = visualization
    if isinstance(img, str):
        orgImg = cv2.imread(img)
    else:
        orgImg = img
    data = self.resize_func(self.norm_func(orgImg))
    output = self.forward(paddle.to_tensor(data.astype('float32')))
    output = paddle.clip(output[0].transpose((1, 2, 0)), 0, 255).numpy()
    output = output.astype(np.uint8)
    if self.visualization:
        style_name = "body_" + str(time.time()) + ".png"
        if not os.path.exists(save_path):
            os.mkdir(save_path)
        path = os.path.join(save_path, style_name)
        cv2.imwrite(path, output)
    return output
```
#### step 3_5. Support command-line invocation
To let the Module support command-line invocation, provide an interface decorated with runnable; it parses the input arguments, runs prediction, and returns the results.
```python
@runnable
def run_cmd(self, argvs):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options",
description=
"Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
results = self.predict(
img=args.input_path,
save_path=args.output_dir,
visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--output_dir',
type=str,
default='openpose_body',
help="The directory to save output images.")
self.arg_config_group.add_argument(
'--save_dir',
type=str,
default='openpose_model',
help="The directory to save model.")
self.arg_config_group.add_argument(
'--visualization',
type=bool,
default=True,
help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument(
'--input_path', type=str, help="path to image.")
```
#### step 3_6. Support serving
To let the Module support deployment as a PaddleHub Serving prediction service, provide an interface decorated with serving; it parses the input data, runs prediction, and returns the results.
If PaddleHub Serving deployment is not needed, the serving decorator can be omitted.
```python
@serving
def serving_method(self, images, **kwargs):
    """
    Run as a service.
    """
    # base64_to_cv2 and cv2_to_base64 are helper functions provided by processor.py.
    images_decode = [P.base64_to_cv2(image) for image in images]
    results = self.predict(img=images_decode[0], **kwargs)
    final = {}
    final['data'] = P.cv2_to_base64(results)
    return final
```
## Testing
After finishing the Module, we can test it in the following ways:
### Method 1
Install the Module locally, then load it with hub.Module(name=...):
```shell
hub install openpose_body_estimation
```
```python
import paddlehub as hub

if __name__ == "__main__":
    # Load the Module we just installed; the name must match the one in moduleinfo.
    model = hub.Module(name='openpose_body_estimation')
    result = model.predict("demo.jpg")
```
### Method 2
Install the Module locally, then run it with hub run:
```shell
hub install openpose_body_estimation
hub run openpose_body_estimation --input_path demo.jpg
```
### Testing the serving method
Start the service:
```shell
$ hub serving start -m openpose_body_estimation
```
Send a prediction request and fetch the result:
```python
import requests
import json
import cv2
import base64
import numpy as np


def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')


def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data


# Send the HTTP request.
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images': [cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/openpose_body_estimation"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

canvas = base64_to_cv2(r.json()["results"]['data'])
cv2.imwrite('keypoint_body.png', canvas)
```
## **For a better experience, see the official web docs -> [Text-to-Speech](https://www.paddlepaddle.org.cn/hublist)**
### Text-to-Speech
Text-to-speech (TTS) converts text into speech and is already widely used in all kinds of voice-interaction devices.
- Recommended models

| Model | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [transformer_tts_ljspeech](https://www.paddlepaddle.org.cn/hubdetail?name=transformer_tts_ljspeech&en_category=TextToSpeech) | TransformerTTS fuses Transformer and Tacotron2 with satisfying results. English TTS model; prediction only. |
| [fastspeech_ljspeech](https://www.paddlepaddle.org.cn/hubdetail?name=fastspeech_ljspeech&en_category=TextToSpeech) | FastSpeech extracts the attention diagonal from an encoder-decoder teacher model to predict phoneme duration. English TTS model; prediction only. |
| [deepvoice3_ljspeech](https://www.paddlepaddle.org.cn/hubdetail?name=deepvoice3_ljspeech&en_category=TextToSpeech) | Deep Voice 3 is an end-to-end TTS model released by Baidu Research in 2017 (paper accepted at ICLR 2018); a seq2seq model based on convolutional neural networks and attention. English TTS model; prediction only. |
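A minimal Python sketch of driving one of these modules (the `synthesize` signature and the `griffin-lim` vocoder option follow the fastspeech_ljspeech module's documented interface; the soundfile package is used here only to save the waveforms):
```python
import paddlehub as hub
import soundfile as sf

model = hub.Module(name="fastspeech_ljspeech")
test_texts = ["Life was like a box of chocolates, you never know what you're gonna get."]
# Returns the synthesized waveforms and their sample rate.
wavs, sample_rate = model.synthesize(texts=test_texts, use_gpu=False, vocoder="griffin-lim")
for i, wav in enumerate(wavs):
    sf.write(f"tts_output_{i}.wav", wav, samplerate=sample_rate)
```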
## **For a better experience, see the official web docs -> [Image Editing](https://www.paddlepaddle.org.cn/hublist)**
### Image Editing
Image editing takes an input image and edits or adjusts its pixels to produce a new target image. Typical applications include super-resolution, black-and-white photo colorization, and old-photo restoration.
- Selected recommended models
| Model | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [Super-Resolution](https://www.paddlepaddle.org.cn/hubdetail?name=realsr&en_category=ImageEditing) | A super-resolution model for images and videos; upscales the input image or video by 4x. |
| [B&W Image Colorization](https://www.paddlepaddle.org.cn/hubdetail?name=deoldify&en_category=ImageEditing) | DeOldify is a colorization model for images and videos; it restores color to black-and-white photos and videos. |
| [Old-Photo Restoration](https://www.paddlepaddle.org.cn/hubdetail?name=photo_restoration&en_category=ImageEditing) | A model for restoring old photos, composed of two parts: colorization and super-resolution. |
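A minimal sketch of running deoldify through the generic hub.Module interface (the `predict` entry point follows the deoldify module's documentation; check the module page for the exact parameters):
```python
import paddlehub as hub

# Colorize a black-and-white photo; the result is written to the module's output directory.
model = hub.Module(name="deoldify")
model.predict("old_photo.jpg")
```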
## **For a better experience, see the official web docs -> [Image Generation](https://www.paddlepaddle.org.cn/hublist)**
### Image Generation
Image generation produces a target image from an input vector, which may be random noise or a user-specified condition vector. Typical applications include style transfer and image cartoonization.
- Selected recommended models
| Model | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [Artistic Style Transfer](https://www.paddlepaddle.org.cn/hubdetail?name=stylepro_artistic&en_category=GANs) | Converts a given image into an arbitrary artistic style, faithfully preserving the semantic details of the content image and the style of the style image. |
| [Cartoonization - Makoto Shinkai](https://www.paddlepaddle.org.cn/hubdetail?name=animegan_v2_shinkai_53&en_category=GANs) | AnimeGAN V2 style-transfer model; converts the input image into Makoto Shinkai's anime style. |
| [Cartoonization - Hayao Miyazaki](https://www.paddlepaddle.org.cn/hubdetail?name=animegan_v2_hayao_64&en_category=GANs) | AnimeGAN V2 style-transfer model; converts the input image into Hayao Miyazaki's anime style. |
| [Cartoonization - Satoshi Kon's Paprika](https://www.paddlepaddle.org.cn/hubdetail?name=animegan_v2_paprika_97&en_category=GANs) | AnimeGAN V2 style-transfer model; converts the input image into the style of Satoshi Kon's Paprika. |
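A minimal sketch of calling one of the cartoonization models (the `style_transfer` method and its parameters follow the animegan_v2_* module documentation):
```python
import cv2
import paddlehub as hub

model = hub.Module(name="animegan_v2_shinkai_53")
# Pass decoded images; visualization=True also writes the stylized results to disk.
results = model.style_transfer(images=[cv2.imread("street.jpg")], visualization=True)
```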
# coding=utf-8
from paddle.fluid.initializer import Constant
from paddle.fluid.param_attr import ParamAttr
import paddle.fluid as fluid
def decoder_net():
x2paddle_22 = fluid.layers.create_parameter(dtype='float32',
shape=[4],
name='x2paddle_22',
attr='x2paddle_22',
default_initializer=Constant(0.0))
x2paddle_36 = fluid.layers.create_parameter(dtype='float32',
shape=[4],
name='x2paddle_36',
attr='x2paddle_36',
default_initializer=Constant(0.0))
x2paddle_44 = fluid.layers.create_parameter(dtype='float32',
shape=[4],
name='x2paddle_44',
attr='x2paddle_44',
default_initializer=Constant(0.0))
x2paddle_input_1 = fluid.layers.data(dtype='float32',
shape=[1, 512, 64, 64],
name='x2paddle_input_1',
append_batch_size=False)
x2paddle_19 = fluid.layers.pad2d(x2paddle_input_1,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_19')
x2paddle_20 = fluid.layers.conv2d(x2paddle_19,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_1',
name='x2paddle_20',
bias_attr='x2paddle_2')
x2paddle_21 = fluid.layers.relu(x2paddle_20, name='x2paddle_21')
x2paddle_23 = fluid.layers.resize_nearest(x2paddle_21, name='x2paddle_23', out_shape=[128, 128])
x2paddle_24 = fluid.layers.pad2d(x2paddle_23,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_24')
x2paddle_25 = fluid.layers.conv2d(x2paddle_24,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_3',
name='x2paddle_25',
bias_attr='x2paddle_4')
x2paddle_26 = fluid.layers.relu(x2paddle_25, name='x2paddle_26')
x2paddle_27 = fluid.layers.pad2d(x2paddle_26,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_27')
x2paddle_28 = fluid.layers.conv2d(x2paddle_27,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_5',
name='x2paddle_28',
bias_attr='x2paddle_6')
x2paddle_29 = fluid.layers.relu(x2paddle_28, name='x2paddle_29')
x2paddle_30 = fluid.layers.pad2d(x2paddle_29,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_30')
x2paddle_31 = fluid.layers.conv2d(x2paddle_30,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_7',
name='x2paddle_31',
bias_attr='x2paddle_8')
x2paddle_32 = fluid.layers.relu(x2paddle_31, name='x2paddle_32')
x2paddle_33 = fluid.layers.pad2d(x2paddle_32,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_33')
x2paddle_34 = fluid.layers.conv2d(x2paddle_33,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_9',
name='x2paddle_34',
bias_attr='x2paddle_10')
x2paddle_35 = fluid.layers.relu(x2paddle_34, name='x2paddle_35')
x2paddle_37 = fluid.layers.resize_nearest(x2paddle_35, name='x2paddle_37', out_shape=[256, 256])
x2paddle_38 = fluid.layers.pad2d(x2paddle_37,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_38')
x2paddle_39 = fluid.layers.conv2d(x2paddle_38,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_11',
name='x2paddle_39',
bias_attr='x2paddle_12')
x2paddle_40 = fluid.layers.relu(x2paddle_39, name='x2paddle_40')
x2paddle_41 = fluid.layers.pad2d(x2paddle_40,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_41')
x2paddle_42 = fluid.layers.conv2d(x2paddle_41,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_13',
name='x2paddle_42',
bias_attr='x2paddle_14')
x2paddle_43 = fluid.layers.relu(x2paddle_42, name='x2paddle_43')
x2paddle_45 = fluid.layers.resize_nearest(x2paddle_43, name='x2paddle_45', out_shape=[512, 512])
x2paddle_46 = fluid.layers.pad2d(x2paddle_45,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_46')
x2paddle_47 = fluid.layers.conv2d(x2paddle_46,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_15',
name='x2paddle_47',
bias_attr='x2paddle_16')
x2paddle_48 = fluid.layers.relu(x2paddle_47, name='x2paddle_48')
x2paddle_49 = fluid.layers.pad2d(x2paddle_48,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_49')
x2paddle_50 = fluid.layers.conv2d(x2paddle_49,
num_filters=3,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_17',
name='x2paddle_50',
bias_attr='x2paddle_18')
return x2paddle_input_1, x2paddle_50
# coding=utf-8
from paddle.fluid.initializer import Constant
from paddle.fluid.param_attr import ParamAttr
import paddle.fluid as fluid
def encoder_net():
x2paddle_0 = fluid.layers.data(dtype='float32', shape=[1, 3, 512, 512], name='x2paddle_0', append_batch_size=False)
x2paddle_21 = fluid.layers.conv2d(x2paddle_0,
num_filters=3,
filter_size=[1, 1],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_1',
name='x2paddle_21',
bias_attr='x2paddle_2')
x2paddle_22 = fluid.layers.pad2d(x2paddle_21,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_22')
x2paddle_23 = fluid.layers.conv2d(x2paddle_22,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_3',
name='x2paddle_23',
bias_attr='x2paddle_4')
x2paddle_24 = fluid.layers.relu(x2paddle_23, name='x2paddle_24')
x2paddle_25 = fluid.layers.pad2d(x2paddle_24,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_25')
x2paddle_26 = fluid.layers.conv2d(x2paddle_25,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_5',
name='x2paddle_26',
bias_attr='x2paddle_6')
x2paddle_27 = fluid.layers.relu(x2paddle_26, name='x2paddle_27')
x2paddle_28 = fluid.layers.pool2d(x2paddle_27,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_28',
exclusive=False)
x2paddle_29 = fluid.layers.pad2d(x2paddle_28,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_29')
x2paddle_30 = fluid.layers.conv2d(x2paddle_29,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_7',
name='x2paddle_30',
bias_attr='x2paddle_8')
x2paddle_31 = fluid.layers.relu(x2paddle_30, name='x2paddle_31')
x2paddle_32 = fluid.layers.pad2d(x2paddle_31,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_32')
x2paddle_33 = fluid.layers.conv2d(x2paddle_32,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_9',
name='x2paddle_33',
bias_attr='x2paddle_10')
x2paddle_34 = fluid.layers.relu(x2paddle_33, name='x2paddle_34')
x2paddle_35 = fluid.layers.pool2d(x2paddle_34,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_35',
exclusive=False)
x2paddle_36 = fluid.layers.pad2d(x2paddle_35,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_36')
x2paddle_37 = fluid.layers.conv2d(x2paddle_36,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_11',
name='x2paddle_37',
bias_attr='x2paddle_12')
x2paddle_38 = fluid.layers.relu(x2paddle_37, name='x2paddle_38')
x2paddle_39 = fluid.layers.pad2d(x2paddle_38,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_39')
x2paddle_40 = fluid.layers.conv2d(x2paddle_39,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_13',
name='x2paddle_40',
bias_attr='x2paddle_14')
x2paddle_41 = fluid.layers.relu(x2paddle_40, name='x2paddle_41')
x2paddle_42 = fluid.layers.pad2d(x2paddle_41,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_42')
x2paddle_43 = fluid.layers.conv2d(x2paddle_42,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_15',
name='x2paddle_43',
bias_attr='x2paddle_16')
x2paddle_44 = fluid.layers.relu(x2paddle_43, name='x2paddle_44')
x2paddle_45 = fluid.layers.pad2d(x2paddle_44,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_45')
x2paddle_46 = fluid.layers.conv2d(x2paddle_45,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_17',
name='x2paddle_46',
bias_attr='x2paddle_18')
x2paddle_47 = fluid.layers.relu(x2paddle_46, name='x2paddle_47')
x2paddle_48 = fluid.layers.pool2d(x2paddle_47,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_48',
exclusive=False)
x2paddle_49 = fluid.layers.pad2d(x2paddle_48,
pad_value=0.0,
mode='reflect',
paddings=[1, 1, 1, 1],
name='x2paddle_49')
x2paddle_50 = fluid.layers.conv2d(x2paddle_49,
num_filters=512,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_19',
name='x2paddle_50',
bias_attr='x2paddle_20')
x2paddle_51 = fluid.layers.relu(x2paddle_50, name='x2paddle_51')
return x2paddle_0, x2paddle_51
## **For a better experience, see the official web docs -> [Image Classification](https://www.paddlepaddle.org.cn/hublist)**
### Image Classification
Image classification distinguishes images of different categories by their semantic content. It is a fundamental problem in computer vision and the basis of higher-level vision tasks such as object detection, image segmentation, object tracking, behavior analysis, and face recognition, with wide application: face recognition and intelligent video analysis in security, traffic-scene recognition in transportation, content-based image retrieval and automatic album categorization on the internet, and image recognition in medicine.
**Note:** **Experienced developers can pick models freely as needed**; **beginners should prefer ResNet50 on the server side and MobileNetV3 on mobile.**
- Selected recommended models
|            | **Model** | **Highlights** |
| ---------- | :----------------------------------------------------------- | ---------------------------------------------------------- |
| Image Classification | [Dish Recognition](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_dishes&en_category=ImageClassification) | Trained on a private dataset; classifies 8,416 dishes; suitable for further fine-tuning on food data |
| Image Classification | [Animal Recognition](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_animals&en_category=ImageClassification) | Trained on a private dataset; classifies 7,978 animals; suitable for further fine-tuning on animal data |
| Image Classification | [Wildlife-Product Recognition](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_wildanimals&en_category=ImageClassification) | Recognizes ten labels: 'ivory product', 'ivory', 'elephant', 'tiger skin', 'tiger', 'tiger tooth/claw/bone', 'pangolin scale', 'pangolin', 'pangolin claw', and 'other'. |
- 更多模型
| **模型名称** | **模型简介** |
| - | - |
| [AlexNet](https://www.paddlepaddle.org.cn/hubdetail?name=alexnet_imagenet&en_category=ImageClassification) | 首次在 CNN 中成功的应用了 ReLU, Dropout 和 LRN,并使用 GPU 进行运算加速 |
| [VGG19](https://www.paddlepaddle.org.cn/hubdetail?name=vgg19_imagenet&en_category=ImageClassification) | 在 AlexNet 的基础上使用 3*3 小卷积核,增加网络深度,具有很好的泛化能力 |
| [GoogLeNet](https://github.com/PaddlePaddle/models/tree/release/1.7/PaddleCV/image_classification) | 在不增加计算负载的前提下增加了网络的深度和宽度,性能更加优越 |
| [ResNet50](https://www.paddlepaddle.org.cn/hubdetail?name=resnet_v2_50_imagenet&en_category=ImageClassification) | Residual Network,引入了新的残差结构,解决了随着网络加深,准确率下降的问题 |
| [Inceptionv4](https://www.paddlepaddle.org.cn/hubdetail?name=inception_v4_imagenet&en_category=ImageClassification) | 将 Inception 模块与 Residual Connection 进行结合,通过ResNet的结构极大地加速训练并获得性能的提升 |
| [MobileNetV2](https://www.paddlepaddle.org.cn/hubdetail?name=mobilenet_v2_imagenet&en_category=ImageClassification) | MobileNet结构的微调,直接在 thinner 的 bottleneck层上进行 skip learning 连接以及对 bottleneck layer 不进行 ReLu 非线性处理可取得更好的结果 |
| [se_resnext50](https://www.paddlepaddle.org.cn/hubdetail?name=se_resnext50_32x4d_imagenet&en_category=ImageClassification) | 在ResNeXt 基础、上加入了 SE(Sequeeze-and-Excitation) 模块,提高了识别准确率,在 ILSVRC 2017 的分类项目中取得了第一名 |
| [ShuffleNetV2](https://www.paddlepaddle.org.cn/hubdetail?name=shufflenet_v2_imagenet&en_category=ImageClassification) | ECCV2018,轻量级 CNN 网络,在速度和准确度之间做了很好地平衡。在同等复杂度下,比 ShuffleNet 和 MobileNetv2 更准确,更适合移动端以及无人车领域 |
| [efficientNetb7](https://www.paddlepaddle.org.cn/hubdetail?name=efficientnetb7_imagenet&en_category=ImageClassification) | 同时对模型的分辨率,通道数和深度进行缩放,用极少的参数就可以达到SOTA的精度。 |
| [xception71](https://www.paddlepaddle.org.cn/hubdetail?name=xception71_imagenet&en_category=ImageClassification) | 对inception-v3的改进,用深度可分离卷积代替普通卷积,降低参数量同时提高了精度。 |
| [dpn107](https://www.paddlepaddle.org.cn/hubdetail?name=dpn107_imagenet&en_category=ImageClassification) | 融合了densenet和resnext的特点。 |
| [DarkNet53](https://www.paddlepaddle.org.cn/hubdetail?name=darknet53_imagenet&en_category=ImageClassification) | 检测框架yolov3使用的backbone,在分类和检测任务上都有不错表现。 |
| [DenseNet161](https://www.paddlepaddle.org.cn/hubdetail?name=densenet161_imagenet&en_category=ImageClassification) | 提出了密集连接的网络结构,更加有利于信息流的传递。 |
| [ResNeXt152_vd](https://www.paddlepaddle.org.cn/hubdetail?name=resnext152_64x4d_imagenet&en_category=ImageClassification) | 提出了cardinatity的概念,用于作为模型复杂度的另外一个度量,有效地提升模型精度。 |
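For a quick taste of how these classifiers are called, the sketch below loads the dish-recognition model and predicts on one image. This is a minimal sketch assuming the PaddleHub 1.x prediction API and an OpenCV install; the image path is a placeholder.

```python
import cv2
import paddlehub as hub

# Download (on first use) and load the dish-recognition classifier.
classifier = hub.Module(name="resnet50_vd_dishes")

# classification() takes decoded images and returns, per image,
# a dict mapping predicted dish labels to confidence scores.
results = classifier.classification(images=[cv2.imread("dish.jpg")])
print(results)
```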
## **For a better experience, see the official web docs -> [Face Detection](https://www.paddlepaddle.org.cn/hublist)**

### Face Detection

Face detection is an important branch of object detection. Driven in recent years by the security market, face recognition, and face-safety applications, it has become one of the most important detection tasks.

- Recommended models

| Model | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [Face detection](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server&en_category=FaceDetection) | Developed in-house at Baidu; the **winning model** on the WIDER Face dataset in March 2018. |
| [Ultra-lightweight face detection](https://www.paddlepaddle.org.cn/hubdetail?name=ultra_light_fast_generic_face_detector_1mb_640&en_category=FaceDetection) | A real-time, ultra-lightweight, general-purpose face detector designed for edge and low-compute devices (e.g. ARM inference). |
| [Masked-face detection and recognition](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server_mask&en_category=FaceDetection) | The industry's **first open-source model for detecting and recognizing masked faces**; it drew wide attention. |
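A minimal detection sketch, assuming the PaddleHub 1.x API; the ultra-lightweight detector is used here, and the image path is a placeholder.

```python
import cv2
import paddlehub as hub

# Ultra-lightweight face detector intended for low-compute devices.
face_detector = hub.Module(name="ultra_light_fast_generic_face_detector_1mb_640")

# face_detection() returns, per image, the detected face boxes and confidences.
results = face_detector.face_detection(images=[cv2.imread("group_photo.jpg")])
print(results)
```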
## **For a better experience, see the official web docs -> [Keypoint Detection](https://www.paddlepaddle.org.cn/hublist)**

### Keypoint Detection

Human pose estimation detects keypoints of the body, such as joints and facial features, and describes the skeleton through them. It is essential for characterizing posture and predicting behavior, and it underpins many computer-vision tasks such as action classification, abnormal-behavior detection, and autonomous driving.

- Recommended models

| Model | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [Single-person body keypoint detection](https://www.paddlepaddle.org.cn/hubdetail?name=human_pose_estimation_resnet50_mpii&en_category=KeyPointDetection) | Applicable to action recognition, person tracking, gait recognition, and related fields; typical uses include intelligent video surveillance, patient monitoring, human-computer interaction, virtual reality, human animation, smart homes, security, and athlete training assistance. |
| [Multi-person body keypoint detection](https://www.paddlepaddle.org.cn/hubdetail?name=openpose_body_estimation&en_category=KeyPointDetection) | Applicable to the same fields as the single-person model, for scenes containing multiple people. |
| [Facial landmark detection](https://www.paddlepaddle.org.cn/hubdetail?name=face_landmark_localization&en_category=KeyPointDetection) | Applicable to face recognition, expression analysis, 3D face reconstruction, 3D animation, and other face-related problems; supports multiple faces in a single image. |
| [Hand keypoint detection](https://www.paddlepaddle.org.cn/hubdetail?name=hand_pose_localization&en_category=KeyPointDetection) | Applicable to gesture recognition; combined with body keypoints it supports abnormal-behavior detection and many other scenarios. |
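A minimal pose-estimation sketch, assuming the PaddleHub 1.x API; the single-person model is used here, and the image path is a placeholder.

```python
import cv2
import paddlehub as hub

# Single-person body keypoint detector (ResNet50 trained on MPII).
pose_estimation = hub.Module(name="human_pose_estimation_resnet50_mpii")

# keypoint_detection() returns the predicted body-joint coordinates per image.
results = pose_estimation.keypoint_detection(images=[cv2.imread("person.jpg")])
print(results)
```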
## **For a better experience, see the official web docs -> [Object Detection](https://www.paddlepaddle.org.cn/hublist)**

### Object Detection

Given an image or a video frame, object detection asks the computer to locate every object in it and assign each one a class. A computer only "sees" the numbers an image is encoded into, so high-level semantic concepts such as a person or an object appearing in the frame are hard for it to grasp, and locating where a target appears in the image is harder still.

- Recommended models

| Model | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [YOLOv3](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_coco2017&en_category=ObjectDetection) | Accuracy improved by **5.9 absolute percentage points** over the original author's implementation, with heavy performance optimization. |
| [Pedestrian detection](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_pedestrian&en_category=ObjectDetection) | A Baidu in-house model trained on massive private data; applicable to intelligent video surveillance, human behavior analysis, passenger-flow statistics, intelligent transportation, and more. |
| [Vehicle detection](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_vehicles&en_category=ObjectDetection) | A Baidu in-house model that recognizes vehicle types including car, truck, bus, motorbike, and tricycle. |
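A minimal detection sketch, assuming the PaddleHub 1.x API; the image path is a placeholder.

```python
import cv2
import paddlehub as hub

# YOLOv3 detector pretrained on COCO2017.
object_detector = hub.Module(name="yolov3_darknet53_coco2017")

# object_detection() returns labels, confidences, and bounding boxes per image.
results = object_detector.object_detection(images=[cv2.imread("street.jpg")])
print(results)
```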
## **For a better experience, see the official web docs -> [Image Segmentation](https://www.paddlepaddle.org.cn/hublist)**

### Image Segmentation

Semantic image segmentation, as the name suggests, groups/segments pixels by their semantic meaning. Image semantics means understanding the content of an image, for example describing which object is where doing what; segmentation means labeling every pixel with the category it belongs to. Recent applications include segmenting street scenes so self-driving cars can avoid pedestrians and vehicles, and assisting diagnosis in medical image analysis.

- Recommended models

| Model | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [Portrait segmentation](https://www.paddlepaddle.org.cn/hubdetail?name=deeplabv3p_xception65_humanseg&en_category=ImageSegmentation) | Trained on a Baidu **in-house dataset**; excellent portrait segmentation quality. |
| [Human parsing](https://www.paddlepaddle.org.cn/hubdetail?name=ace2p&en_category=ImageSegmentation) | **Triple-crown winner** of the CVPR 2019 LIP challenge; the go-to model for human parsing. |
| [Pneumonia CT image analysis](https://www.paddlepaddle.org.cn/hubdetail?name=Pneumonia_CT_LKM_PP&en_category=ImageSegmentation) | Helped LinkingMed (连心医疗) open-source **the industry's first** pneumonia CT image analysis model. |
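A minimal segmentation sketch, assuming the PaddleHub 1.x API; the image path is a placeholder.

```python
import cv2
import paddlehub as hub

# DeepLabv3+ (Xception65) portrait segmentation model.
human_seg = hub.Module(name="deeplabv3p_xception65_humanseg")

# segmentation() returns a per-pixel person/background mask for each image.
results = human_seg.segmentation(images=[cv2.imread("portrait.jpg")])
print(results)
```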
# coding=utf-8
from paddle.fluid.initializer import Constant
from paddle.fluid.param_attr import ParamAttr
import paddle.fluid as fluid
def decoder_net():
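    # NOTE: this file appears to be machine-generated by X2Paddle (hence the
    # x2paddle_* names). decoder_net() builds the decoder of the style-transfer
    # network: it maps a [1, 512, 64, 64] feature map back to a [1, 3, 512, 512]
    # image via reflection-padded 3x3 convolutions, ReLUs, and three rounds of
    # nearest-neighbor upsampling (64 -> 128 -> 256 -> 512).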
x2paddle_22 = fluid.layers.create_parameter(
dtype='float32', shape=[4], name='x2paddle_22', attr='x2paddle_22', default_initializer=Constant(0.0))
x2paddle_36 = fluid.layers.create_parameter(
dtype='float32', shape=[4], name='x2paddle_36', attr='x2paddle_36', default_initializer=Constant(0.0))
x2paddle_44 = fluid.layers.create_parameter(
dtype='float32', shape=[4], name='x2paddle_44', attr='x2paddle_44', default_initializer=Constant(0.0))
x2paddle_input_1 = fluid.layers.data(
dtype='float32', shape=[1, 512, 64, 64], name='x2paddle_input_1', append_batch_size=False)
x2paddle_19 = fluid.layers.pad2d(
x2paddle_input_1, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_19')
x2paddle_20 = fluid.layers.conv2d(
x2paddle_19,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_1',
name='x2paddle_20',
bias_attr='x2paddle_2')
x2paddle_21 = fluid.layers.relu(x2paddle_20, name='x2paddle_21')
x2paddle_23 = fluid.layers.resize_nearest(x2paddle_21, name='x2paddle_23', out_shape=[128, 128])
x2paddle_24 = fluid.layers.pad2d(
x2paddle_23, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_24')
x2paddle_25 = fluid.layers.conv2d(
x2paddle_24,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_3',
name='x2paddle_25',
bias_attr='x2paddle_4')
x2paddle_26 = fluid.layers.relu(x2paddle_25, name='x2paddle_26')
x2paddle_27 = fluid.layers.pad2d(
x2paddle_26, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_27')
x2paddle_28 = fluid.layers.conv2d(
x2paddle_27,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_5',
name='x2paddle_28',
bias_attr='x2paddle_6')
x2paddle_29 = fluid.layers.relu(x2paddle_28, name='x2paddle_29')
x2paddle_30 = fluid.layers.pad2d(
x2paddle_29, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_30')
x2paddle_31 = fluid.layers.conv2d(
x2paddle_30,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_7',
name='x2paddle_31',
bias_attr='x2paddle_8')
x2paddle_32 = fluid.layers.relu(x2paddle_31, name='x2paddle_32')
x2paddle_33 = fluid.layers.pad2d(
x2paddle_32, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_33')
x2paddle_34 = fluid.layers.conv2d(
x2paddle_33,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_9',
name='x2paddle_34',
bias_attr='x2paddle_10')
x2paddle_35 = fluid.layers.relu(x2paddle_34, name='x2paddle_35')
x2paddle_37 = fluid.layers.resize_nearest(x2paddle_35, name='x2paddle_37', out_shape=[256, 256])
x2paddle_38 = fluid.layers.pad2d(
x2paddle_37, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_38')
x2paddle_39 = fluid.layers.conv2d(
x2paddle_38,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_11',
name='x2paddle_39',
bias_attr='x2paddle_12')
x2paddle_40 = fluid.layers.relu(x2paddle_39, name='x2paddle_40')
x2paddle_41 = fluid.layers.pad2d(
x2paddle_40, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_41')
x2paddle_42 = fluid.layers.conv2d(
x2paddle_41,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_13',
name='x2paddle_42',
bias_attr='x2paddle_14')
x2paddle_43 = fluid.layers.relu(x2paddle_42, name='x2paddle_43')
x2paddle_45 = fluid.layers.resize_nearest(x2paddle_43, name='x2paddle_45', out_shape=[512, 512])
x2paddle_46 = fluid.layers.pad2d(
x2paddle_45, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_46')
x2paddle_47 = fluid.layers.conv2d(
x2paddle_46,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_15',
name='x2paddle_47',
bias_attr='x2paddle_16')
x2paddle_48 = fluid.layers.relu(x2paddle_47, name='x2paddle_48')
x2paddle_49 = fluid.layers.pad2d(
x2paddle_48, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_49')
x2paddle_50 = fluid.layers.conv2d(
x2paddle_49,
num_filters=3,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_17',
name='x2paddle_50',
bias_attr='x2paddle_18')
return x2paddle_input_1, x2paddle_50
# coding=utf-8
from paddle.fluid.initializer import Constant
from paddle.fluid.param_attr import ParamAttr
import paddle.fluid as fluid
def encoder_net():
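    # Counterpart of decoder_net(): a VGG-style encoder of reflection-padded 3x3
    # convolutions, ReLUs, and 2x2 max pooling that maps a [1, 3, 512, 512] image
    # down to the [1, 512, 64, 64] feature map the decoder consumes.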
x2paddle_0 = fluid.layers.data(dtype='float32', shape=[1, 3, 512, 512], name='x2paddle_0', append_batch_size=False)
x2paddle_21 = fluid.layers.conv2d(
x2paddle_0,
num_filters=3,
filter_size=[1, 1],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_1',
name='x2paddle_21',
bias_attr='x2paddle_2')
x2paddle_22 = fluid.layers.pad2d(
x2paddle_21, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_22')
x2paddle_23 = fluid.layers.conv2d(
x2paddle_22,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_3',
name='x2paddle_23',
bias_attr='x2paddle_4')
x2paddle_24 = fluid.layers.relu(x2paddle_23, name='x2paddle_24')
x2paddle_25 = fluid.layers.pad2d(
x2paddle_24, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_25')
x2paddle_26 = fluid.layers.conv2d(
x2paddle_25,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_5',
name='x2paddle_26',
bias_attr='x2paddle_6')
x2paddle_27 = fluid.layers.relu(x2paddle_26, name='x2paddle_27')
x2paddle_28 = fluid.layers.pool2d(
x2paddle_27,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_28',
exclusive=False)
x2paddle_29 = fluid.layers.pad2d(
x2paddle_28, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_29')
x2paddle_30 = fluid.layers.conv2d(
x2paddle_29,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_7',
name='x2paddle_30',
bias_attr='x2paddle_8')
x2paddle_31 = fluid.layers.relu(x2paddle_30, name='x2paddle_31')
x2paddle_32 = fluid.layers.pad2d(
x2paddle_31, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_32')
x2paddle_33 = fluid.layers.conv2d(
x2paddle_32,
num_filters=128,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_9',
name='x2paddle_33',
bias_attr='x2paddle_10')
x2paddle_34 = fluid.layers.relu(x2paddle_33, name='x2paddle_34')
x2paddle_35 = fluid.layers.pool2d(
x2paddle_34,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_35',
exclusive=False)
x2paddle_36 = fluid.layers.pad2d(
x2paddle_35, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_36')
x2paddle_37 = fluid.layers.conv2d(
x2paddle_36,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_11',
name='x2paddle_37',
bias_attr='x2paddle_12')
x2paddle_38 = fluid.layers.relu(x2paddle_37, name='x2paddle_38')
x2paddle_39 = fluid.layers.pad2d(
x2paddle_38, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_39')
x2paddle_40 = fluid.layers.conv2d(
x2paddle_39,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_13',
name='x2paddle_40',
bias_attr='x2paddle_14')
x2paddle_41 = fluid.layers.relu(x2paddle_40, name='x2paddle_41')
x2paddle_42 = fluid.layers.pad2d(
x2paddle_41, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_42')
x2paddle_43 = fluid.layers.conv2d(
x2paddle_42,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_15',
name='x2paddle_43',
bias_attr='x2paddle_16')
x2paddle_44 = fluid.layers.relu(x2paddle_43, name='x2paddle_44')
x2paddle_45 = fluid.layers.pad2d(
x2paddle_44, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_45')
x2paddle_46 = fluid.layers.conv2d(
x2paddle_45,
num_filters=256,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_17',
name='x2paddle_46',
bias_attr='x2paddle_18')
x2paddle_47 = fluid.layers.relu(x2paddle_46, name='x2paddle_47')
x2paddle_48 = fluid.layers.pool2d(
x2paddle_47,
pool_size=[2, 2],
pool_type='max',
pool_stride=[2, 2],
pool_padding=[0, 0],
ceil_mode=False,
name='x2paddle_48',
exclusive=False)
x2paddle_49 = fluid.layers.pad2d(
x2paddle_48, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_49')
x2paddle_50 = fluid.layers.conv2d(
x2paddle_49,
num_filters=512,
filter_size=[3, 3],
stride=[1, 1],
padding=[0, 0],
dilation=[1, 1],
groups=1,
param_attr='x2paddle_19',
name='x2paddle_50',
bias_attr='x2paddle_20')
x2paddle_51 = fluid.layers.relu(x2paddle_50, name='x2paddle_51')
return x2paddle_0, x2paddle_51
## **For a better experience, see the official web docs -> [Text Recognition](https://www.paddlepaddle.org.cn/hublist)**

### Text Recognition

Optical character recognition (OCR) is one of the key tasks in computer vision. It extracts the text contained in images and is of great practical value to industry.

- Recommended models

| Model | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [Ultra-lightweight Chinese/English OCR](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_crnn_mobile&en_category=TextRecognition) | The smallest open-source model in the industry: 8.1 MB, ultra-lightweight Chinese/English recognition. Handles Chinese and English, as well as tilted, vertical, and other text orientations. **Highly recommended** |
| [High-accuracy Chinese/English OCR](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_crnn_mobile&en_category=TextRecognition) | The best-performing open-source model: 155 MB, high-accuracy Chinese/English recognition. Handles Chinese and English, as well as tilted, vertical, and other text orientations. **Highly recommended** |
| [German ultra-lightweight OCR](https://www.paddlepaddle.org.cn/hubdetail?name=german_ocr_db_crnn_mobile&en_category=TextRecognition) | German OCR, ultra-lightweight |
| [French ultra-lightweight OCR](https://www.paddlepaddle.org.cn/hubdetail?name=french_ocr_db_crnn_mobile&en_category=TextRecognition) | French OCR, ultra-lightweight |
| [Japanese ultra-lightweight OCR](https://www.paddlepaddle.org.cn/hubdetail?name=japan_ocr_db_crnn_mobile&en_category=TextRecognition) | Japanese OCR, ultra-lightweight |
| [Korean ultra-lightweight OCR](https://www.paddlepaddle.org.cn/hubdetail?name=korean_ocr_db_crnn_mobile&en_category=TextRecognition) | Korean OCR, ultra-lightweight |
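A minimal OCR sketch, assuming the PaddleHub 1.x API; `recognize_text` is the prediction method documented for the `chinese_ocr_db_crnn_mobile` module, and the image path is a placeholder.

```python
import cv2
import paddlehub as hub

# Ultra-lightweight Chinese/English OCR: text detection followed by recognition.
ocr = hub.Module(name="chinese_ocr_db_crnn_mobile")

# recognize_text() returns, per image, the detected text boxes together with
# the recognized strings and their confidences.
results = ocr.recognize_text(images=[cv2.imread("receipt.jpg")])
print(results)
```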
## **For a better experience, see the official web docs -> [Language Models](https://www.paddlepaddle.org.cn/hublist)**

### Language Models

- Recommended models

| Model | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [Word embedding model](https://www.paddlepaddle.org.cn/hubdetail?name=word2vec_skipgram&en_category=SemanticModel) | Chinese word embeddings pretrained on a massive Baidu search dataset; supports fine-tuning. The Word2vec pretraining vocabulary holds 1,700,249 words with an embedding dimension of 128. |
| [Text similarity](https://www.paddlepaddle.org.cn/hubdetail?name=simnet_bow&en_category=SemanticModel) | Computes a similarity score for two pieces of user-provided text. |
| [ERNIE](https://www.paddlepaddle.org.cn/hubdetail?name=ERNIE&en_category=SemanticModel) | A Baidu in-house model pretrained on Chinese corpora including encyclopedia, news, and forum-dialogue data; usable for text classification, sequence labeling, reading comprehension, and other tasks. |
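A minimal similarity sketch, assuming the PaddleHub 1.x API for `simnet_bow`; the input layout (a dict with parallel `text_1`/`text_2` lists) follows the module's documentation and should be treated as an assumption.

```python
import paddlehub as hub

# Bag-of-words text-similarity model.
simnet_bow = hub.Module(name="simnet_bow")

# Each position in text_1 is paired with the same position in text_2.
inputs = {"text_1": ["这道题太难了"], "text_2": ["这道题不简单"]}
results = simnet_bow.similarity(data=inputs)
print(results)
```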
@@ -74,23 +74,23 @@ class BertModel(object):
    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
        # padding id in vocabulary must be set to 0
        emb_out = fluid.layers.embedding(input=src_ids,
                                         size=[self._voc_size, self._emb_size],
                                         dtype=self._dtype,
                                         param_attr=fluid.ParamAttr(name=self._word_emb_name,
                                                                    initializer=self._param_initializer),
                                         is_sparse=False)
        position_emb_out = fluid.layers.embedding(input=position_ids,
                                                  size=[self._max_position_seq_len, self._emb_size],
                                                  dtype=self._dtype,
                                                  param_attr=fluid.ParamAttr(name=self._pos_emb_name,
                                                                             initializer=self._param_initializer))
        sent_emb_out = fluid.layers.embedding(sentence_ids,
                                              size=[self._sent_types, self._emb_size],
                                              dtype=self._dtype,
                                              param_attr=fluid.ParamAttr(name=self._sent_emb_name,
                                                                         initializer=self._param_initializer))
        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
@@ -105,23 +105,22 @@ class BertModel(object):
        n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True
        self._enc_out = encoder(enc_input=emb_out,
                                attn_bias=n_head_self_attn_mask,
                                n_layer=self._n_layer,
                                n_head=self._n_head,
                                d_key=self._emb_size // self._n_head,
                                d_value=self._emb_size // self._n_head,
                                d_model=self._emb_size,
                                d_inner_hid=self._emb_size * 4,
                                prepostprocess_dropout=self._prepostprocess_dropout,
                                attention_dropout=self._attention_dropout,
                                relu_dropout=0,
                                hidden_act=self._hidden_act,
                                preprocess_cmd="",
                                postprocess_cmd="dan",
                                param_initializer=self._param_initializer,
                                name='encoder')
    def get_sequence_output(self):
        return self._enc_out
@@ -130,12 +129,12 @@ class BertModel(object):
        """Get the first feature of each sequence for classification"""
        next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
        next_sent_feat = fluid.layers.fc(input=next_sent_feat,
                                         size=self._emb_size,
                                         act="tanh",
                                         param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
                                                                    initializer=self._param_initializer),
                                         bias_attr="pooled_fc.b_0")
        return next_sent_feat
    def get_pretraining_output(self, mask_label, mask_pos, labels):
@@ -150,43 +149,45 @@ class BertModel(object):
        mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
        # transform: fc
        mask_trans_feat = fluid.layers.fc(input=mask_feat,
                                          size=self._emb_size,
                                          act=self._hidden_act,
                                          param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
                                                                     initializer=self._param_initializer),
                                          bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
        mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
                                                initializer=fluid.initializer.Constant(value=0.0))
        if self._weight_sharing:
            fc_out = fluid.layers.matmul(x=mask_trans_feat,
                                         y=fluid.default_main_program().global_block().var(self._word_emb_name),
                                         transpose_y=True)
            fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
                                                    dtype=self._dtype,
                                                    attr=mask_lm_out_bias_attr,
                                                    is_bias=True)
        else:
            fc_out = fluid.layers.fc(input=mask_trans_feat,
                                     size=self._voc_size,
                                     param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
                                                                initializer=self._param_initializer),
                                     bias_attr=mask_lm_out_bias_attr)
        mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
        mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
        next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
                                           size=2,
                                           param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
                                                                      initializer=self._param_initializer),
                                           bias_attr="next_sent_fc.b_0")
        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
                                                                                    label=labels,
                                                                                    return_softmax=True)
        next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
...
@@ -50,24 +50,21 @@ def multi_head_attention(queries,
        """
        Add linear projection to queries, keys, and values.
        """
        q = layers.fc(input=queries,
                      size=d_key * n_head,
                      num_flatten_dims=2,
                      param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
                      bias_attr=name + '_query_fc.b_0')
        k = layers.fc(input=keys,
                      size=d_key * n_head,
                      num_flatten_dims=2,
                      param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
                      bias_attr=name + '_key_fc.b_0')
        v = layers.fc(input=values,
                      size=d_value * n_head,
                      num_flatten_dims=2,
                      param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
                      bias_attr=name + '_value_fc.b_0')
        return q, k, v
    def __split_heads(x, n_head):
@@ -110,8 +107,10 @@ def multi_head_attention(queries,
        product += attn_bias
        weights = layers.softmax(product)
        if dropout_rate:
            weights = layers.dropout(weights,
                                     dropout_prob=dropout_rate,
                                     dropout_implementation="upscale_in_train",
                                     is_test=False)
        out = layers.matmul(weights, v)
        return out
@@ -133,12 +132,11 @@ def multi_head_attention(queries,
    out = __combine_heads(ctx_multiheads)
    # Project back to the model size.
    proj_out = layers.fc(input=out,
                         size=d_model,
                         num_flatten_dims=2,
                         param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
                         bias_attr=name + '_output_fc.b_0')
    return proj_out
@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, p
    """
    This module consists of two linear transformations with a ReLU activation
    in between, which is applied to each position separately and identically.
    """
    hidden = layers.fc(input=x,
                       size=d_inner_hid,
                       num_flatten_dims=2,
                       act=hidden_act,
                       param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
                       bias_attr=name + '_fc_0.b_0')
    if dropout_rate:
        hidden = layers.dropout(hidden,
                                dropout_prob=dropout_rate,
                                dropout_implementation="upscale_in_train",
                                is_test=False)
    out = layers.fc(input=hidden,
                    size=d_hid,
                    num_flatten_dims=2,
                    param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
                    bias_attr=name + '_fc_1.b_0')
    return out
@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name='')
            out_dtype = out.dtype
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float32")
            out = layers.layer_norm(out,
                                    begin_norm_axis=len(out.shape) - 1,
                                    param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
                                                               initializer=fluid.initializer.Constant(1.)),
                                    bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
                                                              initializer=fluid.initializer.Constant(0.)))
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float16")
        elif cmd == "d":  # add dropout
            if dropout_rate:
                out = layers.dropout(out,
                                     dropout_prob=dropout_rate,
                                     dropout_implementation="upscale_in_train",
                                     is_test=False)
    return out
@@ -220,28 +221,35 @@ def encoder_layer(enc_input,
    with the post_process_layer to add residual connection, layer normalization
    and dropout.
    """
    attn_output = multi_head_attention(pre_process_layer(enc_input,
                                                         preprocess_cmd,
                                                         prepostprocess_dropout,
                                                         name=name + '_pre_att'),
                                       None,
                                       None,
                                       attn_bias,
                                       d_key,
                                       d_value,
                                       d_model,
                                       n_head,
                                       attention_dropout,
                                       param_initializer=param_initializer,
                                       name=name + '_multi_head_att')
    attn_output = post_process_layer(enc_input,
                                     attn_output,
                                     postprocess_cmd,
                                     prepostprocess_dropout,
                                     name=name + '_post_att')
    ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
                                                             preprocess_cmd,
                                                             prepostprocess_dropout,
                                                             name=name + '_pre_ffn'),
                                           d_inner_hid,
                                           d_model,
                                           relu_dropout,
                                           hidden_act,
                                           param_initializer=param_initializer,
                                           name=name + '_ffn')
    return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
@@ -266,22 +274,21 @@ def encoder(enc_input,
    encoder_layer.
    """
    for i in range(n_layer):
        enc_output = encoder_layer(enc_input,
                                   attn_bias,
                                   n_head,
                                   d_key,
                                   d_value,
                                   d_model,
                                   d_inner_hid,
                                   prepostprocess_dropout,
                                   attention_dropout,
                                   relu_dropout,
                                   hidden_act,
                                   preprocess_cmd,
                                   postprocess_cmd,
                                   param_initializer=param_initializer,
                                   name=name + '_layer_' + str(i))
        enc_input = enc_output
    enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
...
@@ -58,13 +58,12 @@ class Bert(TransformerModule):
            pooled_output (tensor): sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
        bert = BertModel(src_ids=input_ids,
                         position_ids=position_ids,
                         sentence_ids=segment_ids,
                         input_mask=input_mask,
                         config=self.bert_config,
                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output
...
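For fine-tuning, the BertModel above is not usually instantiated directly; it is reached through a hub module's `context` API. A minimal sketch, assuming PaddleHub 1.x and the `bert_chinese_L-12_H-768_A-12` module name:

```python
import paddlehub as hub

# Load a BERT module; context() builds the program shown above and exposes
# its input placeholders and output tensors for fine-tuning.
module = hub.Module(name="bert_chinese_L-12_H-768_A-12")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)

# pooled_output feeds classification heads; sequence_output feeds sequence tasks.
pooled_output = outputs["pooled_output"]
sequence_output = outputs["sequence_output"]
```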
...@@ -74,23 +74,23 @@ class BertModel(object): ...@@ -74,23 +74,23 @@ class BertModel(object):
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask): def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0 # padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding( emb_out = fluid.layers.embedding(input=src_ids,
input=src_ids, size=[self._voc_size, self._emb_size],
size=[self._voc_size, self._emb_size], dtype=self._dtype,
dtype=self._dtype, param_attr=fluid.ParamAttr(name=self._word_emb_name,
param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer), initializer=self._param_initializer),
is_sparse=False) is_sparse=False)
position_emb_out = fluid.layers.embedding( position_emb_out = fluid.layers.embedding(input=position_ids,
input=position_ids, size=[self._max_position_seq_len, self._emb_size],
size=[self._max_position_seq_len, self._emb_size], dtype=self._dtype,
dtype=self._dtype, param_attr=fluid.ParamAttr(name=self._pos_emb_name,
param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer)) initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding( sent_emb_out = fluid.layers.embedding(sentence_ids,
sentence_ids, size=[self._sent_types, self._emb_size],
size=[self._sent_types, self._emb_size], dtype=self._dtype,
dtype=self._dtype, param_attr=fluid.ParamAttr(name=self._sent_emb_name,
param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer)) initializer=self._param_initializer))
emb_out = emb_out + position_emb_out emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out emb_out = emb_out + sent_emb_out
...@@ -105,23 +105,22 @@ class BertModel(object): ...@@ -105,23 +105,22 @@ class BertModel(object):
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1) n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder( self._enc_out = encoder(enc_input=emb_out,
enc_input=emb_out, attn_bias=n_head_self_attn_mask,
attn_bias=n_head_self_attn_mask, n_layer=self._n_layer,
n_layer=self._n_layer, n_head=self._n_head,
n_head=self._n_head, d_key=self._emb_size // self._n_head,
d_key=self._emb_size // self._n_head, d_value=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head, d_model=self._emb_size,
d_model=self._emb_size, d_inner_hid=self._emb_size * 4,
d_inner_hid=self._emb_size * 4, prepostprocess_dropout=self._prepostprocess_dropout,
prepostprocess_dropout=self._prepostprocess_dropout, attention_dropout=self._attention_dropout,
attention_dropout=self._attention_dropout, relu_dropout=0,
relu_dropout=0, hidden_act=self._hidden_act,
hidden_act=self._hidden_act, preprocess_cmd="",
preprocess_cmd="", postprocess_cmd="dan",
postprocess_cmd="dan", param_initializer=self._param_initializer,
param_initializer=self._param_initializer, name='encoder')
name='encoder')
def get_sequence_output(self): def get_sequence_output(self):
return self._enc_out return self._enc_out
...@@ -130,12 +129,12 @@ class BertModel(object): ...@@ -130,12 +129,12 @@ class BertModel(object):
"""Get the first feature of each sequence for classification""" """Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1]) next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc( next_sent_feat = fluid.layers.fc(input=next_sent_feat,
input=next_sent_feat, size=self._emb_size,
size=self._emb_size, act="tanh",
act="tanh", param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr="pooled_fc.b_0") bias_attr="pooled_fc.b_0")
return next_sent_feat return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels): def get_pretraining_output(self, mask_label, mask_pos, labels):
...@@ -150,43 +149,45 @@ class BertModel(object): ...@@ -150,43 +149,45 @@ class BertModel(object):
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos) mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc # transform: fc
mask_trans_feat = fluid.layers.fc( mask_trans_feat = fluid.layers.fc(input=mask_feat,
input=mask_feat, size=self._emb_size,
size=self._emb_size, act=self._hidden_act,
act=self._hidden_act, param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0')) bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm # transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans') mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr( mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0)) initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing: if self._weight_sharing:
fc_out = fluid.layers.matmul( fc_out = fluid.layers.matmul(x=mask_trans_feat,
x=mask_trans_feat, y=fluid.default_main_program().global_block().var(self._word_emb_name),
y=fluid.default_main_program().global_block().var(self._word_emb_name), transpose_y=True)
transpose_y=True) fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
fc_out += fluid.layers.create_parameter( dtype=self._dtype,
shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True) attr=mask_lm_out_bias_attr,
is_bias=True)
else: else:
fc_out = fluid.layers.fc( fc_out = fluid.layers.fc(input=mask_trans_feat,
input=mask_trans_feat, size=self._voc_size,
size=self._voc_size, param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr) bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label) mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss) mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc( next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
input=next_sent_feat, size=2,
size=2, param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0") bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy( next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
logits=next_sent_fc_out, label=labels, return_softmax=True) label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels) next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
......
...@@ -50,24 +50,21 @@ def multi_head_attention(queries, ...@@ -50,24 +50,21 @@ def multi_head_attention(queries,
""" """
Add linear projection to queries, keys, and values. Add linear projection to queries, keys, and values.
""" """
q = layers.fc( q = layers.fc(input=queries,
input=queries, size=d_key * n_head,
size=d_key * n_head, num_flatten_dims=2,
num_flatten_dims=2, param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer), bias_attr=name + '_query_fc.b_0')
bias_attr=name + '_query_fc.b_0') k = layers.fc(input=keys,
k = layers.fc( size=d_key * n_head,
input=keys, num_flatten_dims=2,
size=d_key * n_head, param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
num_flatten_dims=2, bias_attr=name + '_key_fc.b_0')
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer), v = layers.fc(input=values,
bias_attr=name + '_key_fc.b_0') size=d_value * n_head,
v = layers.fc( num_flatten_dims=2,
input=values, param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
size=d_value * n_head, bias_attr=name + '_value_fc.b_0')
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v return q, k, v
def __split_heads(x, n_head): def __split_heads(x, n_head):
...@@ -110,8 +107,10 @@ def multi_head_attention(queries, ...@@ -110,8 +107,10 @@ def multi_head_attention(queries,
product += attn_bias product += attn_bias
weights = layers.softmax(product) weights = layers.softmax(product)
if dropout_rate: if dropout_rate:
weights = layers.dropout( weights = layers.dropout(weights,
weights, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False) dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v) out = layers.matmul(weights, v)
return out return out
...@@ -133,12 +132,11 @@ def multi_head_attention(queries, ...@@ -133,12 +132,11 @@ def multi_head_attention(queries,
out = __combine_heads(ctx_multiheads) out = __combine_heads(ctx_multiheads)
# Project back to the model size. # Project back to the model size.
proj_out = layers.fc( proj_out = layers.fc(input=out,
input=out, size=d_model,
size=d_model, num_flatten_dims=2,
num_flatten_dims=2, param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer), bias_attr=name + '_output_fc.b_0')
bias_attr=name + '_output_fc.b_0')
return proj_out return proj_out
...@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, p ...@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, p
This module consists of two linear transformations with a ReLU activation This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically. in between, which is applied to each position separately and identically.
""" """
hidden = layers.fc( hidden = layers.fc(input=x,
input=x, size=d_inner_hid,
size=d_inner_hid, num_flatten_dims=2,
num_flatten_dims=2, act=hidden_act,
act=hidden_act, param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer), bias_attr=name + '_fc_0.b_0')
bias_attr=name + '_fc_0.b_0')
if dropout_rate: if dropout_rate:
hidden = layers.dropout( hidden = layers.dropout(hidden,
hidden, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False) dropout_prob=dropout_rate,
out = layers.fc( dropout_implementation="upscale_in_train",
input=hidden, is_test=False)
size=d_hid, out = layers.fc(input=hidden,
num_flatten_dims=2, size=d_hid,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer), num_flatten_dims=2,
bias_attr=name + '_fc_1.b_0') param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out return out
...@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name='') ...@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name='')
out_dtype = out.dtype out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16: if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32") out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm( out = layers.layer_norm(out,
out, begin_norm_axis=len(out.shape) - 1,
begin_norm_axis=len(out.shape) - 1, param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale', initializer=fluid.initializer.Constant(1.)), initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias', initializer=fluid.initializer.Constant(0.))) bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16: if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16") out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout elif cmd == "d": # add dropout
if dropout_rate: if dropout_rate:
out = layers.dropout( out = layers.dropout(out,
out, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False) dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out return out
...@@ -220,28 +221,35 @@ def encoder_layer(enc_input, ...@@ -220,28 +221,35 @@ def encoder_layer(enc_input,
with the post_process_layer to add residual connection, layer normalization with the post_process_layer to add residual connection, layer normalization
and droput. and droput.
""" """
attn_output = multi_head_attention( attn_output = multi_head_attention(pre_process_layer(enc_input,
pre_process_layer(enc_input, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_att'), preprocess_cmd,
None, prepostprocess_dropout,
None, name=name + '_pre_att'),
attn_bias, None,
d_key, None,
d_value, attn_bias,
d_model, d_key,
n_head, d_value,
attention_dropout, d_model,
param_initializer=param_initializer, n_head,
name=name + '_multi_head_att') attention_dropout,
attn_output = post_process_layer( param_initializer=param_initializer,
enc_input, attn_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_att') name=name + '_multi_head_att')
ffd_output = positionwise_feed_forward( attn_output = post_process_layer(enc_input,
pre_process_layer(attn_output, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_ffn'), attn_output,
d_inner_hid, postprocess_cmd,
d_model, prepostprocess_dropout,
relu_dropout, name=name + '_post_att')
hidden_act, ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
param_initializer=param_initializer, preprocess_cmd,
name=name + '_ffn') prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn') return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
...@@ -266,22 +274,21 @@ def encoder(enc_input, ...@@ -266,22 +274,21 @@ def encoder(enc_input,
encoder_layer. encoder_layer.
""" """
for i in range(n_layer): for i in range(n_layer):
enc_output = encoder_layer( enc_output = encoder_layer(enc_input,
enc_input, attn_bias,
attn_bias, n_head,
n_head, d_key,
d_key, d_value,
d_value, d_model,
d_model, d_inner_hid,
d_inner_hid, prepostprocess_dropout,
prepostprocess_dropout, attention_dropout,
attention_dropout, relu_dropout,
relu_dropout, hidden_act,
hidden_act, preprocess_cmd,
preprocess_cmd, postprocess_cmd,
postprocess_cmd, param_initializer=param_initializer,
param_initializer=param_initializer, name=name + '_layer_' + str(i))
name=name + '_layer_' + str(i))
enc_input = enc_output enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder") enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
......
...@@ -58,13 +58,12 @@ class Bert(TransformerModule): ...@@ -58,13 +58,12 @@ class Bert(TransformerModule):
pooled_output (tensor): sentence-level output for classification task. pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task. sequence_output (tensor): token-level output for sequence task.
""" """
bert = BertModel( bert = BertModel(src_ids=input_ids,
src_ids=input_ids, position_ids=position_ids,
position_ids=position_ids, sentence_ids=segment_ids,
sentence_ids=segment_ids, input_mask=input_mask,
input_mask=input_mask, config=self.bert_config,
config=self.bert_config, use_fp16=False)
use_fp16=False)
pooled_output = bert.get_pooled_output() pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output() sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output return pooled_output, sequence_output
......
...@@ -74,23 +74,23 @@ class BertModel(object): ...@@ -74,23 +74,23 @@ class BertModel(object):
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask): def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0 # padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding( emb_out = fluid.layers.embedding(input=src_ids,
input=src_ids, size=[self._voc_size, self._emb_size],
size=[self._voc_size, self._emb_size], dtype=self._dtype,
dtype=self._dtype, param_attr=fluid.ParamAttr(name=self._word_emb_name,
param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer), initializer=self._param_initializer),
is_sparse=False) is_sparse=False)
position_emb_out = fluid.layers.embedding( position_emb_out = fluid.layers.embedding(input=position_ids,
input=position_ids, size=[self._max_position_seq_len, self._emb_size],
size=[self._max_position_seq_len, self._emb_size], dtype=self._dtype,
dtype=self._dtype, param_attr=fluid.ParamAttr(name=self._pos_emb_name,
param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer)) initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding( sent_emb_out = fluid.layers.embedding(sentence_ids,
sentence_ids, size=[self._sent_types, self._emb_size],
size=[self._sent_types, self._emb_size], dtype=self._dtype,
dtype=self._dtype, param_attr=fluid.ParamAttr(name=self._sent_emb_name,
param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer)) initializer=self._param_initializer))
emb_out = emb_out + position_emb_out emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out emb_out = emb_out + sent_emb_out
...@@ -105,23 +105,22 @@ class BertModel(object): ...@@ -105,23 +105,22 @@ class BertModel(object):
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1) n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder( self._enc_out = encoder(enc_input=emb_out,
enc_input=emb_out, attn_bias=n_head_self_attn_mask,
attn_bias=n_head_self_attn_mask, n_layer=self._n_layer,
n_layer=self._n_layer, n_head=self._n_head,
n_head=self._n_head, d_key=self._emb_size // self._n_head,
d_key=self._emb_size // self._n_head, d_value=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head, d_model=self._emb_size,
d_model=self._emb_size, d_inner_hid=self._emb_size * 4,
d_inner_hid=self._emb_size * 4, prepostprocess_dropout=self._prepostprocess_dropout,
prepostprocess_dropout=self._prepostprocess_dropout, attention_dropout=self._attention_dropout,
attention_dropout=self._attention_dropout, relu_dropout=0,
relu_dropout=0, hidden_act=self._hidden_act,
hidden_act=self._hidden_act, preprocess_cmd="",
preprocess_cmd="", postprocess_cmd="dan",
postprocess_cmd="dan", param_initializer=self._param_initializer,
param_initializer=self._param_initializer, name='encoder')
name='encoder')
def get_sequence_output(self): def get_sequence_output(self):
return self._enc_out return self._enc_out
...@@ -130,12 +129,12 @@ class BertModel(object): ...@@ -130,12 +129,12 @@ class BertModel(object):
"""Get the first feature of each sequence for classification""" """Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1]) next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc( next_sent_feat = fluid.layers.fc(input=next_sent_feat,
input=next_sent_feat, size=self._emb_size,
size=self._emb_size, act="tanh",
act="tanh", param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr="pooled_fc.b_0") bias_attr="pooled_fc.b_0")
return next_sent_feat return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels): def get_pretraining_output(self, mask_label, mask_pos, labels):
...@@ -150,43 +149,45 @@ class BertModel(object): ...@@ -150,43 +149,45 @@ class BertModel(object):
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos) mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc # transform: fc
mask_trans_feat = fluid.layers.fc( mask_trans_feat = fluid.layers.fc(input=mask_feat,
input=mask_feat, size=self._emb_size,
size=self._emb_size, act=self._hidden_act,
act=self._hidden_act, param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0')) bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm # transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans') mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr( mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0)) initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing: if self._weight_sharing:
fc_out = fluid.layers.matmul( fc_out = fluid.layers.matmul(x=mask_trans_feat,
x=mask_trans_feat, y=fluid.default_main_program().global_block().var(self._word_emb_name),
y=fluid.default_main_program().global_block().var(self._word_emb_name), transpose_y=True)
transpose_y=True) fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
fc_out += fluid.layers.create_parameter( dtype=self._dtype,
shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True) attr=mask_lm_out_bias_attr,
is_bias=True)
else: else:
fc_out = fluid.layers.fc( fc_out = fluid.layers.fc(input=mask_trans_feat,
input=mask_trans_feat, size=self._voc_size,
size=self._voc_size, param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr) bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label) mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss) mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc( next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
input=next_sent_feat, size=2,
size=2, param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0") bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy( next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
logits=next_sent_fc_out, label=labels, return_softmax=True) label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels) next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
......
@@ -50,24 +50,21 @@ def multi_head_attention(queries,
        """
        Add linear projection to queries, keys, and values.
        """
        q = layers.fc(input=queries,
                      size=d_key * n_head,
                      num_flatten_dims=2,
                      param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
                      bias_attr=name + '_query_fc.b_0')
        k = layers.fc(input=keys,
                      size=d_key * n_head,
                      num_flatten_dims=2,
                      param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
                      bias_attr=name + '_key_fc.b_0')
        v = layers.fc(input=values,
                      size=d_value * n_head,
                      num_flatten_dims=2,
                      param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
                      bias_attr=name + '_value_fc.b_0')
        return q, k, v

    def __split_heads(x, n_head):

@@ -110,8 +107,10 @@ def multi_head_attention(queries,
        product += attn_bias
        weights = layers.softmax(product)
        if dropout_rate:
            weights = layers.dropout(weights,
                                     dropout_prob=dropout_rate,
                                     dropout_implementation="upscale_in_train",
                                     is_test=False)
        out = layers.matmul(weights, v)
        return out

@@ -133,12 +132,11 @@ def multi_head_attention(queries,
    out = __combine_heads(ctx_multiheads)

    # Project back to the model size.
    proj_out = layers.fc(input=out,
                         size=d_model,
                         num_flatten_dims=2,
                         param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
                         bias_attr=name + '_output_fc.b_0')
    return proj_out
@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
    This module consists of two linear transformations with a ReLU activation
    in between, which is applied to each position separately and identically.
    """
    hidden = layers.fc(input=x,
                       size=d_inner_hid,
                       num_flatten_dims=2,
                       act=hidden_act,
                       param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
                       bias_attr=name + '_fc_0.b_0')
    if dropout_rate:
        hidden = layers.dropout(hidden,
                                dropout_prob=dropout_rate,
                                dropout_implementation="upscale_in_train",
                                is_test=False)
    out = layers.fc(input=hidden,
                    size=d_hid,
                    num_flatten_dims=2,
                    param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
                    bias_attr=name + '_fc_1.b_0')
    return out
@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
            out_dtype = out.dtype
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float32")
            out = layers.layer_norm(out,
                                    begin_norm_axis=len(out.shape) - 1,
                                    param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
                                                               initializer=fluid.initializer.Constant(1.)),
                                    bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
                                                              initializer=fluid.initializer.Constant(0.)))
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float16")
        elif cmd == "d":  # add dropout
            if dropout_rate:
                out = layers.dropout(out,
                                     dropout_prob=dropout_rate,
                                     dropout_implementation="upscale_in_train",
                                     is_test=False)
    return out
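The FP16 branch above casts activations up to float32 before layer_norm and back to float16 afterwards: the mean/variance reduction inside layer normalization is numerically unstable in half precision, so it is computed in full precision.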
@@ -220,28 +221,35 @@ def encoder_layer(enc_input,
    with the post_process_layer to add residual connection, layer normalization
    and dropout.
    """
    attn_output = multi_head_attention(pre_process_layer(enc_input,
                                                         preprocess_cmd,
                                                         prepostprocess_dropout,
                                                         name=name + '_pre_att'),
                                       None,
                                       None,
                                       attn_bias,
                                       d_key,
                                       d_value,
                                       d_model,
                                       n_head,
                                       attention_dropout,
                                       param_initializer=param_initializer,
                                       name=name + '_multi_head_att')
    attn_output = post_process_layer(enc_input,
                                     attn_output,
                                     postprocess_cmd,
                                     prepostprocess_dropout,
                                     name=name + '_post_att')
    ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
                                                             preprocess_cmd,
                                                             prepostprocess_dropout,
                                                             name=name + '_pre_ffn'),
                                           d_inner_hid,
                                           d_model,
                                           relu_dropout,
                                           hidden_act,
                                           param_initializer=param_initializer,
                                           name=name + '_ffn')
    return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
@@ -266,22 +274,21 @@ def encoder(enc_input,
    encoder_layer.
    """
    for i in range(n_layer):
        enc_output = encoder_layer(enc_input,
                                   attn_bias,
                                   n_head,
                                   d_key,
                                   d_value,
                                   d_model,
                                   d_inner_hid,
                                   prepostprocess_dropout,
                                   attention_dropout,
                                   relu_dropout,
                                   hidden_act,
                                   preprocess_cmd,
                                   postprocess_cmd,
                                   param_initializer=param_initializer,
                                   name=name + '_layer_' + str(i))
        enc_input = enc_output
    enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
@@ -58,13 +58,12 @@ class BertChinese(TransformerModule):
            pooled_output (tensor): sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
        bert = BertModel(src_ids=input_ids,
                         position_ids=position_ids,
                         sentence_ids=segment_ids,
                         input_mask=input_mask,
                         config=self.bert_config,
                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output
@@ -74,23 +74,23 @@ class BertModel(object):
    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
        # padding id in vocabulary must be set to 0
        emb_out = fluid.layers.embedding(input=src_ids,
                                         size=[self._voc_size, self._emb_size],
                                         dtype=self._dtype,
                                         param_attr=fluid.ParamAttr(name=self._word_emb_name,
                                                                    initializer=self._param_initializer),
                                         is_sparse=False)
        position_emb_out = fluid.layers.embedding(input=position_ids,
                                                  size=[self._max_position_seq_len, self._emb_size],
                                                  dtype=self._dtype,
                                                  param_attr=fluid.ParamAttr(name=self._pos_emb_name,
                                                                             initializer=self._param_initializer))
        sent_emb_out = fluid.layers.embedding(sentence_ids,
                                              size=[self._sent_types, self._emb_size],
                                              dtype=self._dtype,
                                              param_attr=fluid.ParamAttr(name=self._sent_emb_name,
                                                                         initializer=self._param_initializer))

        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
@@ -105,23 +105,22 @@ class BertModel(object):
        n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True

        self._enc_out = encoder(enc_input=emb_out,
                                attn_bias=n_head_self_attn_mask,
                                n_layer=self._n_layer,
                                n_head=self._n_head,
                                d_key=self._emb_size // self._n_head,
                                d_value=self._emb_size // self._n_head,
                                d_model=self._emb_size,
                                d_inner_hid=self._emb_size * 4,
                                prepostprocess_dropout=self._prepostprocess_dropout,
                                attention_dropout=self._attention_dropout,
                                relu_dropout=0,
                                hidden_act=self._hidden_act,
                                preprocess_cmd="",
                                postprocess_cmd="dan",
                                param_initializer=self._param_initializer,
                                name='encoder')
    def get_sequence_output(self):
        return self._enc_out

@@ -130,12 +129,12 @@ class BertModel(object):
        """Get the first feature of each sequence for classification"""
        next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
        next_sent_feat = fluid.layers.fc(input=next_sent_feat,
                                         size=self._emb_size,
                                         act="tanh",
                                         param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
                                                                    initializer=self._param_initializer),
                                         bias_attr="pooled_fc.b_0")
        return next_sent_feat
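get_pooled_output slices out the hidden state of the first token ([CLS]) and applies a tanh FC. Schematically, in numpy (assumed toy shapes, not values fixed by this diff):

import numpy as np

batch, seq_len, emb_size = 2, 128, 768                       # assumed toy shapes
enc_out = np.random.randn(batch, seq_len, emb_size).astype('float32')
w = (0.02 * np.random.randn(emb_size, emb_size)).astype('float32')
b = np.zeros(emb_size, dtype='float32')

cls_feat = enc_out[:, 0:1, :]          # == slice(axes=[1], starts=[0], ends=[1])
pooled = np.tanh(cls_feat @ w + b)     # [batch, 1, emb_size] sentence representation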
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
    Multi-Head Attention. Note that attn_bias is added to the logits before
    computing the softmax activation, masking selected positions so that
    they are not considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
        Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
        # permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
        Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
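# Shape walkthrough for the attention above (illustrative values, assuming
# batch=2, seq_len=128, n_head=12, d_key=d_value=64, d_model=768):
#   q/k/v after __compute_qkv:    [2, 128, 768]
#   after __split_heads:          [2, 12, 128, 64]
#   attention weights (softmax):  [2, 12, 128, 128]
#   context after matmul with v:  [2, 12, 128, 64]
#   after __combine_heads:        [2, 128, 768]
#   proj_out:                     [2, 128, 768]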
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
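# Note: BertModel drives this with d_inner_hid = 4 * d_model (the standard
# Transformer expansion ratio) and relu_dropout = 0, i.e. no dropout between
# the two projections.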
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
    Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
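# process_cmd is a string of single-letter steps applied in order:
#   'a' -> residual add, 'n' -> layer norm, 'd' -> dropout.
# BertModel calls the encoder with preprocess_cmd="" and postprocess_cmd="dan",
# i.e. each sublayer output goes through dropout -> residual add -> layer norm,
# the post-LN arrangement of the original Transformer/BERT.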
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
    This module consists of a multi-head (self) attention followed by
    position-wise feed-forward networks, with both components accompanied
    by the post_process_layer to add residual connection, layer normalization
    and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
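A minimal sketch of driving the encoder above in static-graph mode; the dimensions and the gelu activation are illustrative BERT-base-like assumptions, not values fixed by this file:

import paddle.fluid as fluid

d_model, n_head, seq_len = 768, 12, 128   # assumed toy sizes
emb = fluid.layers.data(name='emb', shape=[seq_len, d_model], dtype='float32')
attn_bias = fluid.layers.data(name='attn_bias', shape=[n_head, seq_len, seq_len], dtype='float32')
enc_out = encoder(enc_input=emb,
                  attn_bias=attn_bias,
                  n_layer=2,
                  n_head=n_head,
                  d_key=d_model // n_head,
                  d_value=d_model // n_head,
                  d_model=d_model,
                  d_inner_hid=d_model * 4,
                  prepostprocess_dropout=0.1,
                  attention_dropout=0.1,
                  relu_dropout=0,
                  hidden_act='gelu',
                  preprocess_cmd="",
                  postprocess_cmd="dan",
                  param_initializer=fluid.initializer.TruncatedNormal(scale=0.02),
                  name='encoder')  # -> [batch, seq_len, d_model]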
...@@ -58,13 +58,12 @@ class Bert(TransformerModule): ...@@ -58,13 +58,12 @@ class Bert(TransformerModule):
pooled_output (tensor): sentence-level output for classification task. pooled_output (tensor): sentence-level output for classification task.
sequence_output (tensor): token-level output for sequence task. sequence_output (tensor): token-level output for sequence task.
""" """
bert = BertModel( bert = BertModel(src_ids=input_ids,
src_ids=input_ids, position_ids=position_ids,
position_ids=position_ids, sentence_ids=segment_ids,
sentence_ids=segment_ids, input_mask=input_mask,
input_mask=input_mask, config=self.bert_config,
config=self.bert_config, use_fp16=False)
use_fp16=False)
pooled_output = bert.get_pooled_output() pooled_output = bert.get_pooled_output()
sequence_output = bert.get_sequence_output() sequence_output = bert.get_sequence_output()
return pooled_output, sequence_output return pooled_output, sequence_output
......
...@@ -74,23 +74,23 @@ class BertModel(object): ...@@ -74,23 +74,23 @@ class BertModel(object):
def _build_model(self, src_ids, position_ids, sentence_ids, input_mask): def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
# padding id in vocabulary must be set to 0 # padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding( emb_out = fluid.layers.embedding(input=src_ids,
input=src_ids, size=[self._voc_size, self._emb_size],
size=[self._voc_size, self._emb_size], dtype=self._dtype,
dtype=self._dtype, param_attr=fluid.ParamAttr(name=self._word_emb_name,
param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer), initializer=self._param_initializer),
is_sparse=False) is_sparse=False)
position_emb_out = fluid.layers.embedding( position_emb_out = fluid.layers.embedding(input=position_ids,
input=position_ids, size=[self._max_position_seq_len, self._emb_size],
size=[self._max_position_seq_len, self._emb_size], dtype=self._dtype,
dtype=self._dtype, param_attr=fluid.ParamAttr(name=self._pos_emb_name,
param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer)) initializer=self._param_initializer))
sent_emb_out = fluid.layers.embedding( sent_emb_out = fluid.layers.embedding(sentence_ids,
sentence_ids, size=[self._sent_types, self._emb_size],
size=[self._sent_types, self._emb_size], dtype=self._dtype,
dtype=self._dtype, param_attr=fluid.ParamAttr(name=self._sent_emb_name,
param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer)) initializer=self._param_initializer))
emb_out = emb_out + position_emb_out emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out emb_out = emb_out + sent_emb_out
...@@ -105,23 +105,22 @@ class BertModel(object): ...@@ -105,23 +105,22 @@ class BertModel(object):
n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1) n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True n_head_self_attn_mask.stop_gradient = True
self._enc_out = encoder( self._enc_out = encoder(enc_input=emb_out,
enc_input=emb_out, attn_bias=n_head_self_attn_mask,
attn_bias=n_head_self_attn_mask, n_layer=self._n_layer,
n_layer=self._n_layer, n_head=self._n_head,
n_head=self._n_head, d_key=self._emb_size // self._n_head,
d_key=self._emb_size // self._n_head, d_value=self._emb_size // self._n_head,
d_value=self._emb_size // self._n_head, d_model=self._emb_size,
d_model=self._emb_size, d_inner_hid=self._emb_size * 4,
d_inner_hid=self._emb_size * 4, prepostprocess_dropout=self._prepostprocess_dropout,
prepostprocess_dropout=self._prepostprocess_dropout, attention_dropout=self._attention_dropout,
attention_dropout=self._attention_dropout, relu_dropout=0,
relu_dropout=0, hidden_act=self._hidden_act,
hidden_act=self._hidden_act, preprocess_cmd="",
preprocess_cmd="", postprocess_cmd="dan",
postprocess_cmd="dan", param_initializer=self._param_initializer,
param_initializer=self._param_initializer, name='encoder')
name='encoder')
def get_sequence_output(self): def get_sequence_output(self):
return self._enc_out return self._enc_out
...@@ -130,12 +129,12 @@ class BertModel(object): ...@@ -130,12 +129,12 @@ class BertModel(object):
"""Get the first feature of each sequence for classification""" """Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1]) next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.fc( next_sent_feat = fluid.layers.fc(input=next_sent_feat,
input=next_sent_feat, size=self._emb_size,
size=self._emb_size, act="tanh",
act="tanh", param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr="pooled_fc.b_0") bias_attr="pooled_fc.b_0")
return next_sent_feat return next_sent_feat
def get_pretraining_output(self, mask_label, mask_pos, labels): def get_pretraining_output(self, mask_label, mask_pos, labels):
...@@ -150,43 +149,45 @@ class BertModel(object): ...@@ -150,43 +149,45 @@ class BertModel(object):
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos) mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
# transform: fc # transform: fc
mask_trans_feat = fluid.layers.fc( mask_trans_feat = fluid.layers.fc(input=mask_feat,
input=mask_feat, size=self._emb_size,
size=self._emb_size, act=self._hidden_act,
act=self._hidden_act, param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0')) bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm # transform: layer norm
mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans') mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr( mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0)) initializer=fluid.initializer.Constant(value=0.0))
if self._weight_sharing: if self._weight_sharing:
fc_out = fluid.layers.matmul( fc_out = fluid.layers.matmul(x=mask_trans_feat,
x=mask_trans_feat, y=fluid.default_main_program().global_block().var(self._word_emb_name),
y=fluid.default_main_program().global_block().var(self._word_emb_name), transpose_y=True)
transpose_y=True) fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
fc_out += fluid.layers.create_parameter( dtype=self._dtype,
shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True) attr=mask_lm_out_bias_attr,
is_bias=True)
else: else:
fc_out = fluid.layers.fc( fc_out = fluid.layers.fc(input=mask_trans_feat,
input=mask_trans_feat, size=self._voc_size,
size=self._voc_size, param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr=mask_lm_out_bias_attr) bias_attr=mask_lm_out_bias_attr)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label) mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss) mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
next_sent_fc_out = fluid.layers.fc( next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
input=next_sent_feat, size=2,
size=2, param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer), initializer=self._param_initializer),
bias_attr="next_sent_fc.b_0") bias_attr="next_sent_fc.b_0")
next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy( next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
logits=next_sent_fc_out, label=labels, return_softmax=True) label=labels,
return_softmax=True)
next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels) next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing softmax activiation to mask certain selected positions so that
they will not considered in attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError("Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of inpunt tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
# permuate the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of inpunt tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
Add residual connection, layer normalization and droput to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
    This module consists of multi-head (self) attention followed by a
    position-wise feed-forward network, each component wrapped with
    post_process_layer to add residual connection, layer normalization
    and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
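The reshape/transpose bookkeeping in `__split_heads` and `__combine_heads` is easiest to check with concrete shapes. A minimal NumPy sketch (illustrative only, not part of the module) showing that the two operations are exact inverses:

```python
import numpy as np

bs, seq_len, n_head, head_dim = 2, 4, 3, 5
x = np.random.rand(bs, seq_len, n_head * head_dim).astype("float32")

# __split_heads: [bs, seq_len, n_head * head_dim] -> [bs, n_head, seq_len, head_dim]
split = x.reshape(bs, seq_len, n_head, head_dim).transpose(0, 2, 1, 3)

# __combine_heads: transpose back, then fold the last two dims into one
combined = split.transpose(0, 2, 1, 3).reshape(bs, seq_len, n_head * head_dim)

assert np.array_equal(x, combined)  # the round trip recovers the original tensor
```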
@@ -58,13 +58,12 @@ class Bert(TransformerModule):
            pooled_output (tensor): sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
        bert = BertModel(src_ids=input_ids,
                         position_ids=position_ids,
                         sentence_ids=segment_ids,
                         input_mask=input_mask,
                         config=self.bert_config,
                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output
...
@@ -74,23 +74,23 @@ class BertModel(object):
    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
        # padding id in vocabulary must be set to 0
        emb_out = fluid.layers.embedding(input=src_ids,
                                         size=[self._voc_size, self._emb_size],
                                         dtype=self._dtype,
                                         param_attr=fluid.ParamAttr(name=self._word_emb_name,
                                                                    initializer=self._param_initializer),
                                         is_sparse=False)
        position_emb_out = fluid.layers.embedding(input=position_ids,
                                                  size=[self._max_position_seq_len, self._emb_size],
                                                  dtype=self._dtype,
                                                  param_attr=fluid.ParamAttr(name=self._pos_emb_name,
                                                                             initializer=self._param_initializer))
        sent_emb_out = fluid.layers.embedding(sentence_ids,
                                              size=[self._sent_types, self._emb_size],
                                              dtype=self._dtype,
                                              param_attr=fluid.ParamAttr(name=self._sent_emb_name,
                                                                         initializer=self._param_initializer))
        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
@@ -105,23 +105,22 @@ class BertModel(object):
        n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True
        self._enc_out = encoder(enc_input=emb_out,
                                attn_bias=n_head_self_attn_mask,
                                n_layer=self._n_layer,
                                n_head=self._n_head,
                                d_key=self._emb_size // self._n_head,
                                d_value=self._emb_size // self._n_head,
                                d_model=self._emb_size,
                                d_inner_hid=self._emb_size * 4,
                                prepostprocess_dropout=self._prepostprocess_dropout,
                                attention_dropout=self._attention_dropout,
                                relu_dropout=0,
                                hidden_act=self._hidden_act,
                                preprocess_cmd="",
                                postprocess_cmd="dan",
                                param_initializer=self._param_initializer,
                                name='encoder')
    def get_sequence_output(self):
        return self._enc_out
@@ -130,12 +129,12 @@ class BertModel(object):
        """Get the first feature of each sequence for classification"""
        next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
        next_sent_feat = fluid.layers.fc(input=next_sent_feat,
                                         size=self._emb_size,
                                         act="tanh",
                                         param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
                                                                    initializer=self._param_initializer),
                                         bias_attr="pooled_fc.b_0")
        return next_sent_feat
    def get_pretraining_output(self, mask_label, mask_pos, labels):
@@ -150,43 +149,45 @@ class BertModel(object):
        mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
        # transform: fc
        mask_trans_feat = fluid.layers.fc(input=mask_feat,
                                          size=self._emb_size,
                                          act=self._hidden_act,
                                          param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
                                                                     initializer=self._param_initializer),
                                          bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')
        mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
                                                initializer=fluid.initializer.Constant(value=0.0))
        if self._weight_sharing:
            fc_out = fluid.layers.matmul(x=mask_trans_feat,
                                         y=fluid.default_main_program().global_block().var(self._word_emb_name),
                                         transpose_y=True)
            fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
                                                    dtype=self._dtype,
                                                    attr=mask_lm_out_bias_attr,
                                                    is_bias=True)
        else:
            fc_out = fluid.layers.fc(input=mask_trans_feat,
                                     size=self._voc_size,
                                     param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
                                                                initializer=self._param_initializer),
                                     bias_attr=mask_lm_out_bias_attr)
        mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
        mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
        next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
                                           size=2,
                                           param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
                                                                      initializer=self._param_initializer),
                                           bias_attr="next_sent_fc.b_0")
        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
                                                                                    label=labels,
                                                                                    return_softmax=True)
        next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
...
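The hunk above stacks one copy of the self-attention mask per head (`n_head_self_attn_mask`) and freezes it with `stop_gradient`; the stacked tensor is the `attn_bias` that `scaled_dot_product_attention` adds to the logits. A hedged NumPy sketch of the masking idea (the `-10000.0` constant and variable names are illustrative assumptions, not taken from the elided code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

scores = np.zeros((1, 4, 4), dtype="float32")       # [bs, seq_len, seq_len] raw logits
pad_mask = np.array([1, 1, 1, 0], dtype="float32")  # last position is padding
bias = (pad_mask[None, None, :] - 1.0) * 10000.0    # 0 for real tokens, -1e4 for padding

weights = softmax(scores + bias)
print(weights[0, 0])  # the padded position receives ~0 attention weight
```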
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
    Multi-Head Attention. Note that attn_bias is added to the logits before
    computing the softmax activation, masking selected positions so that
    they are not considered in the attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
        raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
bias_attr=name + '_query_fc.b_0')
k = layers.fc(input=keys,
size=d_key * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
bias_attr=name + '_key_fc.b_0')
v = layers.fc(input=values,
size=d_value * n_head,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
bias_attr=name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
        Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
    # permute the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
        Transpose and then reshape the last two dimensions of input tensor x
        so that they become one dimension, which is the reverse of __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
bias_attr=name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
bias_attr=name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
"""
    Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layers.layer_norm(out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
initializer=fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(out,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
    This module consists of multi-head (self) attention followed by a
    position-wise feed-forward network, each component wrapped with
    post_process_layer to add residual connection, layer normalization
    and dropout.
"""
attn_output = multi_head_attention(pre_process_layer(enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer=param_initializer,
name=name + '_multi_head_att')
attn_output = post_process_layer(enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name=name + '_post_att')
ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer=param_initializer,
name=name + '_ffn')
return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer=param_initializer,
name=name + '_layer_' + str(i))
enc_input = enc_output
enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
return enc_output
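`pre_post_process_layer` treats `process_cmd` as a tiny command string applied character by character: `"n"` before a sublayer (preprocess) and `"da"`, or `"dan"` in the BERT configuration above, after it (postprocess). A pure-NumPy sketch of that dispatch, with stand-in implementations of layer norm and `upscale_in_train` dropout (an assumption for illustration, not the fluid ops themselves):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last dimension, as begin_norm_axis = rank - 1 does.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def process(prev_out, out, cmd, dropout_rate=0.0):
    for c in cmd:
        if c == "a" and prev_out is not None:   # residual connection
            out = out + prev_out
        elif c == "n":                          # layer normalization
            out = layer_norm(out)
        elif c == "d" and dropout_rate:         # "upscale_in_train" dropout
            keep = np.random.rand(*out.shape) >= dropout_rate
            out = out * keep / (1.0 - dropout_rate)
    return out

enc_input = np.random.rand(2, 4, 8)
sublayer_out = np.random.rand(2, 4, 8)
# The BERT model above uses preprocess_cmd="" and postprocess_cmd="dan":
out = process(enc_input, sublayer_out, "dan", dropout_rate=0.1)
```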