test=pre-develop, test=documents_fix

edbe41ba · grasswolfs · c093331e
263 changed files
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 ![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)

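 ## Introduction
- PaddleHub aims to provide developers with a rich set of high-quality, ready-to-use pretrained models, so that **[even without a deep learning background, data, or a training process],**they can quickly put AI models to use.
+- PaddleHub aims to provide developers with a rich set of high-quality, ready-to-use pretrained models, so that **[even without a deep learning background, data, or a training process]**, they can quickly put AI models to use.
 - Covers the four mainstream categories CV, NLP, Audio, and Video, and supports **one-command prediction**, **one-command service deployment**, and **fast transfer learning**.
 - All models are open source and downloadable, and **run offline**.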

@@ -15,10 +15,10 @@
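 - **2020.09.27**: added 6 text generation models and 1 image segmentation model, bringing the total number of pretrained models to **[154]**.
 - **2020.08.13**: released v1.8.1, added the portrait segmentation model Humanseg, supported EMNLP2019-Sentence-BERT as a text matching task network, and brought the total number of pretrained models to **[147]**.
 - **2020.07.29**: released v1.8.0, added AI couplets and AI poetry, jieba word segmentation, LDA for text data, semantic similarity computation, object detection and short video classification models, ultra-lightweight Chinese/English OCR, and industrial-grade models such as pedestrian detection, vehicle detection, and animal recognition; added VisualDL training visualization; the total number of pretrained models reached **[135]**.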
- [More]()
+- [More](./docs/release.md)


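-## Features
+## [Features](./docs/figures.md)
 - **[Rich pretrained models]**: 180+ pretrained models across the four mainstream categories CV, NLP, Audio, and Video, all open source, downloadable, and runnable offline.
 - **[One-command model prediction]**: call a model with a single command line or a minimal Python API to try out its results quickly.
 - **[One-command model-to-service]**: a single command deploys a deep learning model as an API service.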
@@ -60,27 +60,27 @@
    - [Online demos (Official)](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.8/demo)
    - [Fun ecosystem project demos (ThirdParty)](./docs/quick_experience/more_demos.md)
 - Rich pretrained models: 182
-    - [Featured models](./docs/pretrained_models.md)
+    - [Featured models](./docs/figures.md)
    - Computer vision: 126
-      - [Image classification: 64](./modules/image/classification)
-      - [Object detection: 13](./modules/image/object_detection)
-      - [Face detection: 7](./modules/image/face_detection)
-      - [Keypoint detection: 3](./modules/image/keypoint_detection)
-      - [Image segmentation: 7](./modules/image/semantic_segmentation)
-      - [Text recognition: 8](./modules/image/text_recognition)
-      - [Image generation: 17](./modules/image/gan)
-      - [Image editing: 7](./modules/image/style_transfer)
+      - [Image classification: 64](./modules/image/classification/README.md)
+      - [Object detection: 13](./modules/image/object_detection/README.md)
+      - [Face detection: 7](./modules/image/face_detection/README.md)
+      - [Keypoint detection: 3](./modules/image/keypoint_detection/README.md)
+      - [Image segmentation: 7](./modules/image/semantic_segmentation/README.md)
+      - [Text recognition: 8](./modules/image/text_recognition/README.md)
+      - [Image generation: 17](./modules/image/Image_gan/README.md)
+      - [Image editing: 7](./modules/image/Image_editing/README.md)
    - Natural language processing: 48
-      - [Lexical analysis: 2](./modules/text/lexical_analysis)
-      - [Syntactic analysis: 1](./modules/text/syntactic_analysis)
-      - [Sentiment analysis: 7](./modules/text/semantic_model)
-      - [Text moderation: 3](./modules/text/text_review)
-      - [Text generation: 9](./modules/text/text_generation)
-      - [Semantic models: 26](./modules/text/semantic_model)
+      - [Lexical analysis: 2](./modules/text/lexical_analysis/README.md)
+      - [Syntactic analysis: 1](./modules/text/syntactic_analysis/README.md)
+      - [Sentiment analysis: 7](./modules/text/sentiment_analysis/README.md)
+      - [Text moderation: 3](./modules/text/text_review/README.md)
+      - [Text generation: 9](./modules/text/text_generation/README.md)
+      - [Semantic models: 26](./modules/text/language_model/README.md)
    - Speech: 3
-      - [Speech synthesis: 3](./modules/audio)
+      - [Speech synthesis: 3](./modules/audio/README.md)
    - Video: 5
-      - [Video classification: 5](./modules/video)
+      - [Video classification: 5](./modules/video/README.md)
 - Deployment
    - [Service deployment with one line of code](./docs/tutorial/serving.md)
    - C++ Inference deployment (joining the discussion group is recommended)

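The serving deployment linked above starts a module as an HTTP service with `hub serving start -m <module>`. Below is a minimal client sketch using only the standard library: it builds, but does not send, a prediction request for `chinese_ocr_db_crnn_mobile`. The port (8866), the `/predict/<module_name>` route, and the `{"images": [<base64 string>, ...]}` body are assumptions based on PaddleHub 1.x serving defaults; confirm them in the serving tutorial for your version.

```python
import base64
import json
import urllib.request

# Assumed defaults for `hub serving start` (PaddleHub 1.x); verify for your version.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8866


def build_ocr_request(image_bytes_list, module_name="chinese_ocr_db_crnn_mobile",
                      host=DEFAULT_HOST, port=DEFAULT_PORT):
    """Build (but do not send) a POST request for a PaddleHub serving endpoint.

    image_bytes_list: raw encoded image files (e.g. JPEG bytes read from disk).
    The {"images": [...base64...]} payload shape is an assumption.
    """
    payload = {"images": [base64.b64encode(b).decode("utf-8") for b in image_bytes_list]}
    url = "http://{}:{}/predict/{}".format(host, port, module_name)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the service is running, pass the request to `urllib.request.urlopen` and decode the JSON response. The response layout is model-specific, so check the module's documentation before you parse it.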
--- a/docs/figures.md
+++ b/docs/figures.md
+## Features in Detail
+<a name="丰富的预训练模型"></a>
+
+### 1. Rich Pretrained Models
+
+- 1.1 Images
+
+|            | **Featured model examples**                                  |
+| ---------- | :----------------------------------------------------------- |
+| Image classification | [Dish recognition](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_dishes&en_category=ImageClassification), [Animal recognition](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_animals&en_category=ImageClassification), [-->More](../modules/image/classification/README.md) |
+| Object detection | [General detection](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_coco2017&en_category=ObjectDetection), [Pedestrian detection](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_pedestrian&en_category=ObjectDetection), [Vehicle detection](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_vehicles&en_category=ObjectDetection), [-->More](../modules/image/object_detection/README.md) |
+| Face detection | [Face detection](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server&en_category=FaceDetection), [Mask detection](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server_mask&en_category=FaceDetection), [-->More](../modules/image/face_detection/README.md) |
+| Image segmentation | [Portrait segmentation](https://www.paddlepaddle.org.cn/hubdetail?name=deeplabv3p_xception65_humanseg&en_category=ImageSegmentation), [Human parsing](https://www.paddlepaddle.org.cn/hubdetail?name=ace2p&en_category=ImageSegmentation), [Pneumonia CT image analysis](https://www.paddlepaddle.org.cn/hubdetail?name=Pneumonia_CT_LKM_PP&en_category=ImageSegmentation), [-->More](../modules/image/semantic_segmentation/README.md) |
+| Keypoint detection | [Human body keypoints](https://www.paddlepaddle.org.cn/hubdetail?name=human_pose_estimation_resnet50_mpii&en_category=KeyPointDetection), [Facial landmarks](https://www.paddlepaddle.org.cn/hubdetail?name=face_landmark_localization&en_category=KeyPointDetection), [Hand keypoints](https://www.paddlepaddle.org.cn/hubdetail?name=hand_pose_localization&en_category=KeyPointDetection), [-->More](../modules/image/keypoint_detection/README.md) |
+| Text recognition | [Ultra-lightweight Chinese/English OCR](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_crnn_mobile&en_category=TextRecognition), [-->More](../modules/image/text_recognition/README.md) |
+| Image generation | [Style transfer](https://www.paddlepaddle.org.cn/hubdetail?name=stylepro_artistic&en_category=GANs), [Street-scene anime conversion](), [-->More](../modules/image/Image_gan/README.md) |
+| Image editing | [Super-resolution](https://www.paddlepaddle.org.cn/hubdetail?name=realsr&en_category=ImageEditing), [Black-and-white colorization](https://www.paddlepaddle.org.cn/hubdetail?name=deoldify&en_category=ImageEditing), [-->More](../modules/image/Image_editing/README.md) |
+
+- 1.2 Text
+
+|            | **Featured model examples**                                  |
+| ---------- | :----------------------------------------------------------- |
+| Word and sentence analysis | [Lexical analysis](https://www.paddlepaddle.org.cn/hubdetail?name=lac&en_category=LexicalAnalysis), [Syntactic analysis](https://www.paddlepaddle.org.cn/hubdetail?name=ddparser&en_category=SyntacticAnalysis), [-->More](../modules/text/lexical_analysis/README.md) |
+| Sentiment analysis | [Sentiment classification](https://www.paddlepaddle.org.cn/hubdetail?name=lac&en_category=LexicalAnalysis), [Emotion detection](https://www.paddlepaddle.org.cn/hubdetail?name=emotion_detection_textcnn&en_category=SentimentAnalysis), [-->More](../modules/text/sentiment_analysis/README.md) |
+| Text moderation | [Pornographic content detection](https://www.paddlepaddle.org.cn/hubdetail?name=porn_detection_gru&en_category=TextCensorship), [-->More](../modules/text/text_review/README.md) |
+| Text generation | [Couplet generation](), [Love-message generation](), [Acrostic poem generation](), [Cheesy love lines](), [-->More](../modules/text/text_generation/README.md) |
+| Semantic models | [ERNIE](https://www.paddlepaddle.org.cn/hubdetail?name=ERNIE&en_category=SemanticModel), [Text similarity](https://www.paddlepaddle.org.cn/hubdetail?name=simnet_bow&en_category=SemanticModel), [-->More](../modules/text/language_model/README.md) |
+
+- 1.3 Audio
+
+|            | **Featured model examples**                                  |
+| ---------- | :----------------------------------------------------------- |
+| Speech synthesis | [Speech synthesis](), [-->More](../modules/audio/README.md) |
+
+- 1.4 Video
+
+|            | **Featured model examples**                                  |
+| ---------- | :----------------------------------------------------------- |
+| Video classification | [Video classification](), [-->More](../modules/video/README.md) |
+
+<a name="一键模型预测"></a>
+
+### 2. One-Command Model Prediction
+
+
+* For example, the lightweight Chinese OCR model chinese_ocr_db_crnn_mobile can recognize the text in an image with a single command:
+```shell
+$ pip install paddlehub
+$ wget https://paddlehub.bj.bcebos.com/model/image/ocr/test_ocr.jpg
+$ hub run chinese_ocr_db_crnn_mobile --input_path test_ocr.jpg --visualization=True
+```
+
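+* The image with the prediction results is saved to the ocr_result folder under the current working directory, as shown below.
+
+<p align="center">
+ <img src="./imgs/ocr_res.jpg" width='70%' align="middle" />
+</p>
+
+* Word segmentation with the lexical analysis model LAC: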
+```shell
+$ hub run lac --input_text "现在，慕尼黑再保险公司不仅是此类行动的倡议者，更是将其大量气候数据整合进保险产品中，并与公众共享大量天气信息，参与到新能源领域的保障中。"
+[{
+    'word': ['现在', '，', '慕尼黑再保险公司', '不仅', '是', '此类', '行动', '的', '倡议者', '，', '更是', '将', '其', '大量', '气候', '数据', '整合', '进', '保险', '产品', '中', '，', '并', '与', '公众', '共享', '大量', '天气', '信息', '，', '参与', '到', '新能源', '领域', '的', '保障', '中', '。'],
+    'tag':  ['TIME', 'w', 'ORG', 'c', 'v', 'r', 'n', 'u', 'n', 'w', 'd', 'p', 'r', 'a', 'n', 'n', 'v', 'v', 'n', 'n', 'f', 'w', 'c', 'p', 'n', 'v', 'a', 'n', 'n', 'w', 'v', 'v', 'n', 'n', 'u', 'vn', 'f', 'w']
+}]
+```
+
+Besides one-line prediction, PaddleHub also supports calling models through its Python API; see each model's documentation for details.
+
+<a name="一键模型转服务"></a>
+
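+### 3. One-Command Model-to-Service
+
+PaddleHub makes it easy to turn a model into a service: a single command deploys the model as an HTTP service. For example, the following command starts the chinese_ocr_db_crnn_mobile OCR service: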
+
+```shell
+$ hub serving start -m chinese_ocr_db_crnn_mobile
+```
+
+For more on model serving, see [PaddleHub One-Command Service Deployment](./tutorial/serving.md).
+
+
+
+<a name="十行代码迁移学习"></a>
+
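+### 4. Transfer Learning in Ten Lines of Code
+
+With the Fine-tune API, transfer learning of deep learning models for computer vision scenarios takes only a few lines of code.
+
+* The [demos](../demo) provide extensive Fine-tune API sample code, including transfer learning examples for [image classification](../demo/image_classification), [image colorization](../demo/colorization), [style transfer](../demo/style_transfer), and other scenarios.
+
+<p align="center">
+ <img src="./imgs/paddlehub_finetune.gif" align="middle" />
+</p>
+
+<p align='center'>
+ Industrial-grade text classification in ten lines of code
+</p>
+
+* To try it online, see the [PaddleHub tutorial collection](https://aistudio.baidu.com/aistudio/projectdetail/231146), where you can experiment quickly using GPU compute provided by the AI Studio platform.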
+
+
+
+<a name="许可证书"></a>
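+## License
+This project is released under the <a href="../LICENSE">Apache 2.0 license</a>.
+
+<a name="致谢"></a>
+## Acknowledgements
+We warmly welcome code contributions to PaddleHub and greatly appreciate your feedback.
+
+* Many thanks to [Austendeng](https://github.com/Austendeng) for the PR fixing SequenceLabelReader
+* Many thanks to [cclauss](https://github.com/cclauss) for the PR improving the travis-ci checks
+* Many thanks to [奇想天外](http://www.cheerthink.com/) for the mask detection demo
+* Many thanks to [mhlwsk](https://github.com/mhlwsk) for the PR fixing the sequence labeling prediction demo
+* Many thanks to [zbp-xxxp](https://github.com/zbp-xxxp) for the image-to-poem module
+* Many thanks to [zbp-xxxp](https://github.com/zbp-xxxp) and [七年期限](https://github.com/1084667371) for jointly contributing the Mid-Autumn Festival special edition of the image-to-poem module
+* Many thanks to [livingbody](https://github.com/livingbody) for the WeChat mini programs for style transfer and Mid-Autumn image-to-poem built on PaddleHub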
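Section 2 of `docs/figures.md` shows that `hub run lac` returns one dict per input text, with parallel `word` and `tag` lists. Below is a small stdlib-only sketch that pairs them up and groups entity-like words by tag. Treating `PER`, `LOC`, `ORG`, and `TIME` as LAC's entity tags is an assumption about its tag set; only `ORG` and `TIME` appear in the sample output.

```python
from collections import defaultdict

# Entity-like tags assumed from LAC's tag set (person, location, organization, time).
ENTITY_TAGS = ("PER", "LOC", "ORG", "TIME")


def extract_entities(lac_results, entity_tags=ENTITY_TAGS):
    """Group words by entity tag for each result dict returned by LAC.

    lac_results: list of {"word": [...], "tag": [...]} dicts, one per input text.
    Returns a list of {tag: [words]} dicts in the same order as the input.
    """
    grouped = []
    for result in lac_results:
        words, tags = result["word"], result["tag"]
        if len(words) != len(tags):
            raise ValueError("word and tag lists must have the same length")
        entities = defaultdict(list)
        for word, tag in zip(words, tags):
            if tag in entity_tags:
                entities[tag].append(word)
        grouped.append(dict(entities))
    return grouped
```

Applied to the full sample output above, this would yield `慕尼黑再保险公司` under `ORG` and `现在` under `TIME`.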
--- a/modules/audio/README.md
+++ b/modules/audio/README.md
+## **For a better user experience, see the official web documentation -> [Speech Synthesis](https://www.paddlepaddle.org.cn/hubdetail)**
+
+### Speech Synthesis
+Speech synthesis (TTS) converts text into speech and is widely used in voice interaction devices.
+- Recommended models
+
+| Model                                                        | Description                                                  |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [Speech synthesis: transformer_tts_ljspeech](https://www.paddlepaddle.org.cn/hubdetail?name=transformer_tts_ljspeech&en_category=TextToSpeech) | TransformerTTS fuses Transformer and Tacotron2 and achieves satisfactory results. English TTS model; prediction only. |
+| [Speech synthesis: fastspeech_ljspeech](https://www.paddlepaddle.org.cn/hubdetail?name=fastspeech_ljspeech&en_category=TextToSpeech) | FastSpeech predicts phoneme durations using the attention diagonals extracted from an encoder-decoder teacher model. English TTS model; prediction only. |
+| [Speech synthesis: deepvoice3_ljspeech](https://www.paddlepaddle.org.cn/hubdetail?name=deepvoice3_ljspeech&en_category=TextToSpeech) | Deep Voice 3 is an end-to-end TTS model released by Baidu Research in 2017 (paper accepted at ICLR 2018). It is a seq2seq model based on convolutional neural networks and an attention mechanism. English TTS model; prediction only. |
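The FastSpeech row above summarizes the model's key idea: phoneme durations come from the attention of an encoder-decoder teacher. Below is a toy, stdlib-only sketch of that step and of the length regulator that consumes the durations. It assumes the common formulation in which each mel frame is assigned to the phoneme it attends to most, and a phoneme's duration is the number of frames assigned to it. The function names and the frames × phonemes matrix layout are illustrative and are not part of the `fastspeech_ljspeech` module's API.

```python
def durations_from_attention(attention):
    """Derive per-phoneme durations from a teacher attention matrix.

    attention: list of rows, one per mel frame; row[t] is the weight on phoneme t.
    Duration of phoneme t = number of frames whose highest weight is on t.
    """
    if not attention:
        return []
    num_phonemes = len(attention[0])
    durations = [0] * num_phonemes
    for row in attention:
        # Ties go to the earliest phoneme, since max() returns the first maximum.
        best = max(range(num_phonemes), key=lambda t: row[t])
        durations[best] += 1
    return durations


def length_regulate(phoneme_states, durations):
    """Repeat each phoneme's hidden state durations[i] times to match the mel length."""
    if len(phoneme_states) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    expanded = []
    for state, d in zip(phoneme_states, durations):
        expanded.extend([state] * d)
    return expanded
```

The durations always sum to the number of mel frames, so the regulated sequence has the same length as the target spectrogram.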
--- a/modules/image/Image_editing/README.md
+++ b/modules/image/Image_editing/README.md
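+## **For a better user experience, see the official web documentation -> [Image Editing](https://www.paddlepaddle.org.cn/hubdetail)**
+
+
+
+### Image Editing
+
+Image editing further edits and adjusts the pixels of an input image to produce a new target image. Typical applications include super-resolution, colorizing black-and-white photos, and restoring old photos.
+
+- Featured models
+
+| Model                                                        | Description                                                  |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [Super-resolution](https://www.paddlepaddle.org.cn/hubdetail?name=realsr&en_category=ImageEditing) | A super-resolution model for images and video that upscales input images and videos by 4x. |
+| [Black-and-white colorization](https://www.paddlepaddle.org.cn/hubdetail?name=deoldify&en_category=ImageEditing) | deoldify is a colorization model for images and video that restores color to black-and-white photos and videos. |
+| [Old photo restoration](https://www.paddlepaddle.org.cn/hubdetail?name=photo_restoration&en_category=ImageEditing) | A model for restoring old photos, consisting of two parts: colorization and super-resolution. |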
--- a/modules/image/colorization/user_guided_colorization/data_feed.py
+++ b/modules/image/colorization/user_guided_colorization/data_feed.py
@@ -130,4 +130,4 @@ class ColorizePreprocess:
        data['real_B_enc'] = paddle.to_tensor(data['real_B_enc'].astype(np.int64))
        data['hint_B'] = paddle.to_tensor(data['hint_B'].astype(np.float32))
        data['mask_B'] = paddle.to_tensor(data['mask_B'].astype(np.float32))
-        return data
\ No newline at end of file
+        return data
--- a/modules/image/colorization/user_guided_colorization/module.py
+++ b/modules/image/colorization/user_guided_colorization/module.py
@@ -40,7 +40,6 @@ class UserGuidedColorization(nn.Layer):
        load_checkpoint (str): Pretrained checkpoint path.

    """
-
    def __init__(self, use_tanh: bool = True, load_checkpoint: str = None):
        super(UserGuidedColorization, self).__init__()
        self.input_nc = 4
@@ -119,8 +118,8 @@ class UserGuidedColorization(nn.Layer):
        )

        # Conv8
-        model8up = (Conv2DTranspose(512, 256, kernel_size=4, stride=2, padding=1),)
-        model3short8 = (Conv2D(256, 256, 3, 1, 1),)
+        model8up = (Conv2DTranspose(512, 256, kernel_size=4, stride=2, padding=1), )
+        model3short8 = (Conv2D(256, 256, 3, 1, 1), )
        model8 = (
            nn.ReLU(),
            Conv2D(256, 256, 3, 1, 1),
@@ -131,20 +130,26 @@ class UserGuidedColorization(nn.Layer):
        )

        # Conv9
-        model9up = (Conv2DTranspose(256, 128, kernel_size=4, stride=2, padding=1),)
-        model2short9 = (Conv2D(128, 128, 3, 1, 1,),)
+        model9up = (Conv2DTranspose(256, 128, kernel_size=4, stride=2, padding=1), )
+        model2short9 = (Conv2D(
+            128,
+            128,
+            3,
+            1,
+            1,
+        ), )
        model9 = (nn.ReLU(), Conv2D(128, 128, 3, 1, 1), nn.ReLU(), nn.BatchNorm(128))

        # Conv10
-        model10up = (Conv2DTranspose(128, 128, kernel_size=4, stride=2, padding=1),)
-        model1short10 = (Conv2D(64, 128, 3, 1, 1),)
+        model10up = (Conv2DTranspose(128, 128, kernel_size=4, stride=2, padding=1), )
+        model1short10 = (Conv2D(64, 128, 3, 1, 1), )
        model10 = (nn.ReLU(), Conv2D(128, 128, 3, 1, 1), nn.LeakyReLU(negative_slope=0.2))
-        model_class = (Conv2D(256, 529, 1),)
+        model_class = (Conv2D(256, 529, 1), )

        if use_tanh:
            model_out = (Conv2D(128, 2, 1, 1, 0, 1), nn.Tanh())
        else:
-            model_out = (Conv2D(128, 2, 1, 1, 0, 1),)
+            model_out = (Conv2D(128, 2, 1, 1, 0, 1), )

        self.model1 = nn.Sequential(*model1)
        self.model2 = nn.Sequential(*model2)
@@ -178,10 +183,10 @@ class UserGuidedColorization(nn.Layer):
            print("load pretrained checkpoint success")

    def transforms(self, images: str) -> callable:
- 
+
        transform = T.Compose([T.Resize((256, 256), interpolation='NEAREST'), T.RGB2LAB()], to_rgb=True)
        return transform(images)
-    
+
    def set_config(self, classification: bool = True, prob: float = 1., num_point: int = None):
        self.classification = classification
        self.pre_func = ColorizePreprocess(ab_thresh=0., p=prob, points=num_point)
@@ -221,4 +226,4 @@ class UserGuidedColorization(nn.Layer):
            conv10_2 = self.model10(conv10_up)
            out_reg = self.model_out(conv10_2)

-        return out_class, out_reg
\ No newline at end of file
+        return out_class, out_reg
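The classification head above has 529 output channels (`Conv2D(256, 529, 1)`), and `ColorizePreprocess` emits an integer target, `real_B_enc`. Since 529 = 23 × 23, this is consistent with a uniform grid over the ab color plane. Below is a sketch of such an encoding. It assumes the settings of the original user-guided colorization code (ab values in [-110, 110], bin width 10, hence 23 bins per axis). The clamping step is a choice made here, and this module's actual preprocessing may differ.

```python
AB_MAX = 110   # assumed ab range: [-AB_MAX, AB_MAX]
AB_QUANT = 10  # assumed bin width
BINS_PER_AXIS = 2 * AB_MAX // AB_QUANT + 1  # 23
NUM_CLASSES = BINS_PER_AXIS ** 2            # 529, matching Conv2D(256, 529, 1)


def encode_ab(a, b):
    """Map an (a, b) chroma pair to a class index in [0, NUM_CLASSES)."""
    ia = round((a + AB_MAX) / AB_QUANT)
    ib = round((b + AB_MAX) / AB_QUANT)
    # Clamp to the grid so out-of-range values still get a valid class (sketch choice).
    ia = min(max(ia, 0), BINS_PER_AXIS - 1)
    ib = min(max(ib, 0), BINS_PER_AXIS - 1)
    return ia * BINS_PER_AXIS + ib


def decode_ab(index):
    """Return the (a, b) center of the bin for a class index."""
    ia, ib = divmod(index, BINS_PER_AXIS)
    return ia * AB_QUANT - AB_MAX, ib * AB_QUANT - AB_MAX
```

Neutral gray, (a, b) = (0, 0), lands in the middle bin, index 11 × 23 + 11 = 264.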
--- a/modules/image/Image_gan/README.md
+++ b/modules/image/Image_gan/README.md
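+## **For a better user experience, see the official web documentation -> [Image Generation](https://www.paddlepaddle.org.cn/hubdetail)**
+
+
+
+### Image Generation
+
+Image generation produces a target image from an input vector, which can be random noise or a user-specified condition vector. Typical applications include style transfer and converting images to anime style.
+
+- Featured models
+
+| Model                                                        | Description                                                  |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [Artistic style transfer](https://www.paddlepaddle.org.cn/hubdetail?name=stylepro_artistic&en_category=GANs) | Converts a given image into an arbitrary artistic style while faithfully preserving the semantic details of the content image and the style information of the style image. |
+| [Anime style: Makoto Shinkai](https://www.paddlepaddle.org.cn/hubdetail?name=animegan_v2_shinkai_53&en_category=GANs) | AnimeGAN V2 image style transfer model that converts the input image into Makoto Shinkai's anime style. |
+| [Anime style: Hayao Miyazaki](https://www.paddlepaddle.org.cn/hubdetail?name=animegan_v2_hayao_64&en_category=GANs) | AnimeGAN V2 image style transfer model that converts the input image into Hayao Miyazaki's anime style. |
+| [Anime style: Satoshi Kon's Paprika](https://www.paddlepaddle.org.cn/hubdetail?name=animegan_v2_paprika_97&en_category=GANs) | AnimeGAN V2 image style transfer model that converts the input image into the anime style of Satoshi Kon's Paprika. |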
--- a/modules/image/style_transfer/msgnet/module.py
+++ b/modules/image/style_transfer/msgnet/module.py
@@ -14,7 +14,6 @@ from paddlehub.module.cv_module import StyleTransferModule

 class GramMatrix(nn.Layer):
    """Calculate gram matrix"""
-
    def forward(self, y):
        (b, ch, h, w) = y.shape
        features = y.reshape((b, ch, w * h))
@@ -25,7 +24,6 @@ class GramMatrix(nn.Layer):

 class ConvLayer(nn.Layer):
    """Basic conv layer with reflection padding layer"""
-
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, stride: int):
        super(ConvLayer, self).__init__()
        pad = int(np.floor(kernel_size / 2))
@@ -53,7 +51,6 @@ class UpsampleConvLayer(nn.Layer):
    Return:
        img(paddle.Tensor): UpsampleConvLayer output.
    """
-
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, stride: int, upsample=None):
        super(UpsampleConvLayer, self).__init__()
        self.upsample = upsample
@@ -88,7 +85,6 @@ class Bottleneck(nn.Layer):
    Return:
        img(paddle.Tensor): Bottleneck output.
    """
-
    def __init__(self,
                 inplanes: int,
                 planes: int,
@@ -102,8 +98,8 @@ class Bottleneck(nn.Layer):
            self.residual_layer = nn.Conv2D(inplanes, planes * self.expansion, kernel_size=1, stride=stride)
        conv_block = (norm_layer(inplanes), nn.ReLU(), nn.Conv2D(inplanes, planes, kernel_size=1, stride=1),
                      norm_layer(planes), nn.ReLU(), ConvLayer(planes, planes, kernel_size=3, stride=stride),
-                      norm_layer(planes), nn.ReLU(), nn.Conv2D(
-                          planes, planes * self.expansion, kernel_size=1, stride=1))
+                      norm_layer(planes), nn.ReLU(), nn.Conv2D(planes, planes * self.expansion, kernel_size=1,
+                                                               stride=1))
        self.conv_block = nn.Sequential(*conv_block)

    def forward(self, x: paddle.Tensor):
@@ -129,12 +125,14 @@ class UpBottleneck(nn.Layer):
    Return:
        img(paddle.Tensor): UpBottleneck output.
    """
-
    def __init__(self, inplanes: int, planes: int, stride: int = 2, norm_layer: nn.Layer = nn.BatchNorm2D):
        super(UpBottleneck, self).__init__()
        self.expansion = 4
-        self.residual_layer = UpsampleConvLayer(
-            inplanes, planes * self.expansion, kernel_size=1, stride=1, upsample=stride)
+        self.residual_layer = UpsampleConvLayer(inplanes,
+                                                planes * self.expansion,
+                                                kernel_size=1,
+                                                stride=1,
+                                                upsample=stride)
        conv_block = []
        conv_block += [norm_layer(inplanes), nn.ReLU(), nn.Conv2D(inplanes, planes, kernel_size=1, stride=1)]
        conv_block += [
@@ -165,7 +163,6 @@ class Inspiration(nn.Layer):
    Return:
        img(paddle.Tensor): UpBottleneck output.
    """
-
    def __init__(self, C: int, B: int = 1):
        super(Inspiration, self).__init__()

@@ -182,8 +179,8 @@ class Inspiration(nn.Layer):
        self.P = paddle.bmm(self.weight.expand_as(self.G), self.G)

        x = paddle.bmm(
-            self.P.transpose((0, 2, 1)).expand((X.shape[0], self.C, self.C)), X.reshape((X.shape[0], X.shape[1],
-                                                                                         -1))).reshape(X.shape)
+            self.P.transpose((0, 2, 1)).expand((X.shape[0], self.C, self.C)), X.reshape(
+                (X.shape[0], X.shape[1], -1))).reshape(X.shape)
        return x

    def __repr__(self):
@@ -193,7 +190,6 @@ class Inspiration(nn.Layer):

 class Vgg16(nn.Layer):
    """ First four layers from Vgg16."""
-
    def __init__(self):
        super(Vgg16, self).__init__()
        self.conv1_1 = nn.Conv2D(3, 64, kernel_size=3, stride=1, padding=1)
@@ -268,8 +264,12 @@ class MSGNet(nn.Layer):
    Return:
        img(paddle.Tensor): MSGNet output.
    """
-
-    def __init__(self, input_nc=3, output_nc=3, ngf=128, n_blocks=6, norm_layer=nn.InstanceNorm2D,
+    def __init__(self,
+                 input_nc=3,
+                 output_nc=3,
+                 ngf=128,
+                 n_blocks=6,
+                 norm_layer=nn.InstanceNorm2D,
                 load_checkpoint=None):
        super(MSGNet, self).__init__()
        self.gram = GramMatrix()
@@ -341,4 +341,4 @@ class MSGNet(nn.Layer):
        return self._vgg(input)

    def forward(self, input: paddle.Tensor):
-        return self.model(input)
\ No newline at end of file
+        return self.model(input)
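`GramMatrix` flattens each channel to length h·w and multiplies the result by its transpose. `Inspiration` left-multiplies the flattened input by the transpose of `P = weight @ G`, where `G` is the target Gram matrix. Below is a stdlib-only sketch of both for a single image (batch size 1), useful for checking values by hand. Dividing the Gram matrix by ch·h·w follows the MSG-Net reference implementation and is an assumption here, because that line falls outside the hunk shown.

```python
def matmul(a, b):
    """Plain list-of-lists matrix product: (m x k) @ (k x n) -> (m x n)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def flatten_channels(feature_map):
    """ch x h x w -> ch x (h*w), like reshape((b, ch, w * h)) with b = 1."""
    return [[v for row in channel for v in row] for channel in feature_map]


def gram_matrix(feature_map):
    """ch x h x w feature map -> ch x ch Gram matrix divided by ch*h*w (assumed)."""
    ch, h, w = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    features = flatten_channels(feature_map)
    gram = matmul(features, transpose(features))
    return [[g / (ch * h * w) for g in row] for row in gram]


def inspiration(feature_map, weight, target_gram):
    """out = (weight @ G)^T @ X, with X flattened to C x (h*w), reshaped back to C x h x w."""
    h, w = len(feature_map[0]), len(feature_map[0][0])
    x = flatten_channels(feature_map)
    p = matmul(weight, target_gram)
    out = matmul(transpose(p), x)
    return [[channel[i * w:(i + 1) * w] for i in range(h)] for channel in out]
```

With an identity `weight` and identity `G`, `inspiration` returns its input unchanged, which makes a convenient sanity check.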
--- a/modules/image/style_transfer/stylepro_artistic/README.md
+++ b/modules/image/style_transfer/stylepro_artistic/README.md
--- a/modules/image/style_transfer/stylepro_artistic/__init__.py
+++ b/modules/image/style_transfer/stylepro_artistic/__init__.py
--- a/modules/image/style_transfer/stylepro_artistic/data_feed.py
+++ b/modules/image/style_transfer/stylepro_artistic/data_feed.py
--- a/modules/image/Image_gan/style_transfer/stylepro_artistic/decoder_network.py
+++ b/modules/image/Image_gan/style_transfer/stylepro_artistic/decoder_network.py
+# coding=utf-8
+from paddle.fluid.initializer import Constant
+from paddle.fluid.param_attr import ParamAttr
+import paddle.fluid as fluid
+
+
+def decoder_net():
+    x2paddle_22 = fluid.layers.create_parameter(dtype='float32',
+                                                shape=[4],
+                                                name='x2paddle_22',
+                                                attr='x2paddle_22',
+                                                default_initializer=Constant(0.0))
+    x2paddle_36 = fluid.layers.create_parameter(dtype='float32',
+                                                shape=[4],
+                                                name='x2paddle_36',
+                                                attr='x2paddle_36',
+                                                default_initializer=Constant(0.0))
+    x2paddle_44 = fluid.layers.create_parameter(dtype='float32',
+                                                shape=[4],
+                                                name='x2paddle_44',
+                                                attr='x2paddle_44',
+                                                default_initializer=Constant(0.0))
+    x2paddle_input_1 = fluid.layers.data(dtype='float32',
+                                         shape=[1, 512, 64, 64],
+                                         name='x2paddle_input_1',
+                                         append_batch_size=False)
+    x2paddle_19 = fluid.layers.pad2d(x2paddle_input_1,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_19')
+    x2paddle_20 = fluid.layers.conv2d(x2paddle_19,
+                                      num_filters=256,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_1',
+                                      name='x2paddle_20',
+                                      bias_attr='x2paddle_2')
+    x2paddle_21 = fluid.layers.relu(x2paddle_20, name='x2paddle_21')
+    x2paddle_23 = fluid.layers.resize_nearest(x2paddle_21, name='x2paddle_23', out_shape=[128, 128])
+    x2paddle_24 = fluid.layers.pad2d(x2paddle_23,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_24')
+    x2paddle_25 = fluid.layers.conv2d(x2paddle_24,
+                                      num_filters=256,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_3',
+                                      name='x2paddle_25',
+                                      bias_attr='x2paddle_4')
+    x2paddle_26 = fluid.layers.relu(x2paddle_25, name='x2paddle_26')
+    x2paddle_27 = fluid.layers.pad2d(x2paddle_26,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_27')
+    x2paddle_28 = fluid.layers.conv2d(x2paddle_27,
+                                      num_filters=256,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_5',
+                                      name='x2paddle_28',
+                                      bias_attr='x2paddle_6')
+    x2paddle_29 = fluid.layers.relu(x2paddle_28, name='x2paddle_29')
+    x2paddle_30 = fluid.layers.pad2d(x2paddle_29,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_30')
+    x2paddle_31 = fluid.layers.conv2d(x2paddle_30,
+                                      num_filters=256,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_7',
+                                      name='x2paddle_31',
+                                      bias_attr='x2paddle_8')
+    x2paddle_32 = fluid.layers.relu(x2paddle_31, name='x2paddle_32')
+    x2paddle_33 = fluid.layers.pad2d(x2paddle_32,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_33')
+    x2paddle_34 = fluid.layers.conv2d(x2paddle_33,
+                                      num_filters=128,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_9',
+                                      name='x2paddle_34',
+                                      bias_attr='x2paddle_10')
+    x2paddle_35 = fluid.layers.relu(x2paddle_34, name='x2paddle_35')
+    x2paddle_37 = fluid.layers.resize_nearest(x2paddle_35, name='x2paddle_37', out_shape=[256, 256])
+    x2paddle_38 = fluid.layers.pad2d(x2paddle_37,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_38')
+    x2paddle_39 = fluid.layers.conv2d(x2paddle_38,
+                                      num_filters=128,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_11',
+                                      name='x2paddle_39',
+                                      bias_attr='x2paddle_12')
+    x2paddle_40 = fluid.layers.relu(x2paddle_39, name='x2paddle_40')
+    x2paddle_41 = fluid.layers.pad2d(x2paddle_40,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_41')
+    x2paddle_42 = fluid.layers.conv2d(x2paddle_41,
+                                      num_filters=64,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_13',
+                                      name='x2paddle_42',
+                                      bias_attr='x2paddle_14')
+    x2paddle_43 = fluid.layers.relu(x2paddle_42, name='x2paddle_43')
+    x2paddle_45 = fluid.layers.resize_nearest(x2paddle_43, name='x2paddle_45', out_shape=[512, 512])
+    x2paddle_46 = fluid.layers.pad2d(x2paddle_45,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_46')
+    x2paddle_47 = fluid.layers.conv2d(x2paddle_46,
+                                      num_filters=64,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_15',
+                                      name='x2paddle_47',
+                                      bias_attr='x2paddle_16')
+    x2paddle_48 = fluid.layers.relu(x2paddle_47, name='x2paddle_48')
+    x2paddle_49 = fluid.layers.pad2d(x2paddle_48,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_49')
+    x2paddle_50 = fluid.layers.conv2d(x2paddle_49,
+                                      num_filters=3,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_17',
+                                      name='x2paddle_50',
+                                      bias_attr='x2paddle_18')
+    return x2paddle_input_1, x2paddle_50
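Each block in `decoder_net` is a 1-pixel reflect pad followed by a 3×3, stride-1 convolution with `padding=[0, 0]`. The convolution therefore keeps the spatial size (H + 2 − 3 + 1 = H), and only the three `resize_nearest` calls change the resolution. The stdlib-only sketch below traces the tensor shape through the layer sequence read off the code above, from `[1, 512, 64, 64]` to `[1, 3, 512, 512]`. The op list is transcribed by hand, not generated from the program.

```python
def conv_out(size, kernel=3, stride=1, padding=0, dilation=1):
    """Standard convolution output-size formula."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# ("conv", out_channels): reflect pad 1 + 3x3 conv, stride 1, padding 0.
# ("resize", size): resize_nearest to size x size.
DECODER_OPS = [
    ("conv", 256), ("resize", 128),
    ("conv", 256), ("conv", 256), ("conv", 256), ("conv", 128), ("resize", 256),
    ("conv", 128), ("conv", 64), ("resize", 512),
    ("conv", 64), ("conv", 3),
]


def trace_shapes(input_shape, ops):
    """Return the (N, C, H, W) shape before the first op and after each op."""
    n, c, h, w = input_shape
    shapes = [(n, c, h, w)]
    for op, arg in ops:
        if op == "conv":
            # The reflect pad adds one pixel per side before the unpadded conv.
            h, w = conv_out(h + 2), conv_out(w + 2)
            c = arg
        elif op == "resize":
            h = w = arg
        else:
            raise ValueError("unknown op: " + op)
        shapes.append((n, c, h, w))
    return shapes
```

`trace_shapes((1, 512, 64, 64), DECODER_OPS)[-1]` gives `(1, 3, 512, 512)`, which matches the 512×512 input that `encoder_net` declares.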
--- a/modules/image/Image_gan/style_transfer/stylepro_artistic/encoder_network.py
+++ b/modules/image/Image_gan/style_transfer/stylepro_artistic/encoder_network.py
+# coding=utf-8
+from paddle.fluid.initializer import Constant
+from paddle.fluid.param_attr import ParamAttr
+import paddle.fluid as fluid
+
+
+def encoder_net():
+    x2paddle_0 = fluid.layers.data(dtype='float32', shape=[1, 3, 512, 512], name='x2paddle_0', append_batch_size=False)
+    x2paddle_21 = fluid.layers.conv2d(x2paddle_0,
+                                      num_filters=3,
+                                      filter_size=[1, 1],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_1',
+                                      name='x2paddle_21',
+                                      bias_attr='x2paddle_2')
+    x2paddle_22 = fluid.layers.pad2d(x2paddle_21,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_22')
+    x2paddle_23 = fluid.layers.conv2d(x2paddle_22,
+                                      num_filters=64,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_3',
+                                      name='x2paddle_23',
+                                      bias_attr='x2paddle_4')
+    x2paddle_24 = fluid.layers.relu(x2paddle_23, name='x2paddle_24')
+    x2paddle_25 = fluid.layers.pad2d(x2paddle_24,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_25')
+    x2paddle_26 = fluid.layers.conv2d(x2paddle_25,
+                                      num_filters=64,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_5',
+                                      name='x2paddle_26',
+                                      bias_attr='x2paddle_6')
+    x2paddle_27 = fluid.layers.relu(x2paddle_26, name='x2paddle_27')
+    x2paddle_28 = fluid.layers.pool2d(x2paddle_27,
+                                      pool_size=[2, 2],
+                                      pool_type='max',
+                                      pool_stride=[2, 2],
+                                      pool_padding=[0, 0],
+                                      ceil_mode=False,
+                                      name='x2paddle_28',
+                                      exclusive=False)
+    x2paddle_29 = fluid.layers.pad2d(x2paddle_28,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_29')
+    x2paddle_30 = fluid.layers.conv2d(x2paddle_29,
+                                      num_filters=128,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_7',
+                                      name='x2paddle_30',
+                                      bias_attr='x2paddle_8')
+    x2paddle_31 = fluid.layers.relu(x2paddle_30, name='x2paddle_31')
+    x2paddle_32 = fluid.layers.pad2d(x2paddle_31,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_32')
+    x2paddle_33 = fluid.layers.conv2d(x2paddle_32,
+                                      num_filters=128,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_9',
+                                      name='x2paddle_33',
+                                      bias_attr='x2paddle_10')
+    x2paddle_34 = fluid.layers.relu(x2paddle_33, name='x2paddle_34')
+    x2paddle_35 = fluid.layers.pool2d(x2paddle_34,
+                                      pool_size=[2, 2],
+                                      pool_type='max',
+                                      pool_stride=[2, 2],
+                                      pool_padding=[0, 0],
+                                      ceil_mode=False,
+                                      name='x2paddle_35',
+                                      exclusive=False)
+    x2paddle_36 = fluid.layers.pad2d(x2paddle_35,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_36')
+    x2paddle_37 = fluid.layers.conv2d(x2paddle_36,
+                                      num_filters=256,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_11',
+                                      name='x2paddle_37',
+                                      bias_attr='x2paddle_12')
+    x2paddle_38 = fluid.layers.relu(x2paddle_37, name='x2paddle_38')
+    x2paddle_39 = fluid.layers.pad2d(x2paddle_38,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_39')
+    x2paddle_40 = fluid.layers.conv2d(x2paddle_39,
+                                      num_filters=256,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_13',
+                                      name='x2paddle_40',
+                                      bias_attr='x2paddle_14')
+    x2paddle_41 = fluid.layers.relu(x2paddle_40, name='x2paddle_41')
+    x2paddle_42 = fluid.layers.pad2d(x2paddle_41,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_42')
+    x2paddle_43 = fluid.layers.conv2d(x2paddle_42,
+                                      num_filters=256,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_15',
+                                      name='x2paddle_43',
+                                      bias_attr='x2paddle_16')
+    x2paddle_44 = fluid.layers.relu(x2paddle_43, name='x2paddle_44')
+    x2paddle_45 = fluid.layers.pad2d(x2paddle_44,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_45')
+    x2paddle_46 = fluid.layers.conv2d(x2paddle_45,
+                                      num_filters=256,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_17',
+                                      name='x2paddle_46',
+                                      bias_attr='x2paddle_18')
+    x2paddle_47 = fluid.layers.relu(x2paddle_46, name='x2paddle_47')
+    x2paddle_48 = fluid.layers.pool2d(x2paddle_47,
+                                      pool_size=[2, 2],
+                                      pool_type='max',
+                                      pool_stride=[2, 2],
+                                      pool_padding=[0, 0],
+                                      ceil_mode=False,
+                                      name='x2paddle_48',
+                                      exclusive=False)
+    x2paddle_49 = fluid.layers.pad2d(x2paddle_48,
+                                     pad_value=0.0,
+                                     mode='reflect',
+                                     paddings=[1, 1, 1, 1],
+                                     name='x2paddle_49')
+    x2paddle_50 = fluid.layers.conv2d(x2paddle_49,
+                                      num_filters=512,
+                                      filter_size=[3, 3],
+                                      stride=[1, 1],
+                                      padding=[0, 0],
+                                      dilation=[1, 1],
+                                      groups=1,
+                                      param_attr='x2paddle_19',
+                                      name='x2paddle_50',
+                                      bias_attr='x2paddle_20')
+    x2paddle_51 = fluid.layers.relu(x2paddle_50, name='x2paddle_51')
+    return x2paddle_0, x2paddle_51
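`encoder_net()` starts with a 1x1 input conv. After that come two 64-channel, two 128-channel, and four 256-channel reflect-padded 3x3 convs, with a 2x2 stride-2 max pool after each stage, and then one 512-channel conv. This pattern resembles VGG-19 cut off after its first conv4 layer. The sketch below uses the same pure shape arithmetic as the decoder trace (helper names are hypothetical; no Paddle calls). It shows that a `[1, 3, 512, 512]` input comes out as `[1, 512, 64, 64]`, which is the shape declared for the decoder input `x2paddle_input_1`.

```python
# Hypothetical shape tracer for encoder_net(); pure shape arithmetic, no Paddle calls.

def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Convolution/pooling output size for one spatial dimension (floor mode)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def trace_encoder(shape=(1, 3, 512, 512)):
    n, c, h, w = shape
    # Layer order copied from encoder_net(); relu layers are omitted (shape-preserving).
    layers = [
        ("conv1x1", 3),
        ("conv3x3", 64), ("conv3x3", 64), ("pool", None),
        ("conv3x3", 128), ("conv3x3", 128), ("pool", None),
        ("conv3x3", 256), ("conv3x3", 256), ("conv3x3", 256), ("conv3x3", 256), ("pool", None),
        ("conv3x3", 512),
    ]
    for kind, channels in layers:
        if kind == "conv1x1":
            h, w, c = conv_out(h, 1), conv_out(w, 1), channels
        elif kind == "conv3x3":
            # 1-pixel reflect pad2d, then a 3x3 conv with stride 1 and padding 0.
            h, w, c = conv_out(h + 2, 3), conv_out(w + 2, 3), channels
        else:
            # pool2d: 2x2 max pooling, stride 2, no padding, ceil_mode=False.
            h, w = conv_out(h, 2, stride=2), conv_out(w, 2, stride=2)
    return (n, c, h, w)


print(trace_encoder())  # (1, 512, 64, 64)
```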
--- a/modules/image/style_transfer/stylepro_artistic/module.py
+++ b/modules/image/style_transfer/stylepro_artistic/module.py
@@ -140,14 +140,13 @@ class StyleProjection(hub.Module):
        encode_program, encode_feeded_var_names, encode_target_vars = fluid.io.load_inference_model(
            dirname=self.pretrained_encoder_net, executor=exe)

-        fluid.io.save_inference_model(
-            dirname=dirname,
-            main_program=encode_program,
-            executor=exe,
-            feeded_var_names=encode_feeded_var_names,
-            target_vars=encode_target_vars,
-            model_filename=model_filename,
-            params_filename=params_filename)
+        fluid.io.save_inference_model(dirname=dirname,
+                                      main_program=encode_program,
+                                      executor=exe,
+                                      feeded_var_names=encode_feeded_var_names,
+                                      target_vars=encode_target_vars,
+                                      model_filename=model_filename,
+                                      params_filename=params_filename)

    def _save_decode_model(self, dirname, model_filename=None, params_filename=None, combined=True):
        if combined:
@@ -159,14 +158,13 @@ class StyleProjection(hub.Module):
        decode_program, decode_feeded_var_names, decode_target_vars = fluid.io.load_inference_model(
            dirname=self.pretrained_decoder_net, executor=exe)

-        fluid.io.save_inference_model(
-            dirname=dirname,
-            main_program=decode_program,
-            executor=exe,
-            feeded_var_names=decode_feeded_var_names,
-            target_vars=decode_target_vars,
-            model_filename=model_filename,
-            params_filename=params_filename)
+        fluid.io.save_inference_model(dirname=dirname,
+                                      main_program=decode_program,
+                                      executor=exe,
+                                      feeded_var_names=decode_feeded_var_names,
+                                      target_vars=decode_target_vars,
+                                      model_filename=model_filename,
+                                      params_filename=params_filename)

    @serving
    def serving_method(self, images, **kwargs):
@@ -186,11 +184,10 @@ class StyleProjection(hub.Module):
        """
        Run as a command.
        """
-        self.parser = argparse.ArgumentParser(
-            description="Run the {} module.".format(self.name),
-            prog='hub run {}'.format(self.name),
-            usage='%(prog)s',
-            add_help=True)
+        self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+                                              prog='hub run {}'.format(self.name),
+                                              usage='%(prog)s',
+                                              add_help=True)

        self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
        self.arg_config_group = self.parser.add_argument_group(
@@ -202,20 +199,29 @@ class StyleProjection(hub.Module):
            paths = [{'content': args.content, 'styles': args.styles.split(',')}]
        else:
            paths = [{'content': args.content, 'styles': args.styles.split(','), 'weights': list(args.weights)}]
-        results = self.style_transfer(
-            paths=paths, alpha=args.alpha, use_gpu=args.use_gpu, output_dir=args.output_dir, visualization=True)
+        results = self.style_transfer(paths=paths,
+                                      alpha=args.alpha,
+                                      use_gpu=args.use_gpu,
+                                      output_dir=args.output_dir,
+                                      visualization=True)
        return results

    def add_module_config_arg(self):
        """
        Add the command config options.
        """
-        self.arg_config_group.add_argument(
-            '--use_gpu', type=ast.literal_eval, default=False, help="whether use GPU or not")
-        self.arg_config_group.add_argument(
-            '--output_dir', type=str, default='transfer_result', help="The directory to save output images.")
-        self.arg_config_group.add_argument(
-            '--visualization', type=ast.literal_eval, default=True, help="whether to save output as images.")
+        self.arg_config_group.add_argument('--use_gpu',
+                                           type=ast.literal_eval,
+                                           default=False,
+                                           help="whether to use GPU or not")
+        self.arg_config_group.add_argument('--output_dir',
+                                           type=str,
+                                           default='transfer_result',
+                                           help="The directory to save output images.")
+        self.arg_config_group.add_argument('--visualization',
+                                           type=ast.literal_eval,
+                                           default=True,
+                                           help="whether to save output as images.")

    def add_module_input_arg(self):
        """
@@ -223,7 +229,11 @@ class StyleProjection(hub.Module):
        """
        self.arg_input_group.add_argument('--content', type=str, help="path to content.")
        self.arg_input_group.add_argument('--styles', type=str, help="path to styles.")
-        self.arg_input_group.add_argument(
-            '--weights', type=ast.literal_eval, default=None, help="interpolation weights of styles.")
-        self.arg_config_group.add_argument(
-            '--alpha', type=ast.literal_eval, default=1, help="The parameter to control the tranform degree.")
+        self.arg_input_group.add_argument('--weights',
+                                          type=ast.literal_eval,
+                                          default=None,
+                                          help="interpolation weights of styles.")
+        self.arg_config_group.add_argument('--alpha',
+                                           type=ast.literal_eval,
+                                           default=1,
+                                           help="The parameter to control the transform degree.")
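The command-line entry point uses the options registered above to build the `paths` list that it passes to `style_transfer`. The standalone sketch below repeats that parsing with plain `argparse`, without importing PaddleHub, to show two details. First, `--styles` is split on commas. Second, `ast.literal_eval` is the `type`, so strings such as `False` or `[0.3, 0.7]` become real Python values; plain `type=bool` would turn the string `"False"` into `True`. The `build_paths` helper, the `prog` string, and the sample file names are hypothetical. The condition `args.weights is None` is an assumption, because the start of that branch falls outside this hunk.

```python
import argparse
import ast


def build_parser():
    # Mirrors the options registered by add_module_config_arg / add_module_input_arg.
    parser = argparse.ArgumentParser(prog="hub run stylepro_artistic")
    parser.add_argument('--content', type=str, help="path to content.")
    parser.add_argument('--styles', type=str, help="path to styles.")
    parser.add_argument('--weights', type=ast.literal_eval, default=None, help="interpolation weights of styles.")
    parser.add_argument('--alpha', type=ast.literal_eval, default=1, help="The parameter to control the transform degree.")
    parser.add_argument('--use_gpu', type=ast.literal_eval, default=False, help="whether to use GPU or not")
    parser.add_argument('--output_dir', type=str, default='transfer_result', help="The directory to save output images.")
    return parser


def build_paths(args):
    # Hypothetical helper; assumes the branch condition is "no weights given".
    styles = args.styles.split(',')
    if args.weights is None:
        return [{'content': args.content, 'styles': styles}]
    # list() also accepts a tuple such as "(0.3, 0.7)" from the command line.
    return [{'content': args.content, 'styles': styles, 'weights': list(args.weights)}]


args = build_parser().parse_args(
    ['--content', 'content.jpg', '--styles', 'a.jpg,b.jpg', '--weights', '[0.3, 0.7]', '--use_gpu', 'False'])
print(build_paths(args))
# [{'content': 'content.jpg', 'styles': ['a.jpg', 'b.jpg'], 'weights': [0.3, 0.7]}]
print(args.use_gpu)  # False
```

argparse only applies `type` to defaults that are strings, so the non-string defaults (`None`, `1`, `False`) pass through unchanged.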
--- a/modules/image/style_transfer/stylepro_artistic/processor.py
+++ b/modules/image/style_transfer/stylepro_artistic/processor.py
--- a/modules/image/classification/README.md
+++ b/modules/image/classification/README.md
+
+## **For a better experience, see the official web documentation -> [Image Classification](https://www.paddlepaddle.org.cn/hubdetail)**
+
+
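+### Image Classification
+Image classification assigns images to categories based on their semantic content. It is a fundamental problem in computer vision and the basis for higher-level vision tasks such as object detection, image segmentation, object tracking, behavior analysis, and face recognition. It is widely applied: face recognition and intelligent video analysis in security, traffic scene recognition in transportation, content-based image retrieval and automatic photo-album categorization on the internet, and image recognition in medicine.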
+
+**Note:** **Experienced developers can choose whichever model fits their needs**; **if you are new, start with ResNet50 on the server side and MobileNetV3 on mobile.**
+
+- Recommended models
+
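+| **Task** | **Model** | **Highlights** |
+| ---------- | :----------------------------------------------------------- | ---------------------------------------------------------- |
+| Image classification | [Dish recognition](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_dishes&en_category=ImageClassification) | Trained on a private dataset; classifies 8,416 kinds of dishes and is a good starting point for further fine-tuning on dishes |
+| Image classification | [Animal recognition](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_animals&en_category=ImageClassification) | Trained on a private dataset; classifies 7,978 kinds of animals and is a good starting point for further fine-tuning on animals |
+| Image classification | [Wildlife product recognition](https://www.paddlepaddle.org.cn/hubdetail?name=resnet50_vd_wildanimals&en_category=ImageClassification) | Recognizes ten labels: '象牙制品' (ivory products), '象牙' (ivory), '大象' (elephant), '虎皮' (tiger skin), '老虎' (tiger), '虎牙/虎爪/虎骨' (tiger tooth/claw/bone), '穿山甲甲片' (pangolin scales), '穿山甲' (pangolin), '穿山甲爪子' (pangolin claws), and '其他' (other). |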
+
+
+- More models
+
+| **Model** | **Description** |
+| - | - |
+| [AlexNet](https://www.paddlepaddle.org.cn/hubdetail?name=alexnet_imagenet&en_category=ImageClassification) | The first CNN to successfully apply ReLU, Dropout, and LRN, using GPUs to accelerate computation |
+| [VGG19](https://www.paddlepaddle.org.cn/hubdetail?name=vgg19_imagenet&en_category=ImageClassification) | Builds on AlexNet with small 3×3 convolution kernels and greater depth, giving good generalization |
+| [GoogLeNet](https://github.com/PaddlePaddle/models/tree/release/1.7/PaddleCV/image_classification) | Increases network depth and width without increasing the computational load, improving performance |
+| [ResNet50](https://www.paddlepaddle.org.cn/hubdetail?name=resnet_v2_50_imagenet&en_category=ImageClassification) | Residual Network; introduces residual connections, which solve the drop in accuracy that occurs as networks get deeper |
+| [Inceptionv4](https://www.paddlepaddle.org.cn/hubdetail?name=inception_v4_imagenet&en_category=ImageClassification) | Combines Inception modules with residual connections; the ResNet-style structure greatly speeds up training and improves performance |
+| [MobileNetV2](https://www.paddlepaddle.org.cn/hubdetail?name=mobilenet_v2_imagenet&en_category=ImageClassification) | A refinement of MobileNet that places skip connections directly between the thin bottleneck layers and drops the ReLU nonlinearity in the bottleneck layers, which gives better results |
+| [se_resnext50](https://www.paddlepaddle.org.cn/hubdetail?name=se_resnext50_32x4d_imagenet&en_category=ImageClassification) | Adds SE (Squeeze-and-Excitation) modules to ResNeXt to improve accuracy; took first place in the ILSVRC 2017 classification task |
+| [ShuffleNetV2](https://www.paddlepaddle.org.cn/hubdetail?name=shufflenet_v2_imagenet&en_category=ImageClassification) | ECCV 2018. A lightweight CNN with a good balance between speed and accuracy. At the same complexity it is more accurate than ShuffleNet and MobileNetV2, making it well suited to mobile devices and autonomous vehicles |
+| [efficientNetb7](https://www.paddlepaddle.org.cn/hubdetail?name=efficientnetb7_imagenet&en_category=ImageClassification) | Scales resolution, channel count, and depth together, reaching SOTA accuracy with comparatively few parameters. |
+| [xception71](https://www.paddlepaddle.org.cn/hubdetail?name=xception71_imagenet&en_category=ImageClassification) | An improvement on Inception-v3 that replaces standard convolutions with depthwise separable convolutions, reducing the parameter count while improving accuracy. |
+| [dpn107](https://www.paddlepaddle.org.cn/hubdetail?name=dpn107_imagenet&en_category=ImageClassification) | Combines the strengths of DenseNet and ResNeXt. |
+| [DarkNet53](https://www.paddlepaddle.org.cn/hubdetail?name=darknet53_imagenet&en_category=ImageClassification) | The backbone of the YOLOv3 detection framework; performs well on both classification and detection. |
+| [DenseNet161](https://www.paddlepaddle.org.cn/hubdetail?name=densenet161_imagenet&en_category=ImageClassification) | Introduces a densely connected architecture that improves information flow through the network. |
+| [ResNeXt152_vd](https://www.paddlepaddle.org.cn/hubdetail?name=resnext152_64x4d_imagenet&en_category=ImageClassification) | Introduces cardinality as an additional measure of model complexity, effectively improving accuracy. |
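Here is a rough illustration of how the output of a classifier like those in the tables above is usually read. The model produces one score (logit) per class, a softmax turns the scores into probabilities, and the top-k classes are reported. The sketch below is generic Python and does not use the PaddleHub module API. The label set and logits are made up for illustration.

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=2):
    """Return the k most likely (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["cat", "dog", "tiger"]  # hypothetical label set
logits = [2.0, 1.0, 0.1]          # made-up network outputs
for label, p in top_k(logits, labels):
    print(f"{label}: {p:.3f}")
```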
--- a/modules/image/face_detection/README.md
+++ b/modules/image/face_detection/README.md
+## **For a better experience, see the official web documentation -> [Face Detection](https://www.paddlepaddle.org.cn/hubdetail)**
+
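+### Face Detection
+Face detection is an important branch of object detection. Driven by demand from the security market, face recognition, and face-security applications, it has become one of the most important object detection tasks in recent years.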
+
+- Recommended models
+
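+| Model | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [Face detection](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server&en_category=FaceDetection) | Developed in-house by Baidu; the **champion model** on the WIDER Face dataset in March 2018 |
+| [Ultra-lightweight face detection](https://www.paddlepaddle.org.cn/hubdetail?name=ultra_light_fast_generic_face_detector_1mb_640&en_category=FaceDetection) | A real-time, ultra-lightweight, general-purpose face detection model designed for edge computing devices and low-compute hardware (e.g. ARM-based inference), enabling real-time face detection in general scenes on such devices. |
+| [Masked face detection and recognition](https://www.paddlepaddle.org.cn/hubdetail?name=pyramidbox_lite_server_mask&en_category=FaceDetection) | The industry's **first open-source model for detecting and recognizing masked faces**, which has attracted wide attention. |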
--- a/modules/image/gan/README.md
+++ b/modules/image/gan/README.md
--- a/modules/image/keypoint_detection/README.md
+++ b/modules/image/keypoint_detection/README.md
+## **For a better experience, see the official web documentation -> [Keypoint Detection](https://www.paddlepaddle.org.cn/hubdetail)**
+
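+### Keypoint Detection
+
+Human pose estimation detects key points of the human body, such as joints and facial features, and uses them to describe the body's skeleton. It is essential for describing body posture and predicting human behavior, and it underpins many computer vision tasks such as action classification, abnormal behavior detection, and autonomous driving.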
+
+- Featured models
+
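+| Model | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [Single-person pose estimation](https://www.paddlepaddle.org.cn/hubdetail?name=human_pose_estimation_resnet50_mpii&en_category=KeyPointDetection) | Applicable to action recognition, person tracking, gait recognition, and related fields. Typical applications include intelligent video surveillance, patient monitoring, human-computer interaction, virtual reality, human animation, smart homes, smart security, and athlete training assistance. |
+| [Multi-person pose estimation](https://www.paddlepaddle.org.cn/hubdetail?name=openpose_body_estimation&en_category=KeyPointDetection) | Applicable to action recognition, person tracking, gait recognition, and related fields. Typical applications include intelligent video surveillance, patient monitoring, human-computer interaction, virtual reality, human animation, smart homes, smart security, and athlete training assistance. |
+| [Facial landmark detection](https://www.paddlepaddle.org.cn/hubdetail?name=face_landmark_localization&en_category=KeyPointDetection) | Applicable to face recognition, expression analysis, 3D face reconstruction, 3D animation, and other face-related problems; supports detecting multiple faces in the same image |
+| [Hand keypoint detection](https://www.paddlepaddle.org.cn/hubdetail?name=hand_pose_localization&en_category=KeyPointDetection) | Applicable to gesture recognition; combined with body keypoints, it supports scenarios such as abnormal behavior detection |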
--- a/modules/image/object_detection/README.md
+++ b/modules/image/object_detection/README.md
+## **For a better experience, see the official web documentation -> [Object Detection](https://www.paddlepaddle.org.cn/hubdetail)**
+
+
+
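+### Object Detection
+
+Given an image or a video frame, object detection asks the computer to find the location of every object in it and report each object's category. A computer only "sees" the numbers an image is encoded into. It therefore struggles to understand high-level semantic concepts, such as a person or object appearing in an image or video frame, and it finds it even harder to locate the region of the image where that object appears.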
+
+- Recommended models
+
+| Model | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [YOLOv3](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_coco2017&en_category=ObjectDetection) | Accuracy is **5.9 absolute percentage points higher** than the original authors' implementation, with extensively optimized performance. |
+| [Pedestrian detection](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_pedestrian&en_category=ObjectDetection) | Developed in-house by Baidu and trained on a large private dataset; applicable to intelligent video surveillance, human behavior analysis, passenger-flow counting, intelligent transportation, and similar fields |
+| [Vehicle detection](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_vehicles&en_category=ObjectDetection) | Developed in-house by Baidu; recognizes vehicle types including car, truck, bus, motorbike, and tricycle |
--- a/modules/image/semantic_segmentation/README.md
+++ b/modules/image/semantic_segmentation/README.md
+## **For a better experience, see the official web documentation -> [Image Segmentation](https://www.paddlepaddle.org.cn/hubdetail)**
+
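+### Image Segmentation
+
+Image semantic segmentation, as the name suggests, groups image pixels according to the semantic meaning they express. "Image semantics" means understanding the content of an image, for example describing which object is where and what it is doing; "segmentation" means labeling every pixel in the image with the class it belongs to. In recent years it has been used to segment street scenes so that self-driving vehicles can avoid pedestrians and other vehicles, and to assist diagnosis in medical image analysis.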
+
+- Featured models
+
+| Model | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [Portrait segmentation](https://www.paddlepaddle.org.cn/hubdetail?name=deeplabv3p_xception65_humanseg&en_category=ImageSegmentation) | Trained on a **dataset built in-house by Baidu**; delivers excellent portrait segmentation. |
+| [Human parsing](https://www.paddlepaddle.org.cn/hubdetail?name=ace2p&en_category=ImageSegmentation) | **Won all three tracks** of the CVPR 2019 LIP challenge; the go-to model for human parsing. |
+| [Pneumonia CT image analysis](https://www.paddlepaddle.org.cn/hubdetail?name=Pneumonia_CT_LKM_PP&en_category=ImageSegmentation) | Helped LinkingMed (连心医疗) open-source the **industry's first** pneumonia CT image analysis model |
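The segmentation description above says that every pixel is labeled with its class. A common way this is done is for the network to output one score per class for each pixel, and the label map keeps the highest-scoring class at each position. The sketch below assumes a `scores[class][row][col]` layout and uses made-up scores. It is not the output format of any particular PaddleHub module.

```python
def label_map(scores):
    """Per-pixel argmax over class scores laid out as scores[class][row][col]."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    labels = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Pick the class whose score is highest at this pixel.
            labels[r][c] = max(range(num_classes), key=lambda k: scores[k][r][c])
    return labels


# Toy 2-class example (0 = background, 1 = person) on a 2x3 image; scores are made up.
scores = [
    [[0.9, 0.8, 0.2], [0.7, 0.1, 0.3]],  # background scores
    [[0.1, 0.2, 0.8], [0.3, 0.9, 0.7]],  # person scores
]
print(label_map(scores))  # [[0, 0, 1], [0, 1, 1]]
```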
--- a/modules/image/style_transfer/stylepro_artistic/decoder_network.py
+++ b/modules/image/style_transfer/stylepro_artistic/decoder_network.py
-# coding=utf-8
-from paddle.fluid.initializer import Constant
-from paddle.fluid.param_attr import ParamAttr
-import paddle.fluid as fluid
-
-
-def decoder_net():
-    x2paddle_22 = fluid.layers.create_parameter(
-        dtype='float32', shape=[4], name='x2paddle_22', attr='x2paddle_22', default_initializer=Constant(0.0))
-    x2paddle_36 = fluid.layers.create_parameter(
-        dtype='float32', shape=[4], name='x2paddle_36', attr='x2paddle_36', default_initializer=Constant(0.0))
-    x2paddle_44 = fluid.layers.create_parameter(
-        dtype='float32', shape=[4], name='x2paddle_44', attr='x2paddle_44', default_initializer=Constant(0.0))
-    x2paddle_input_1 = fluid.layers.data(
-        dtype='float32', shape=[1, 512, 64, 64], name='x2paddle_input_1', append_batch_size=False)
-    x2paddle_19 = fluid.layers.pad2d(
-        x2paddle_input_1, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_19')
-    x2paddle_20 = fluid.layers.conv2d(
-        x2paddle_19,
-        num_filters=256,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_1',
-        name='x2paddle_20',
-        bias_attr='x2paddle_2')
-    x2paddle_21 = fluid.layers.relu(x2paddle_20, name='x2paddle_21')
-    x2paddle_23 = fluid.layers.resize_nearest(x2paddle_21, name='x2paddle_23', out_shape=[128, 128])
-    x2paddle_24 = fluid.layers.pad2d(
-        x2paddle_23, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_24')
-    x2paddle_25 = fluid.layers.conv2d(
-        x2paddle_24,
-        num_filters=256,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_3',
-        name='x2paddle_25',
-        bias_attr='x2paddle_4')
-    x2paddle_26 = fluid.layers.relu(x2paddle_25, name='x2paddle_26')
-    x2paddle_27 = fluid.layers.pad2d(
-        x2paddle_26, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_27')
-    x2paddle_28 = fluid.layers.conv2d(
-        x2paddle_27,
-        num_filters=256,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_5',
-        name='x2paddle_28',
-        bias_attr='x2paddle_6')
-    x2paddle_29 = fluid.layers.relu(x2paddle_28, name='x2paddle_29')
-    x2paddle_30 = fluid.layers.pad2d(
-        x2paddle_29, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_30')
-    x2paddle_31 = fluid.layers.conv2d(
-        x2paddle_30,
-        num_filters=256,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_7',
-        name='x2paddle_31',
-        bias_attr='x2paddle_8')
-    x2paddle_32 = fluid.layers.relu(x2paddle_31, name='x2paddle_32')
-    x2paddle_33 = fluid.layers.pad2d(
-        x2paddle_32, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_33')
-    x2paddle_34 = fluid.layers.conv2d(
-        x2paddle_33,
-        num_filters=128,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_9',
-        name='x2paddle_34',
-        bias_attr='x2paddle_10')
-    x2paddle_35 = fluid.layers.relu(x2paddle_34, name='x2paddle_35')
-    x2paddle_37 = fluid.layers.resize_nearest(x2paddle_35, name='x2paddle_37', out_shape=[256, 256])
-    x2paddle_38 = fluid.layers.pad2d(
-        x2paddle_37, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_38')
-    x2paddle_39 = fluid.layers.conv2d(
-        x2paddle_38,
-        num_filters=128,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_11',
-        name='x2paddle_39',
-        bias_attr='x2paddle_12')
-    x2paddle_40 = fluid.layers.relu(x2paddle_39, name='x2paddle_40')
-    x2paddle_41 = fluid.layers.pad2d(
-        x2paddle_40, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_41')
-    x2paddle_42 = fluid.layers.conv2d(
-        x2paddle_41,
-        num_filters=64,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_13',
-        name='x2paddle_42',
-        bias_attr='x2paddle_14')
-    x2paddle_43 = fluid.layers.relu(x2paddle_42, name='x2paddle_43')
-    x2paddle_45 = fluid.layers.resize_nearest(x2paddle_43, name='x2paddle_45', out_shape=[512, 512])
-    x2paddle_46 = fluid.layers.pad2d(
-        x2paddle_45, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_46')
-    x2paddle_47 = fluid.layers.conv2d(
-        x2paddle_46,
-        num_filters=64,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_15',
-        name='x2paddle_47',
-        bias_attr='x2paddle_16')
-    x2paddle_48 = fluid.layers.relu(x2paddle_47, name='x2paddle_48')
-    x2paddle_49 = fluid.layers.pad2d(
-        x2paddle_48, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_49')
-    x2paddle_50 = fluid.layers.conv2d(
-        x2paddle_49,
-        num_filters=3,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_17',
-        name='x2paddle_50',
-        bias_attr='x2paddle_18')
-    return x2paddle_input_1, x2paddle_50
--- a/modules/image/style_transfer/stylepro_artistic/encoder_network.py
+++ b/modules/image/style_transfer/stylepro_artistic/encoder_network.py
-# coding=utf-8
-from paddle.fluid.initializer import Constant
-from paddle.fluid.param_attr import ParamAttr
-import paddle.fluid as fluid
-
-
-def encoder_net():
-    x2paddle_0 = fluid.layers.data(dtype='float32', shape=[1, 3, 512, 512], name='x2paddle_0', append_batch_size=False)
-    x2paddle_21 = fluid.layers.conv2d(
-        x2paddle_0,
-        num_filters=3,
-        filter_size=[1, 1],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_1',
-        name='x2paddle_21',
-        bias_attr='x2paddle_2')
-    x2paddle_22 = fluid.layers.pad2d(
-        x2paddle_21, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_22')
-    x2paddle_23 = fluid.layers.conv2d(
-        x2paddle_22,
-        num_filters=64,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_3',
-        name='x2paddle_23',
-        bias_attr='x2paddle_4')
-    x2paddle_24 = fluid.layers.relu(x2paddle_23, name='x2paddle_24')
-    x2paddle_25 = fluid.layers.pad2d(
-        x2paddle_24, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_25')
-    x2paddle_26 = fluid.layers.conv2d(
-        x2paddle_25,
-        num_filters=64,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_5',
-        name='x2paddle_26',
-        bias_attr='x2paddle_6')
-    x2paddle_27 = fluid.layers.relu(x2paddle_26, name='x2paddle_27')
-    x2paddle_28 = fluid.layers.pool2d(
-        x2paddle_27,
-        pool_size=[2, 2],
-        pool_type='max',
-        pool_stride=[2, 2],
-        pool_padding=[0, 0],
-        ceil_mode=False,
-        name='x2paddle_28',
-        exclusive=False)
-    x2paddle_29 = fluid.layers.pad2d(
-        x2paddle_28, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_29')
-    x2paddle_30 = fluid.layers.conv2d(
-        x2paddle_29,
-        num_filters=128,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_7',
-        name='x2paddle_30',
-        bias_attr='x2paddle_8')
-    x2paddle_31 = fluid.layers.relu(x2paddle_30, name='x2paddle_31')
-    x2paddle_32 = fluid.layers.pad2d(
-        x2paddle_31, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_32')
-    x2paddle_33 = fluid.layers.conv2d(
-        x2paddle_32,
-        num_filters=128,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_9',
-        name='x2paddle_33',
-        bias_attr='x2paddle_10')
-    x2paddle_34 = fluid.layers.relu(x2paddle_33, name='x2paddle_34')
-    x2paddle_35 = fluid.layers.pool2d(
-        x2paddle_34,
-        pool_size=[2, 2],
-        pool_type='max',
-        pool_stride=[2, 2],
-        pool_padding=[0, 0],
-        ceil_mode=False,
-        name='x2paddle_35',
-        exclusive=False)
-    x2paddle_36 = fluid.layers.pad2d(
-        x2paddle_35, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_36')
-    x2paddle_37 = fluid.layers.conv2d(
-        x2paddle_36,
-        num_filters=256,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_11',
-        name='x2paddle_37',
-        bias_attr='x2paddle_12')
-    x2paddle_38 = fluid.layers.relu(x2paddle_37, name='x2paddle_38')
-    x2paddle_39 = fluid.layers.pad2d(
-        x2paddle_38, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_39')
-    x2paddle_40 = fluid.layers.conv2d(
-        x2paddle_39,
-        num_filters=256,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_13',
-        name='x2paddle_40',
-        bias_attr='x2paddle_14')
-    x2paddle_41 = fluid.layers.relu(x2paddle_40, name='x2paddle_41')
-    x2paddle_42 = fluid.layers.pad2d(
-        x2paddle_41, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_42')
-    x2paddle_43 = fluid.layers.conv2d(
-        x2paddle_42,
-        num_filters=256,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_15',
-        name='x2paddle_43',
-        bias_attr='x2paddle_16')
-    x2paddle_44 = fluid.layers.relu(x2paddle_43, name='x2paddle_44')
-    x2paddle_45 = fluid.layers.pad2d(
-        x2paddle_44, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_45')
-    x2paddle_46 = fluid.layers.conv2d(
-        x2paddle_45,
-        num_filters=256,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_17',
-        name='x2paddle_46',
-        bias_attr='x2paddle_18')
-    x2paddle_47 = fluid.layers.relu(x2paddle_46, name='x2paddle_47')
-    x2paddle_48 = fluid.layers.pool2d(
-        x2paddle_47,
-        pool_size=[2, 2],
-        pool_type='max',
-        pool_stride=[2, 2],
-        pool_padding=[0, 0],
-        ceil_mode=False,
-        name='x2paddle_48',
-        exclusive=False)
-    x2paddle_49 = fluid.layers.pad2d(
-        x2paddle_48, pad_value=0.0, mode='reflect', paddings=[1, 1, 1, 1], name='x2paddle_49')
-    x2paddle_50 = fluid.layers.conv2d(
-        x2paddle_49,
-        num_filters=512,
-        filter_size=[3, 3],
-        stride=[1, 1],
-        padding=[0, 0],
-        dilation=[1, 1],
-        groups=1,
-        param_attr='x2paddle_19',
-        name='x2paddle_50',
-        bias_attr='x2paddle_20')
-    x2paddle_51 = fluid.layers.relu(x2paddle_50, name='x2paddle_51')
-    return x2paddle_0, x2paddle_51
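`encoder_net()` takes a fixed `[1, 3, 512, 512]` input and applies a 1x1 conv, then a VGG-style stack of reflect-padded 3x3 convs and 2x2 max pools. The layer sequence appears to match a VGG-19 prefix ending at `relu4_1`, which is a common style-transfer encoder, but treat that as an observation. The sketch below traces the tensor shape op by op using only the hyperparameters shown above. The op table and function name are illustrative.

```python
# Each entry mirrors one op in encoder_net(): ("conv", out_channels, kernel, pad) or ("pool",).
ENCODER_OPS = [
    ("conv", 3, 1, 0),      # x2paddle_21: 1x1 conv, no padding
    ("conv", 64, 3, 1),     # x2paddle_22/23: reflect pad 1 + 3x3 conv
    ("conv", 64, 3, 1),     # x2paddle_25/26
    ("pool",),              # x2paddle_28: 2x2 max pool, stride 2
    ("conv", 128, 3, 1),    # x2paddle_29/30
    ("conv", 128, 3, 1),    # x2paddle_32/33
    ("pool",),              # x2paddle_35
    ("conv", 256, 3, 1),    # x2paddle_36/37
    ("conv", 256, 3, 1),    # x2paddle_39/40
    ("conv", 256, 3, 1),    # x2paddle_42/43
    ("conv", 256, 3, 1),    # x2paddle_45/46
    ("pool",),              # x2paddle_48
    ("conv", 512, 3, 1),    # x2paddle_49/50, followed by relu x2paddle_51
]


def trace_shapes(shape, ops):
    """Return the [N, C, H, W] shape after every op (stride-1 convs, 2x2/2 pools)."""
    n, c, h, w = shape
    shapes = []
    for op in ops:
        if op[0] == "conv":
            _, out_c, k, pad = op
            # valid conv after padding: H_out = H + 2 * pad - k + 1
            c, h, w = out_c, h + 2 * pad - k + 1, w + 2 * pad - k + 1
        else:
            # 2x2 max pool, stride 2, no padding, ceil_mode=False
            h, w = (h - 2) // 2 + 1, (w - 2) // 2 + 1
        shapes.append([n, c, h, w])
    return shapes


shapes = trace_shapes([1, 3, 512, 512], ENCODER_OPS)
print(shapes[-1])  # [1, 512, 64, 64]
```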
--- a/modules/image/text_recognition/README.md
+++ b/modules/image/text_recognition/README.md
+## **For a better user experience, see the official web documentation -> [Text Recognition](https://www.paddlepaddle.org.cn/hubdetail)**
+
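+### Text Recognition
+Text recognition (OCR) is one of the key tasks in computer vision. It is mainly used to extract text from images and has significant value in industrial applications.
+
+- Recommended models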
+
+| Model | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [Ultra-lightweight Chinese/English OCR](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_crnn_mobile&en_category=TextRecognition) | The industry's smallest open-source Chinese/English recognition model, at an ultra-lightweight 8.1M. Recognizes Chinese and English text in multiple orientations, including skewed and vertical text. **Highly recommended** |
+| [High-accuracy Chinese/English OCR](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_crnn_server&en_category=TextRecognition) | The industry's best-performing open-source Chinese/English recognition model, a 155M high-accuracy model. Recognizes Chinese and English text in multiple orientations, including skewed and vertical text. **Highly recommended** |
+| [German ultra-lightweight OCR](https://www.paddlepaddle.org.cn/hubdetail?name=german_ocr_db_crnn_mobile&en_category=TextRecognition) | German OCR, ultra-lightweight |
+| [French ultra-lightweight OCR](https://www.paddlepaddle.org.cn/hubdetail?name=french_ocr_db_crnn_mobile&en_category=TextRecognition) | French OCR, ultra-lightweight |
+| [Japanese ultra-lightweight OCR](https://www.paddlepaddle.org.cn/hubdetail?name=japan_ocr_db_crnn_mobile&en_category=TextRecognition) | Japanese OCR, ultra-lightweight |
+| [Korean ultra-lightweight OCR](https://www.paddlepaddle.org.cn/hubdetail?name=korean_ocr_db_crnn_mobile&en_category=TextRecognition) | Korean OCR, ultra-lightweight |
--- a/modules/text/language_model/README.md
+++ b/modules/text/language_model/README.md
+## **For a better user experience, see the official web documentation -> [Language Models](https://www.paddlepaddle.org.cn/hubdetail)**
+
+### Language Models
+
+- Recommended models
+
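+| Model | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [Word embedding model](https://www.paddlepaddle.org.cn/hubdetail?name=word2vec_skipgram&en_category=SemanticModel) | Chinese word embeddings pretrained on a massive Baidu Search dataset; supports fine-tuning. The Word2vec pretraining vocabulary has 1,700,249 entries, and the word embedding dimension is 128. |
+| [Text similarity](https://www.paddlepaddle.org.cn/hubdetail?name=simnet_bow&en_category=SemanticModel) | Computes a similarity score for two texts provided by the user. |
+| [ERNIE](https://www.paddlepaddle.org.cn/hubdetail?name=ERNIE&en_category=SemanticModel) | An in-house model trained on Chinese corpora such as encyclopedia, news, and forum-dialogue data. It can be used for text classification, sequence labeling, reading comprehension, and other tasks. |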
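The word2vec row gives the vocabulary size (1,700,249) and embedding dimension (128), which determine the size of the embedding table. The arithmetic below makes that concrete. It assumes float32 storage (4 bytes per value), which the README does not state.

```python
VOCAB_SIZE = 1700249   # vocabulary size stated for word2vec_skipgram
EMB_DIM = 128          # embedding dimension stated for word2vec_skipgram
BYTES_PER_FLOAT32 = 4  # assumption: parameters stored as float32

num_params = VOCAB_SIZE * EMB_DIM
size_bytes = num_params * BYTES_PER_FLOAT32
size_mib = size_bytes / 2**20

print(num_params)          # 217631872
print(round(size_mib, 1))  # 830.2
```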
--- a/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/README.md
+++ b/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/README.md
--- a/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/__init__.py
+++ b/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/__init__.py
--- a/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/model/__init__.py
+++ b/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/model/__init__.py
--- a/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/model/bert.py
+++ b/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/model/bert.py
@@ -74,23 +74,23 @@ class BertModel(object):

    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
        # padding id in vocabulary must be set to 0
-        emb_out = fluid.layers.embedding(
-            input=src_ids,
-            size=[self._voc_size, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
-            is_sparse=False)
-        position_emb_out = fluid.layers.embedding(
-            input=position_ids,
-            size=[self._max_position_seq_len, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
-
-        sent_emb_out = fluid.layers.embedding(
-            sentence_ids,
-            size=[self._sent_types, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
+        emb_out = fluid.layers.embedding(input=src_ids,
+                                         size=[self._voc_size, self._emb_size],
+                                         dtype=self._dtype,
+                                         param_attr=fluid.ParamAttr(name=self._word_emb_name,
+                                                                    initializer=self._param_initializer),
+                                         is_sparse=False)
+        position_emb_out = fluid.layers.embedding(input=position_ids,
+                                                  size=[self._max_position_seq_len, self._emb_size],
+                                                  dtype=self._dtype,
+                                                  param_attr=fluid.ParamAttr(name=self._pos_emb_name,
+                                                                             initializer=self._param_initializer))
+
+        sent_emb_out = fluid.layers.embedding(sentence_ids,
+                                              size=[self._sent_types, self._emb_size],
+                                              dtype=self._dtype,
+                                              param_attr=fluid.ParamAttr(name=self._sent_emb_name,
+                                                                         initializer=self._param_initializer))

        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
@@ -105,23 +105,22 @@ class BertModel(object):
        n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True

-        self._enc_out = encoder(
-            enc_input=emb_out,
-            attn_bias=n_head_self_attn_mask,
-            n_layer=self._n_layer,
-            n_head=self._n_head,
-            d_key=self._emb_size // self._n_head,
-            d_value=self._emb_size // self._n_head,
-            d_model=self._emb_size,
-            d_inner_hid=self._emb_size * 4,
-            prepostprocess_dropout=self._prepostprocess_dropout,
-            attention_dropout=self._attention_dropout,
-            relu_dropout=0,
-            hidden_act=self._hidden_act,
-            preprocess_cmd="",
-            postprocess_cmd="dan",
-            param_initializer=self._param_initializer,
-            name='encoder')
+        self._enc_out = encoder(enc_input=emb_out,
+                                attn_bias=n_head_self_attn_mask,
+                                n_layer=self._n_layer,
+                                n_head=self._n_head,
+                                d_key=self._emb_size // self._n_head,
+                                d_value=self._emb_size // self._n_head,
+                                d_model=self._emb_size,
+                                d_inner_hid=self._emb_size * 4,
+                                prepostprocess_dropout=self._prepostprocess_dropout,
+                                attention_dropout=self._attention_dropout,
+                                relu_dropout=0,
+                                hidden_act=self._hidden_act,
+                                preprocess_cmd="",
+                                postprocess_cmd="dan",
+                                param_initializer=self._param_initializer,
+                                name='encoder')

    def get_sequence_output(self):
        return self._enc_out
@@ -130,12 +129,12 @@ class BertModel(object):
        """Get the first feature of each sequence for classification"""

        next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
-        next_sent_feat = fluid.layers.fc(
-            input=next_sent_feat,
-            size=self._emb_size,
-            act="tanh",
-            param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
-            bias_attr="pooled_fc.b_0")
+        next_sent_feat = fluid.layers.fc(input=next_sent_feat,
+                                         size=self._emb_size,
+                                         act="tanh",
+                                         param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
+                                                                    initializer=self._param_initializer),
+                                         bias_attr="pooled_fc.b_0")
        return next_sent_feat

    def get_pretraining_output(self, mask_label, mask_pos, labels):
@@ -150,43 +149,45 @@ class BertModel(object):
        mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)

        # transform: fc
-        mask_trans_feat = fluid.layers.fc(
-            input=mask_feat,
-            size=self._emb_size,
-            act=self._hidden_act,
-            param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
-            bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
+        mask_trans_feat = fluid.layers.fc(input=mask_feat,
+                                          size=self._emb_size,
+                                          act=self._hidden_act,
+                                          param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
+                                                                     initializer=self._param_initializer),
+                                          bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')

-        mask_lm_out_bias_attr = fluid.ParamAttr(
-            name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
+        mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
+                                                initializer=fluid.initializer.Constant(value=0.0))
        if self._weight_sharing:
-            fc_out = fluid.layers.matmul(
-                x=mask_trans_feat,
-                y=fluid.default_main_program().global_block().var(self._word_emb_name),
-                transpose_y=True)
-            fc_out += fluid.layers.create_parameter(
-                shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
+            fc_out = fluid.layers.matmul(x=mask_trans_feat,
+                                         y=fluid.default_main_program().global_block().var(self._word_emb_name),
+                                         transpose_y=True)
+            fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
+                                                    dtype=self._dtype,
+                                                    attr=mask_lm_out_bias_attr,
+                                                    is_bias=True)

        else:
-            fc_out = fluid.layers.fc(
-                input=mask_trans_feat,
-                size=self._voc_size,
-                param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
-                bias_attr=mask_lm_out_bias_attr)
+            fc_out = fluid.layers.fc(input=mask_trans_feat,
+                                     size=self._voc_size,
+                                     param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
+                                                                initializer=self._param_initializer),
+                                     bias_attr=mask_lm_out_bias_attr)

        mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
        mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)

-        next_sent_fc_out = fluid.layers.fc(
-            input=next_sent_feat,
-            size=2,
-            param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
-            bias_attr="next_sent_fc.b_0")
+        next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
+                                           size=2,
+                                           param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
+                                                                      initializer=self._param_initializer),
+                                           bias_attr="next_sent_fc.b_0")

-        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
-            logits=next_sent_fc_out, label=labels, return_softmax=True)
+        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
+                                                                                    label=labels,
+                                                                                    return_softmax=True)

        next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)


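When `self._weight_sharing` is set, the masked-LM head in `get_pretraining_output` reuses the word-embedding matrix as its output projection (`matmul(..., transpose_y=True)`). It adds only a separate, zero-initialized bias. It then applies `softmax_with_cross_entropy` and averages the result with `fluid.layers.mean`. The NumPy sketch below reproduces that forward computation with toy sizes. The function name is hypothetical, and it models only the math, not Paddle's graph or gradients.

```python
import numpy as np


def tied_mlm_loss(mask_trans_feat, word_emb, out_bias, mask_label):
    """Mean masked-LM loss with the output projection tied to the word embedding.

    mask_trans_feat: [num_masked, emb_size]  transformed features at masked positions
    word_emb:        [voc_size, emb_size]    shared word-embedding table
    out_bias:        [voc_size]              separate output bias
    mask_label:      [num_masked]            target token ids
    """
    # Equivalent to matmul(x, word_emb, transpose_y=True) + bias
    logits = mask_trans_feat @ word_emb.T + out_bias
    # Numerically stable log-softmax, then the target token's negative log-probability
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(mask_label)), mask_label]
    return nll.mean()


rng = np.random.default_rng(0)
voc_size, emb_size, num_masked = 10, 4, 3
word_emb = rng.normal(size=(voc_size, emb_size))
feat = rng.normal(size=(num_masked, emb_size))
labels = np.array([1, 5, 7])
loss = tied_mlm_loss(feat, word_emb, np.zeros(voc_size), labels)
print(loss)
```

Sharing the matrix means the vocabulary projection adds only `voc_size` new parameters (the bias), not `voc_size * emb_size`.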
--- a/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/model/transformer_encoder.py
+++ b/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/model/transformer_encoder.py
@@ -50,24 +50,21 @@ def multi_head_attention(queries,
        """
        Add linear projection to queries, keys, and values.
        """
-        q = layers.fc(
-            input=queries,
-            size=d_key * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_query_fc.b_0')
-        k = layers.fc(
-            input=keys,
-            size=d_key * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_key_fc.b_0')
-        v = layers.fc(
-            input=values,
-            size=d_value * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_value_fc.b_0')
+        q = layers.fc(input=queries,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_query_fc.b_0')
+        k = layers.fc(input=keys,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_key_fc.b_0')
+        v = layers.fc(input=values,
+                      size=d_value * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_value_fc.b_0')
        return q, k, v

    def __split_heads(x, n_head):
@@ -110,8 +107,10 @@ def multi_head_attention(queries,
            product += attn_bias
        weights = layers.softmax(product)
        if dropout_rate:
-            weights = layers.dropout(
-                weights, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
+            weights = layers.dropout(weights,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
        out = layers.matmul(weights, v)
        return out

@@ -133,12 +132,11 @@ def multi_head_attention(queries,
    out = __combine_heads(ctx_multiheads)

    # Project back to the model size.
-    proj_out = layers.fc(
-        input=out,
-        size=d_model,
-        num_flatten_dims=2,
-        param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
-        bias_attr=name + '_output_fc.b_0')
+    proj_out = layers.fc(input=out,
+                         size=d_model,
+                         num_flatten_dims=2,
+                         param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
+                         bias_attr=name + '_output_fc.b_0')
    return proj_out


@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, p
    This module consists of two linear transformations with a ReLU activation
    in between, which is applied to each position separately and identically.
    """
-    hidden = layers.fc(
-        input=x,
-        size=d_inner_hid,
-        num_flatten_dims=2,
-        act=hidden_act,
-        param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
-        bias_attr=name + '_fc_0.b_0')
+    hidden = layers.fc(input=x,
+                       size=d_inner_hid,
+                       num_flatten_dims=2,
+                       act=hidden_act,
+                       param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
+                       bias_attr=name + '_fc_0.b_0')
    if dropout_rate:
-        hidden = layers.dropout(
-            hidden, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
-    out = layers.fc(
-        input=hidden,
-        size=d_hid,
-        num_flatten_dims=2,
-        param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
-        bias_attr=name + '_fc_1.b_0')
+        hidden = layers.dropout(hidden,
+                                dropout_prob=dropout_rate,
+                                dropout_implementation="upscale_in_train",
+                                is_test=False)
+    out = layers.fc(input=hidden,
+                    size=d_hid,
+                    num_flatten_dims=2,
+                    param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
+                    bias_attr=name + '_fc_1.b_0')
    return out


@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name='')
            out_dtype = out.dtype
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float32")
-            out = layers.layer_norm(
-                out,
-                begin_norm_axis=len(out.shape) - 1,
-                param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale', initializer=fluid.initializer.Constant(1.)),
-                bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias', initializer=fluid.initializer.Constant(0.)))
+            out = layers.layer_norm(out,
+                                    begin_norm_axis=len(out.shape) - 1,
+                                    param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
+                                                               initializer=fluid.initializer.Constant(1.)),
+                                    bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
+                                                              initializer=fluid.initializer.Constant(0.)))
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float16")
        elif cmd == "d":  # add dropout
            if dropout_rate:
-                out = layers.dropout(
-                    out, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
+                out = layers.dropout(out,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
    return out


@@ -220,28 +221,35 @@ def encoder_layer(enc_input,
    with the post_process_layer to add residual connection, layer normalization
    and droput.
    """
-    attn_output = multi_head_attention(
-        pre_process_layer(enc_input, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_att'),
-        None,
-        None,
-        attn_bias,
-        d_key,
-        d_value,
-        d_model,
-        n_head,
-        attention_dropout,
-        param_initializer=param_initializer,
-        name=name + '_multi_head_att')
-    attn_output = post_process_layer(
-        enc_input, attn_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_att')
-    ffd_output = positionwise_feed_forward(
-        pre_process_layer(attn_output, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_ffn'),
-        d_inner_hid,
-        d_model,
-        relu_dropout,
-        hidden_act,
-        param_initializer=param_initializer,
-        name=name + '_ffn')
+    attn_output = multi_head_attention(pre_process_layer(enc_input,
+                                                         preprocess_cmd,
+                                                         prepostprocess_dropout,
+                                                         name=name + '_pre_att'),
+                                       None,
+                                       None,
+                                       attn_bias,
+                                       d_key,
+                                       d_value,
+                                       d_model,
+                                       n_head,
+                                       attention_dropout,
+                                       param_initializer=param_initializer,
+                                       name=name + '_multi_head_att')
+    attn_output = post_process_layer(enc_input,
+                                     attn_output,
+                                     postprocess_cmd,
+                                     prepostprocess_dropout,
+                                     name=name + '_post_att')
+    ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
+                                                             preprocess_cmd,
+                                                             prepostprocess_dropout,
+                                                             name=name + '_pre_ffn'),
+                                           d_inner_hid,
+                                           d_model,
+                                           relu_dropout,
+                                           hidden_act,
+                                           param_initializer=param_initializer,
+                                           name=name + '_ffn')
    return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')


@@ -266,22 +274,21 @@ def encoder(enc_input,
    encoder_layer.
    """
    for i in range(n_layer):
-        enc_output = encoder_layer(
-            enc_input,
-            attn_bias,
-            n_head,
-            d_key,
-            d_value,
-            d_model,
-            d_inner_hid,
-            prepostprocess_dropout,
-            attention_dropout,
-            relu_dropout,
-            hidden_act,
-            preprocess_cmd,
-            postprocess_cmd,
-            param_initializer=param_initializer,
-            name=name + '_layer_' + str(i))
+        enc_output = encoder_layer(enc_input,
+                                   attn_bias,
+                                   n_head,
+                                   d_key,
+                                   d_value,
+                                   d_model,
+                                   d_inner_hid,
+                                   prepostprocess_dropout,
+                                   attention_dropout,
+                                   relu_dropout,
+                                   hidden_act,
+                                   preprocess_cmd,
+                                   postprocess_cmd,
+                                   param_initializer=param_initializer,
+                                   name=name + '_layer_' + str(i))
        enc_input = enc_output
    enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")

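The hunks above only change formatting, but they outline how `multi_head_attention` works. It projects queries, keys, and values with `fc` and splits them into `n_head` heads. It then adds `attn_bias` to the attention logits and applies softmax, plus dropout during training. Finally it multiplies by the values and merges the heads before the output `fc`. The NumPy sketch below covers the core of that flow under stated assumptions. The `d_key ** -0.5` scaling is the standard Transformer choice and is not visible in the lines shown. The projections and dropout are omitted. The helper names mirror the file's private functions but are not them.

```python
import numpy as np


def split_heads(x, n_head):
    """[batch, seq, n_head * d] -> [batch, n_head, seq, d]"""
    b, s, hd = x.shape
    return x.reshape(b, s, n_head, hd // n_head).transpose(0, 2, 1, 3)


def combine_heads(x):
    """[batch, n_head, seq, d] -> [batch, seq, n_head * d]"""
    b, h, s, d = x.shape
    return x.transpose(0, 2, 1, 3).reshape(b, s, h * d)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, attn_bias, n_head):
    """Scaled dot-product attention over heads; dropout omitted (inference)."""
    q, k, v = [split_heads(t, n_head) for t in (q, k, v)]
    d_key = q.shape[-1]
    product = (q * d_key ** -0.5) @ k.transpose(0, 1, 3, 2)  # [b, h, s, s]
    if attn_bias is not None:
        product = product + attn_bias  # e.g. large negative values on padded keys
    weights = softmax(product)
    return combine_heads(weights @ v)


rng = np.random.default_rng(0)
b, s, n_head, d = 2, 5, 4, 8
x = rng.normal(size=(b, s, n_head * d))
out = attention(x, x, x, None, n_head)
print(out.shape)  # (2, 5, 32)
```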

--- a/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/module.py
+++ b/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/module.py
@@ -58,13 +58,12 @@ class Bert(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        bert = BertModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.bert_config,
-            use_fp16=False)
+        bert = BertModel(src_ids=input_ids,
+                         position_ids=position_ids,
+                         sentence_ids=segment_ids,
+                         input_mask=input_mask,
+                         config=self.bert_config,
+                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output

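The module names follow the BERT convention of L (layers), H (hidden size), and A (attention heads). The bert.py hunk computes the per-head and feed-forward sizes as `d_key = d_value = emb_size // n_head` and `d_inner_hid = emb_size * 4`. The sketch below applies those formulas to both modules in this diff. It assumes `emb_size` equals the hidden size read from `bert_config`. The function name is illustrative.

```python
def derived_dims(hidden_size, num_heads):
    """Per-head and feed-forward sizes, following the formulas in BertModel._build_model."""
    d_key = hidden_size // num_heads  # d_key = d_value = emb_size // n_head
    d_inner_hid = hidden_size * 4     # feed-forward inner size
    return d_key, d_inner_hid


for name, hidden, heads in [("bert_cased_L_12_H_768_A_12", 768, 12),
                            ("bert_cased_L_24_H_1024_A_16", 1024, 16)]:
    print(name, derived_dims(hidden, heads))
# bert_cased_L_12_H_768_A_12 (64, 3072)
# bert_cased_L_24_H_1024_A_16 (64, 4096)
```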
--- a/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/README.md
+++ b/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/README.md
--- a/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/__init__.py
+++ b/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/__init__.py
--- a/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/model/__init__.py
+++ b/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/model/__init__.py
--- a/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/model/bert.py
+++ b/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/model/bert.py
@@ -74,23 +74,23 @@ class BertModel(object):

    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
        # padding id in vocabulary must be set to 0
-        emb_out = fluid.layers.embedding(
-            input=src_ids,
-            size=[self._voc_size, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
-            is_sparse=False)
-        position_emb_out = fluid.layers.embedding(
-            input=position_ids,
-            size=[self._max_position_seq_len, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
-
-        sent_emb_out = fluid.layers.embedding(
-            sentence_ids,
-            size=[self._sent_types, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
+        emb_out = fluid.layers.embedding(input=src_ids,
+                                         size=[self._voc_size, self._emb_size],
+                                         dtype=self._dtype,
+                                         param_attr=fluid.ParamAttr(name=self._word_emb_name,
+                                                                    initializer=self._param_initializer),
+                                         is_sparse=False)
+        position_emb_out = fluid.layers.embedding(input=position_ids,
+                                                  size=[self._max_position_seq_len, self._emb_size],
+                                                  dtype=self._dtype,
+                                                  param_attr=fluid.ParamAttr(name=self._pos_emb_name,
+                                                                             initializer=self._param_initializer))
+
+        sent_emb_out = fluid.layers.embedding(sentence_ids,
+                                              size=[self._sent_types, self._emb_size],
+                                              dtype=self._dtype,
+                                              param_attr=fluid.ParamAttr(name=self._sent_emb_name,
+                                                                         initializer=self._param_initializer))

        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
@@ -105,23 +105,22 @@ class BertModel(object):
        n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True

-        self._enc_out = encoder(
-            enc_input=emb_out,
-            attn_bias=n_head_self_attn_mask,
-            n_layer=self._n_layer,
-            n_head=self._n_head,
-            d_key=self._emb_size // self._n_head,
-            d_value=self._emb_size // self._n_head,
-            d_model=self._emb_size,
-            d_inner_hid=self._emb_size * 4,
-            prepostprocess_dropout=self._prepostprocess_dropout,
-            attention_dropout=self._attention_dropout,
-            relu_dropout=0,
-            hidden_act=self._hidden_act,
-            preprocess_cmd="",
-            postprocess_cmd="dan",
-            param_initializer=self._param_initializer,
-            name='encoder')
+        self._enc_out = encoder(enc_input=emb_out,
+                                attn_bias=n_head_self_attn_mask,
+                                n_layer=self._n_layer,
+                                n_head=self._n_head,
+                                d_key=self._emb_size // self._n_head,
+                                d_value=self._emb_size // self._n_head,
+                                d_model=self._emb_size,
+                                d_inner_hid=self._emb_size * 4,
+                                prepostprocess_dropout=self._prepostprocess_dropout,
+                                attention_dropout=self._attention_dropout,
+                                relu_dropout=0,
+                                hidden_act=self._hidden_act,
+                                preprocess_cmd="",
+                                postprocess_cmd="dan",
+                                param_initializer=self._param_initializer,
+                                name='encoder')

    def get_sequence_output(self):
        return self._enc_out
@@ -130,12 +129,12 @@ class BertModel(object):
        """Get the first feature of each sequence for classification"""

        next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
-        next_sent_feat = fluid.layers.fc(
-            input=next_sent_feat,
-            size=self._emb_size,
-            act="tanh",
-            param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
-            bias_attr="pooled_fc.b_0")
+        next_sent_feat = fluid.layers.fc(input=next_sent_feat,
+                                         size=self._emb_size,
+                                         act="tanh",
+                                         param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
+                                                                    initializer=self._param_initializer),
+                                         bias_attr="pooled_fc.b_0")
        return next_sent_feat

    def get_pretraining_output(self, mask_label, mask_pos, labels):
@@ -150,43 +149,45 @@ class BertModel(object):
        mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)

        # transform: fc
-        mask_trans_feat = fluid.layers.fc(
-            input=mask_feat,
-            size=self._emb_size,
-            act=self._hidden_act,
-            param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
-            bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
+        mask_trans_feat = fluid.layers.fc(input=mask_feat,
+                                          size=self._emb_size,
+                                          act=self._hidden_act,
+                                          param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
+                                                                     initializer=self._param_initializer),
+                                          bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')

-        mask_lm_out_bias_attr = fluid.ParamAttr(
-            name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
+        mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
+                                                initializer=fluid.initializer.Constant(value=0.0))
        if self._weight_sharing:
-            fc_out = fluid.layers.matmul(
-                x=mask_trans_feat,
-                y=fluid.default_main_program().global_block().var(self._word_emb_name),
-                transpose_y=True)
-            fc_out += fluid.layers.create_parameter(
-                shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
+            fc_out = fluid.layers.matmul(x=mask_trans_feat,
+                                         y=fluid.default_main_program().global_block().var(self._word_emb_name),
+                                         transpose_y=True)
+            fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
+                                                    dtype=self._dtype,
+                                                    attr=mask_lm_out_bias_attr,
+                                                    is_bias=True)

        else:
-            fc_out = fluid.layers.fc(
-                input=mask_trans_feat,
-                size=self._voc_size,
-                param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
-                bias_attr=mask_lm_out_bias_attr)
+            fc_out = fluid.layers.fc(input=mask_trans_feat,
+                                     size=self._voc_size,
+                                     param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
+                                                                initializer=self._param_initializer),
+                                     bias_attr=mask_lm_out_bias_attr)

        mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
        mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)

-        next_sent_fc_out = fluid.layers.fc(
-            input=next_sent_feat,
-            size=2,
-            param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
-            bias_attr="next_sent_fc.b_0")
+        next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
+                                           size=2,
+                                           param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
+                                                                      initializer=self._param_initializer),
+                                           bias_attr="next_sent_fc.b_0")

-        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
-            logits=next_sent_fc_out, label=labels, return_softmax=True)
+        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
+                                                                                    label=labels,
+                                                                                    return_softmax=True)

        next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)


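`_build_model` looks up three embedding tables (word, position, and sentence/segment) and adds the results element-wise before the encoder runs. The sketch below reproduces that lookup-and-sum on plain Python lists. The tiny tables, their sizes and the function name `bert_input_embedding` are hypothetical. Any processing applied to the sum afterwards falls outside these hunks, so the sketch omits it.

```python
from typing import List

Table = List[List[float]]


def lookup(table: Table, ids: List[int]) -> Table:
    """Embedding lookup: one table row per id, as fluid.layers.embedding does."""
    return [list(table[i]) for i in ids]


def bert_input_embedding(src_ids: List[int], position_ids: List[int], sentence_ids: List[int],
                         word_table: Table, pos_table: Table, sent_table: Table) -> Table:
    """Element-wise sum of word, position and sentence embeddings for each token."""
    word = lookup(word_table, src_ids)
    pos = lookup(pos_table, position_ids)
    sent = lookup(sent_table, sentence_ids)
    return [
        [w + p + s for w, p, s in zip(wv, pv, sv)]
        for wv, pv, sv in zip(word, pos, sent)
    ]


# Hypothetical sizes: voc_size=4, max_position_seq_len=3, sent_types=2, emb_size=2.
# Id 0 is the padding token ("padding id in vocabulary must be set to 0").
word_table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pos_table = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]
sent_table = [[0.0, 0.0], [10.0, 10.0]]

out = bert_input_embedding([1, 3, 0], [0, 1, 2], [0, 1, 1],
                           word_table, pos_table, sent_table)
print(out)
```

Each output row has `emb_size` entries, so all three tables must share the same second dimension, matching the `size=[..., self._emb_size]` arguments above.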
--- a/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/model/transformer_encoder.py
+++ b/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/model/transformer_encoder.py
@@ -50,24 +50,21 @@ def multi_head_attention(queries,
        """
        Add linear projection to queries, keys, and values.
        """
-        q = layers.fc(
-            input=queries,
-            size=d_key * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_query_fc.b_0')
-        k = layers.fc(
-            input=keys,
-            size=d_key * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_key_fc.b_0')
-        v = layers.fc(
-            input=values,
-            size=d_value * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_value_fc.b_0')
+        q = layers.fc(input=queries,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_query_fc.b_0')
+        k = layers.fc(input=keys,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_key_fc.b_0')
+        v = layers.fc(input=values,
+                      size=d_value * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_value_fc.b_0')
        return q, k, v

    def __split_heads(x, n_head):
@@ -110,8 +107,10 @@ def multi_head_attention(queries,
            product += attn_bias
        weights = layers.softmax(product)
        if dropout_rate:
-            weights = layers.dropout(
-                weights, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
+            weights = layers.dropout(weights,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
        out = layers.matmul(weights, v)
        return out

@@ -133,12 +132,11 @@ def multi_head_attention(queries,
    out = __combine_heads(ctx_multiheads)

    # Project back to the model size.
-    proj_out = layers.fc(
-        input=out,
-        size=d_model,
-        num_flatten_dims=2,
-        param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
-        bias_attr=name + '_output_fc.b_0')
+    proj_out = layers.fc(input=out,
+                         size=d_model,
+                         num_flatten_dims=2,
+                         param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
+                         bias_attr=name + '_output_fc.b_0')
    return proj_out


@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, p
    This module consists of two linear transformations with a ReLU activation
    in between, which is applied to each position separately and identically.
    """
-    hidden = layers.fc(
-        input=x,
-        size=d_inner_hid,
-        num_flatten_dims=2,
-        act=hidden_act,
-        param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
-        bias_attr=name + '_fc_0.b_0')
+    hidden = layers.fc(input=x,
+                       size=d_inner_hid,
+                       num_flatten_dims=2,
+                       act=hidden_act,
+                       param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
+                       bias_attr=name + '_fc_0.b_0')
    if dropout_rate:
-        hidden = layers.dropout(
-            hidden, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
-    out = layers.fc(
-        input=hidden,
-        size=d_hid,
-        num_flatten_dims=2,
-        param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
-        bias_attr=name + '_fc_1.b_0')
+        hidden = layers.dropout(hidden,
+                                dropout_prob=dropout_rate,
+                                dropout_implementation="upscale_in_train",
+                                is_test=False)
+    out = layers.fc(input=hidden,
+                    size=d_hid,
+                    num_flatten_dims=2,
+                    param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
+                    bias_attr=name + '_fc_1.b_0')
    return out


@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name='')
            out_dtype = out.dtype
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float32")
-            out = layers.layer_norm(
-                out,
-                begin_norm_axis=len(out.shape) - 1,
-                param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale', initializer=fluid.initializer.Constant(1.)),
-                bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias', initializer=fluid.initializer.Constant(0.)))
+            out = layers.layer_norm(out,
+                                    begin_norm_axis=len(out.shape) - 1,
+                                    param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
+                                                               initializer=fluid.initializer.Constant(1.)),
+                                    bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
+                                                              initializer=fluid.initializer.Constant(0.)))
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float16")
        elif cmd == "d":  # add dropout
            if dropout_rate:
-                out = layers.dropout(
-                    out, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
+                out = layers.dropout(out,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
    return out


@@ -220,28 +221,35 @@ def encoder_layer(enc_input,
    with the post_process_layer to add residual connection, layer normalization
    and droput.
    """
-    attn_output = multi_head_attention(
-        pre_process_layer(enc_input, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_att'),
-        None,
-        None,
-        attn_bias,
-        d_key,
-        d_value,
-        d_model,
-        n_head,
-        attention_dropout,
-        param_initializer=param_initializer,
-        name=name + '_multi_head_att')
-    attn_output = post_process_layer(
-        enc_input, attn_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_att')
-    ffd_output = positionwise_feed_forward(
-        pre_process_layer(attn_output, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_ffn'),
-        d_inner_hid,
-        d_model,
-        relu_dropout,
-        hidden_act,
-        param_initializer=param_initializer,
-        name=name + '_ffn')
+    attn_output = multi_head_attention(pre_process_layer(enc_input,
+                                                         preprocess_cmd,
+                                                         prepostprocess_dropout,
+                                                         name=name + '_pre_att'),
+                                       None,
+                                       None,
+                                       attn_bias,
+                                       d_key,
+                                       d_value,
+                                       d_model,
+                                       n_head,
+                                       attention_dropout,
+                                       param_initializer=param_initializer,
+                                       name=name + '_multi_head_att')
+    attn_output = post_process_layer(enc_input,
+                                     attn_output,
+                                     postprocess_cmd,
+                                     prepostprocess_dropout,
+                                     name=name + '_post_att')
+    ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
+                                                             preprocess_cmd,
+                                                             prepostprocess_dropout,
+                                                             name=name + '_pre_ffn'),
+                                           d_inner_hid,
+                                           d_model,
+                                           relu_dropout,
+                                           hidden_act,
+                                           param_initializer=param_initializer,
+                                           name=name + '_ffn')
    return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')


@@ -266,22 +274,21 @@ def encoder(enc_input,
    encoder_layer.
    """
    for i in range(n_layer):
-        enc_output = encoder_layer(
-            enc_input,
-            attn_bias,
-            n_head,
-            d_key,
-            d_value,
-            d_model,
-            d_inner_hid,
-            prepostprocess_dropout,
-            attention_dropout,
-            relu_dropout,
-            hidden_act,
-            preprocess_cmd,
-            postprocess_cmd,
-            param_initializer=param_initializer,
-            name=name + '_layer_' + str(i))
+        enc_output = encoder_layer(enc_input,
+                                   attn_bias,
+                                   n_head,
+                                   d_key,
+                                   d_value,
+                                   d_model,
+                                   d_inner_hid,
+                                   prepostprocess_dropout,
+                                   attention_dropout,
+                                   relu_dropout,
+                                   hidden_act,
+                                   preprocess_cmd,
+                                   postprocess_cmd,
+                                   param_initializer=param_initializer,
+                                   name=name + '_layer_' + str(i))
        enc_input = enc_output
    enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")


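`pre_post_process_layer` reads `process_cmd` one character at a time. `a` adds the residual `prev_out`, `n` applies layer normalization over the last axis, and `d` applies dropout. `BertModel` passes `preprocess_cmd=""` and `postprocess_cmd="dan"`, so each attention or feed-forward output is dropped out, added to the sub-layer input, and then normalized (post-norm). The sketch below mimics that dispatch on a single vector under stated assumptions. It uses the layer norm's initial scale of 1 and bias of 0 (the `Constant(1.)`/`Constant(0.)` initializers), assumes an epsilon of 1e-5, and models inference, where `upscale_in_train` dropout is the identity. The name `apply_cmd` is hypothetical.

```python
import math
from typing import List, Optional

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize over the last axis with scale=1 and bias=0 (the initial values)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def apply_cmd(prev_out: Optional[Vector], out: Vector, process_cmd: str) -> Vector:
    """Apply 'a' (residual add), 'n' (layer norm), 'd' (dropout) in the given order."""
    for cmd in process_cmd:
        if cmd == "a":  # residual connection, skipped when there is no prev_out
            if prev_out is not None:
                out = [o + p for o, p in zip(out, prev_out)]
        elif cmd == "n":  # layer normalization
            out = layer_norm(out)
        elif cmd == "d":  # upscale_in_train dropout is the identity at inference
            pass
    return out


x = [1.0, 2.0, 3.0, 4.0]           # sub-layer input (the residual branch)
sublayer = [0.5, 0.5, -0.5, -0.5]  # e.g. an attention or feed-forward output

pre = apply_cmd(None, x, "")          # preprocess_cmd="": input passes through unchanged
post = apply_cmd(x, sublayer, "dan")  # dropout, then residual add, then layer norm
print(pre, post)
```

Because `preprocess_cmd` is empty, the `pre_process_layer` calls in `encoder_layer` leave their inputs unchanged. All normalization happens after each residual add.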
--- a/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/module.py
+++ b/modules/text/semantic_model/bert_cased_L_24_H_1024_A_16/module.py
@@ -58,13 +58,12 @@ class Bert(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        bert = BertModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.bert_config,
-            use_fp16=False)
+        bert = BertModel(src_ids=input_ids,
+                         position_ids=position_ids,
+                         sentence_ids=segment_ids,
+                         input_mask=input_mask,
+                         config=self.bert_config,
+                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output

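The module names follow the naming of Google's released BERT checkpoints. `L` is the number of encoder layers, `H` the hidden (embedding) size, and `A` the number of attention heads. The `encoder(...)` call in `_build_model` sets `d_key = d_value = emb_size // n_head` and `d_inner_hid = emb_size * 4`, so the name alone fixes the per-layer shapes. In the module itself the values come from `self.bert_config`. The helper `encoder_dims` below is a hypothetical illustration that parses a module name and derives those values. It is not part of PaddleHub.

```python
import re
from typing import Dict


def encoder_dims(module_name: str) -> Dict[str, int]:
    """Derive encoder sizes from a name such as 'bert_cased_L_24_H_1024_A_16'."""
    match = re.search(r"L_(\d+)_H_(\d+)_A_(\d+)", module_name)
    if match is None:
        raise ValueError(f"no L/H/A pattern in {module_name!r}")
    n_layer, emb_size, n_head = (int(g) for g in match.groups())
    if emb_size % n_head:
        raise ValueError("hidden size must be divisible by the number of heads")
    return {
        "n_layer": n_layer,
        "n_head": n_head,
        "d_model": emb_size,
        "d_key": emb_size // n_head,    # per-head query/key size
        "d_value": emb_size // n_head,  # per-head value size
        "d_inner_hid": emb_size * 4,    # feed-forward hidden size
    }


print(encoder_dims("bert_cased_L_24_H_1024_A_16"))
print(encoder_dims("bert_chinese_L_12_H_768_A_12"))
```

Both models end up with 64-dimensional heads. The 24-layer model differs by having more layers, more heads, and a 4096-wide feed-forward layer instead of 3072.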
--- a/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/README.md
+++ b/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/README.md
--- a/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/__init__.py
+++ b/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/__init__.py
--- a/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/model/__init__.py
+++ b/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/model/__init__.py
--- a/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/model/bert.py
+++ b/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/model/bert.py
@@ -74,23 +74,23 @@ class BertModel(object):

    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
        # padding id in vocabulary must be set to 0
-        emb_out = fluid.layers.embedding(
-            input=src_ids,
-            size=[self._voc_size, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
-            is_sparse=False)
-        position_emb_out = fluid.layers.embedding(
-            input=position_ids,
-            size=[self._max_position_seq_len, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
-
-        sent_emb_out = fluid.layers.embedding(
-            sentence_ids,
-            size=[self._sent_types, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
+        emb_out = fluid.layers.embedding(input=src_ids,
+                                         size=[self._voc_size, self._emb_size],
+                                         dtype=self._dtype,
+                                         param_attr=fluid.ParamAttr(name=self._word_emb_name,
+                                                                    initializer=self._param_initializer),
+                                         is_sparse=False)
+        position_emb_out = fluid.layers.embedding(input=position_ids,
+                                                  size=[self._max_position_seq_len, self._emb_size],
+                                                  dtype=self._dtype,
+                                                  param_attr=fluid.ParamAttr(name=self._pos_emb_name,
+                                                                             initializer=self._param_initializer))
+
+        sent_emb_out = fluid.layers.embedding(sentence_ids,
+                                              size=[self._sent_types, self._emb_size],
+                                              dtype=self._dtype,
+                                              param_attr=fluid.ParamAttr(name=self._sent_emb_name,
+                                                                         initializer=self._param_initializer))

        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
@@ -105,23 +105,22 @@ class BertModel(object):
        n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True

-        self._enc_out = encoder(
-            enc_input=emb_out,
-            attn_bias=n_head_self_attn_mask,
-            n_layer=self._n_layer,
-            n_head=self._n_head,
-            d_key=self._emb_size // self._n_head,
-            d_value=self._emb_size // self._n_head,
-            d_model=self._emb_size,
-            d_inner_hid=self._emb_size * 4,
-            prepostprocess_dropout=self._prepostprocess_dropout,
-            attention_dropout=self._attention_dropout,
-            relu_dropout=0,
-            hidden_act=self._hidden_act,
-            preprocess_cmd="",
-            postprocess_cmd="dan",
-            param_initializer=self._param_initializer,
-            name='encoder')
+        self._enc_out = encoder(enc_input=emb_out,
+                                attn_bias=n_head_self_attn_mask,
+                                n_layer=self._n_layer,
+                                n_head=self._n_head,
+                                d_key=self._emb_size // self._n_head,
+                                d_value=self._emb_size // self._n_head,
+                                d_model=self._emb_size,
+                                d_inner_hid=self._emb_size * 4,
+                                prepostprocess_dropout=self._prepostprocess_dropout,
+                                attention_dropout=self._attention_dropout,
+                                relu_dropout=0,
+                                hidden_act=self._hidden_act,
+                                preprocess_cmd="",
+                                postprocess_cmd="dan",
+                                param_initializer=self._param_initializer,
+                                name='encoder')

    def get_sequence_output(self):
        return self._enc_out
@@ -130,12 +129,12 @@ class BertModel(object):
        """Get the first feature of each sequence for classification"""

        next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
-        next_sent_feat = fluid.layers.fc(
-            input=next_sent_feat,
-            size=self._emb_size,
-            act="tanh",
-            param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
-            bias_attr="pooled_fc.b_0")
+        next_sent_feat = fluid.layers.fc(input=next_sent_feat,
+                                         size=self._emb_size,
+                                         act="tanh",
+                                         param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
+                                                                    initializer=self._param_initializer),
+                                         bias_attr="pooled_fc.b_0")
        return next_sent_feat

    def get_pretraining_output(self, mask_label, mask_pos, labels):
@@ -150,43 +149,45 @@ class BertModel(object):
        mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)

        # transform: fc
-        mask_trans_feat = fluid.layers.fc(
-            input=mask_feat,
-            size=self._emb_size,
-            act=self._hidden_act,
-            param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
-            bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
+        mask_trans_feat = fluid.layers.fc(input=mask_feat,
+                                          size=self._emb_size,
+                                          act=self._hidden_act,
+                                          param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
+                                                                     initializer=self._param_initializer),
+                                          bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')

-        mask_lm_out_bias_attr = fluid.ParamAttr(
-            name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
+        mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
+                                                initializer=fluid.initializer.Constant(value=0.0))
        if self._weight_sharing:
-            fc_out = fluid.layers.matmul(
-                x=mask_trans_feat,
-                y=fluid.default_main_program().global_block().var(self._word_emb_name),
-                transpose_y=True)
-            fc_out += fluid.layers.create_parameter(
-                shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
+            fc_out = fluid.layers.matmul(x=mask_trans_feat,
+                                         y=fluid.default_main_program().global_block().var(self._word_emb_name),
+                                         transpose_y=True)
+            fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
+                                                    dtype=self._dtype,
+                                                    attr=mask_lm_out_bias_attr,
+                                                    is_bias=True)

        else:
-            fc_out = fluid.layers.fc(
-                input=mask_trans_feat,
-                size=self._voc_size,
-                param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
-                bias_attr=mask_lm_out_bias_attr)
+            fc_out = fluid.layers.fc(input=mask_trans_feat,
+                                     size=self._voc_size,
+                                     param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
+                                                                initializer=self._param_initializer),
+                                     bias_attr=mask_lm_out_bias_attr)

        mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
        mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)

-        next_sent_fc_out = fluid.layers.fc(
-            input=next_sent_feat,
-            size=2,
-            param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
-            bias_attr="next_sent_fc.b_0")
+        next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
+                                           size=2,
+                                           param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
+                                                                      initializer=self._param_initializer),
+                                           bias_attr="next_sent_fc.b_0")

-        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
-            logits=next_sent_fc_out, label=labels, return_softmax=True)
+        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
+                                                                                    label=labels,
+                                                                                    return_softmax=True)

        next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)


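When `_weight_sharing` is enabled, `get_pretraining_output` creates no separate `mask_lm_out_fc.w_0`. It multiplies the transformed masked-token features by the word-embedding table (`matmul` with `transpose_y=True`) and adds a learned bias of size `voc_size`, so the output projection reuses the input embeddings. Below is a pure-Python sketch of that logit computation followed by softmax cross-entropy for one masked position. The toy table, the function names and the example label are assumptions for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def tied_logits(features: Matrix, word_emb: Matrix, bias: List[float]) -> Matrix:
    """logits = features @ word_emb^T + bias; word_emb has shape [voc_size][emb_size]."""
    return [
        [sum(f * e for f, e in zip(feat, row)) + b for row, b in zip(word_emb, bias)]
        for feat in features
    ]


def softmax_cross_entropy(logits: List[float], label: int) -> float:
    """Numerically stable -log(softmax(logits)[label])."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


word_emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # voc_size=3, emb_size=2
bias = [0.0, 0.0, 0.0]                           # mask_lm_out_fc.b_0 starts at Constant(0.0)
mask_trans_feat = [[2.0, 0.0]]                   # one masked position

logits = tied_logits(mask_trans_feat, word_emb, bias)
loss = softmax_cross_entropy(logits[0], label=1)
print(logits, loss)
```

Tying the weights means the `voc_size x emb_size` output matrix is not a second parameter. The only extra parameter the tied branch creates is the bias vector.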
--- a/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/model/transformer_encoder.py
+++ b/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/model/transformer_encoder.py
@@ -50,24 +50,21 @@ def multi_head_attention(queries,
        """
        Add linear projection to queries, keys, and values.
        """
-        q = layers.fc(
-            input=queries,
-            size=d_key * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_query_fc.b_0')
-        k = layers.fc(
-            input=keys,
-            size=d_key * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_key_fc.b_0')
-        v = layers.fc(
-            input=values,
-            size=d_value * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_value_fc.b_0')
+        q = layers.fc(input=queries,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_query_fc.b_0')
+        k = layers.fc(input=keys,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_key_fc.b_0')
+        v = layers.fc(input=values,
+                      size=d_value * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_value_fc.b_0')
        return q, k, v

    def __split_heads(x, n_head):
@@ -110,8 +107,10 @@ def multi_head_attention(queries,
            product += attn_bias
        weights = layers.softmax(product)
        if dropout_rate:
-            weights = layers.dropout(
-                weights, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
+            weights = layers.dropout(weights,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
        out = layers.matmul(weights, v)
        return out

@@ -133,12 +132,11 @@ def multi_head_attention(queries,
    out = __combine_heads(ctx_multiheads)

    # Project back to the model size.
-    proj_out = layers.fc(
-        input=out,
-        size=d_model,
-        num_flatten_dims=2,
-        param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
-        bias_attr=name + '_output_fc.b_0')
+    proj_out = layers.fc(input=out,
+                         size=d_model,
+                         num_flatten_dims=2,
+                         param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
+                         bias_attr=name + '_output_fc.b_0')
    return proj_out


@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, p
    This module consists of two linear transformations with a ReLU activation
    in between, which is applied to each position separately and identically.
    """
-    hidden = layers.fc(
-        input=x,
-        size=d_inner_hid,
-        num_flatten_dims=2,
-        act=hidden_act,
-        param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
-        bias_attr=name + '_fc_0.b_0')
+    hidden = layers.fc(input=x,
+                       size=d_inner_hid,
+                       num_flatten_dims=2,
+                       act=hidden_act,
+                       param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
+                       bias_attr=name + '_fc_0.b_0')
    if dropout_rate:
-        hidden = layers.dropout(
-            hidden, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
-    out = layers.fc(
-        input=hidden,
-        size=d_hid,
-        num_flatten_dims=2,
-        param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
-        bias_attr=name + '_fc_1.b_0')
+        hidden = layers.dropout(hidden,
+                                dropout_prob=dropout_rate,
+                                dropout_implementation="upscale_in_train",
+                                is_test=False)
+    out = layers.fc(input=hidden,
+                    size=d_hid,
+                    num_flatten_dims=2,
+                    param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
+                    bias_attr=name + '_fc_1.b_0')
    return out


@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name='')
            out_dtype = out.dtype
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float32")
-            out = layers.layer_norm(
-                out,
-                begin_norm_axis=len(out.shape) - 1,
-                param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale', initializer=fluid.initializer.Constant(1.)),
-                bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias', initializer=fluid.initializer.Constant(0.)))
+            out = layers.layer_norm(out,
+                                    begin_norm_axis=len(out.shape) - 1,
+                                    param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
+                                                               initializer=fluid.initializer.Constant(1.)),
+                                    bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
+                                                              initializer=fluid.initializer.Constant(0.)))
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float16")
        elif cmd == "d":  # add dropout
            if dropout_rate:
-                out = layers.dropout(
-                    out, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
+                out = layers.dropout(out,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
    return out


@@ -220,28 +221,35 @@ def encoder_layer(enc_input,
    with the post_process_layer to add residual connection, layer normalization
    and droput.
    """
-    attn_output = multi_head_attention(
-        pre_process_layer(enc_input, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_att'),
-        None,
-        None,
-        attn_bias,
-        d_key,
-        d_value,
-        d_model,
-        n_head,
-        attention_dropout,
-        param_initializer=param_initializer,
-        name=name + '_multi_head_att')
-    attn_output = post_process_layer(
-        enc_input, attn_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_att')
-    ffd_output = positionwise_feed_forward(
-        pre_process_layer(attn_output, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_ffn'),
-        d_inner_hid,
-        d_model,
-        relu_dropout,
-        hidden_act,
-        param_initializer=param_initializer,
-        name=name + '_ffn')
+    attn_output = multi_head_attention(pre_process_layer(enc_input,
+                                                         preprocess_cmd,
+                                                         prepostprocess_dropout,
+                                                         name=name + '_pre_att'),
+                                       None,
+                                       None,
+                                       attn_bias,
+                                       d_key,
+                                       d_value,
+                                       d_model,
+                                       n_head,
+                                       attention_dropout,
+                                       param_initializer=param_initializer,
+                                       name=name + '_multi_head_att')
+    attn_output = post_process_layer(enc_input,
+                                     attn_output,
+                                     postprocess_cmd,
+                                     prepostprocess_dropout,
+                                     name=name + '_post_att')
+    ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
+                                                             preprocess_cmd,
+                                                             prepostprocess_dropout,
+                                                             name=name + '_pre_ffn'),
+                                           d_inner_hid,
+                                           d_model,
+                                           relu_dropout,
+                                           hidden_act,
+                                           param_initializer=param_initializer,
+                                           name=name + '_ffn')
    return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')


@@ -266,22 +274,21 @@ def encoder(enc_input,
    encoder_layer.
    """
    for i in range(n_layer):
-        enc_output = encoder_layer(
-            enc_input,
-            attn_bias,
-            n_head,
-            d_key,
-            d_value,
-            d_model,
-            d_inner_hid,
-            prepostprocess_dropout,
-            attention_dropout,
-            relu_dropout,
-            hidden_act,
-            preprocess_cmd,
-            postprocess_cmd,
-            param_initializer=param_initializer,
-            name=name + '_layer_' + str(i))
+        enc_output = encoder_layer(enc_input,
+                                   attn_bias,
+                                   n_head,
+                                   d_key,
+                                   d_value,
+                                   d_model,
+                                   d_inner_hid,
+                                   prepostprocess_dropout,
+                                   attention_dropout,
+                                   relu_dropout,
+                                   hidden_act,
+                                   preprocess_cmd,
+                                   postprocess_cmd,
+                                   param_initializer=param_initializer,
+                                   name=name + '_layer_' + str(i))
        enc_input = enc_output
    enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")


--- a/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/module.py
+++ b/modules/text/semantic_model/bert_chinese_L_12_H_768_A_12/module.py
@@ -58,13 +58,12 @@ class BertChinese(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        bert = BertModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.bert_config,
-            use_fp16=False)
+        bert = BertModel(src_ids=input_ids,
+                         position_ids=position_ids,
+                         sentence_ids=segment_ids,
+                         input_mask=input_mask,
+                         config=self.bert_config,
+                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output

--- a/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/README.md
+++ b/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/README.md
--- a/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/__init__.py
+++ b/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/__init__.py
--- a/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/model/__init__.py
+++ b/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/model/__init__.py
--- a/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/model/bert.py
+++ b/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/model/bert.py
@@ -74,23 +74,23 @@ class BertModel(object):

    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
        # padding id in vocabulary must be set to 0
-        emb_out = fluid.layers.embedding(
-            input=src_ids,
-            size=[self._voc_size, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
-            is_sparse=False)
-        position_emb_out = fluid.layers.embedding(
-            input=position_ids,
-            size=[self._max_position_seq_len, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
-
-        sent_emb_out = fluid.layers.embedding(
-            sentence_ids,
-            size=[self._sent_types, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
+        emb_out = fluid.layers.embedding(input=src_ids,
+                                         size=[self._voc_size, self._emb_size],
+                                         dtype=self._dtype,
+                                         param_attr=fluid.ParamAttr(name=self._word_emb_name,
+                                                                    initializer=self._param_initializer),
+                                         is_sparse=False)
+        position_emb_out = fluid.layers.embedding(input=position_ids,
+                                                  size=[self._max_position_seq_len, self._emb_size],
+                                                  dtype=self._dtype,
+                                                  param_attr=fluid.ParamAttr(name=self._pos_emb_name,
+                                                                             initializer=self._param_initializer))
+
+        sent_emb_out = fluid.layers.embedding(sentence_ids,
+                                              size=[self._sent_types, self._emb_size],
+                                              dtype=self._dtype,
+                                              param_attr=fluid.ParamAttr(name=self._sent_emb_name,
+                                                                         initializer=self._param_initializer))

        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
@@ -105,23 +105,22 @@ class BertModel(object):
        n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True

-        self._enc_out = encoder(
-            enc_input=emb_out,
-            attn_bias=n_head_self_attn_mask,
-            n_layer=self._n_layer,
-            n_head=self._n_head,
-            d_key=self._emb_size // self._n_head,
-            d_value=self._emb_size // self._n_head,
-            d_model=self._emb_size,
-            d_inner_hid=self._emb_size * 4,
-            prepostprocess_dropout=self._prepostprocess_dropout,
-            attention_dropout=self._attention_dropout,
-            relu_dropout=0,
-            hidden_act=self._hidden_act,
-            preprocess_cmd="",
-            postprocess_cmd="dan",
-            param_initializer=self._param_initializer,
-            name='encoder')
+        self._enc_out = encoder(enc_input=emb_out,
+                                attn_bias=n_head_self_attn_mask,
+                                n_layer=self._n_layer,
+                                n_head=self._n_head,
+                                d_key=self._emb_size // self._n_head,
+                                d_value=self._emb_size // self._n_head,
+                                d_model=self._emb_size,
+                                d_inner_hid=self._emb_size * 4,
+                                prepostprocess_dropout=self._prepostprocess_dropout,
+                                attention_dropout=self._attention_dropout,
+                                relu_dropout=0,
+                                hidden_act=self._hidden_act,
+                                preprocess_cmd="",
+                                postprocess_cmd="dan",
+                                param_initializer=self._param_initializer,
+                                name='encoder')

    def get_sequence_output(self):
        return self._enc_out
@@ -130,12 +129,12 @@ class BertModel(object):
        """Get the first feature of each sequence for classification"""

        next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
-        next_sent_feat = fluid.layers.fc(
-            input=next_sent_feat,
-            size=self._emb_size,
-            act="tanh",
-            param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
-            bias_attr="pooled_fc.b_0")
+        next_sent_feat = fluid.layers.fc(input=next_sent_feat,
+                                         size=self._emb_size,
+                                         act="tanh",
+                                         param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
+                                                                    initializer=self._param_initializer),
+                                         bias_attr="pooled_fc.b_0")
        return next_sent_feat

    def get_pretraining_output(self, mask_label, mask_pos, labels):
@@ -150,43 +149,45 @@ class BertModel(object):
        mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)

        # transform: fc
-        mask_trans_feat = fluid.layers.fc(
-            input=mask_feat,
-            size=self._emb_size,
-            act=self._hidden_act,
-            param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
-            bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
+        mask_trans_feat = fluid.layers.fc(input=mask_feat,
+                                          size=self._emb_size,
+                                          act=self._hidden_act,
+                                          param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
+                                                                     initializer=self._param_initializer),
+                                          bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')

-        mask_lm_out_bias_attr = fluid.ParamAttr(
-            name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
+        mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
+                                                initializer=fluid.initializer.Constant(value=0.0))
        if self._weight_sharing:
-            fc_out = fluid.layers.matmul(
-                x=mask_trans_feat,
-                y=fluid.default_main_program().global_block().var(self._word_emb_name),
-                transpose_y=True)
-            fc_out += fluid.layers.create_parameter(
-                shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
+            fc_out = fluid.layers.matmul(x=mask_trans_feat,
+                                         y=fluid.default_main_program().global_block().var(self._word_emb_name),
+                                         transpose_y=True)
+            fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
+                                                    dtype=self._dtype,
+                                                    attr=mask_lm_out_bias_attr,
+                                                    is_bias=True)

        else:
-            fc_out = fluid.layers.fc(
-                input=mask_trans_feat,
-                size=self._voc_size,
-                param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
-                bias_attr=mask_lm_out_bias_attr)
+            fc_out = fluid.layers.fc(input=mask_trans_feat,
+                                     size=self._voc_size,
+                                     param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
+                                                                initializer=self._param_initializer),
+                                     bias_attr=mask_lm_out_bias_attr)

        mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
        mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)

-        next_sent_fc_out = fluid.layers.fc(
-            input=next_sent_feat,
-            size=2,
-            param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
-            bias_attr="next_sent_fc.b_0")
+        next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
+                                           size=2,
+                                           param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
+                                                                      initializer=self._param_initializer),
+                                           bias_attr="next_sent_fc.b_0")

-        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
-            logits=next_sent_fc_out, label=labels, return_softmax=True)
+        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
+                                                                                    label=labels,
+                                                                                    return_softmax=True)

        next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)


--- a/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/model/transformer_encoder.py
+++ b/modules/text/semantic_model/bert_cased_L_12_H_768_A_12/model/transformer_encoder.py
@@ -50,24 +50,21 @@ def multi_head_attention(queries,
        """
        Add linear projection to queries, keys, and values.
        """
-        q = layers.fc(
-            input=queries,
-            size=d_key * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_query_fc.b_0')
-        k = layers.fc(
-            input=keys,
-            size=d_key * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_key_fc.b_0')
-        v = layers.fc(
-            input=values,
-            size=d_value * n_head,
-            num_flatten_dims=2,
-            param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
-            bias_attr=name + '_value_fc.b_0')
+        q = layers.fc(input=queries,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_query_fc.b_0')
+        k = layers.fc(input=keys,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_key_fc.b_0')
+        v = layers.fc(input=values,
+                      size=d_value * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_value_fc.b_0')
        return q, k, v

    def __split_heads(x, n_head):
@@ -110,8 +107,10 @@ def multi_head_attention(queries,
            product += attn_bias
        weights = layers.softmax(product)
        if dropout_rate:
-            weights = layers.dropout(
-                weights, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
+            weights = layers.dropout(weights,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
        out = layers.matmul(weights, v)
        return out

@@ -133,12 +132,11 @@ def multi_head_attention(queries,
    out = __combine_heads(ctx_multiheads)

    # Project back to the model size.
-    proj_out = layers.fc(
-        input=out,
-        size=d_model,
-        num_flatten_dims=2,
-        param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
-        bias_attr=name + '_output_fc.b_0')
+    proj_out = layers.fc(input=out,
+                         size=d_model,
+                         num_flatten_dims=2,
+                         param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
+                         bias_attr=name + '_output_fc.b_0')
    return proj_out


@@ -148,22 +146,22 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, p
    This module consists of two linear transformations with a ReLU activation
    in between, which is applied to each position separately and identically.
    """
-    hidden = layers.fc(
-        input=x,
-        size=d_inner_hid,
-        num_flatten_dims=2,
-        act=hidden_act,
-        param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
-        bias_attr=name + '_fc_0.b_0')
+    hidden = layers.fc(input=x,
+                       size=d_inner_hid,
+                       num_flatten_dims=2,
+                       act=hidden_act,
+                       param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
+                       bias_attr=name + '_fc_0.b_0')
    if dropout_rate:
-        hidden = layers.dropout(
-            hidden, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
-    out = layers.fc(
-        input=hidden,
-        size=d_hid,
-        num_flatten_dims=2,
-        param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
-        bias_attr=name + '_fc_1.b_0')
+        hidden = layers.dropout(hidden,
+                                dropout_prob=dropout_rate,
+                                dropout_implementation="upscale_in_train",
+                                is_test=False)
+    out = layers.fc(input=hidden,
+                    size=d_hid,
+                    num_flatten_dims=2,
+                    param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
+                    bias_attr=name + '_fc_1.b_0')
    return out


@@ -181,17 +179,20 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name='')
            out_dtype = out.dtype
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float32")
-            out = layers.layer_norm(
-                out,
-                begin_norm_axis=len(out.shape) - 1,
-                param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale', initializer=fluid.initializer.Constant(1.)),
-                bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias', initializer=fluid.initializer.Constant(0.)))
+            out = layers.layer_norm(out,
+                                    begin_norm_axis=len(out.shape) - 1,
+                                    param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
+                                                               initializer=fluid.initializer.Constant(1.)),
+                                    bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
+                                                              initializer=fluid.initializer.Constant(0.)))
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float16")
        elif cmd == "d":  # add dropout
            if dropout_rate:
-                out = layers.dropout(
-                    out, dropout_prob=dropout_rate, dropout_implementation="upscale_in_train", is_test=False)
+                out = layers.dropout(out,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
    return out


@@ -220,28 +221,35 @@ def encoder_layer(enc_input,
    with the post_process_layer to add residual connection, layer normalization
    and droput.
    """
-    attn_output = multi_head_attention(
-        pre_process_layer(enc_input, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_att'),
-        None,
-        None,
-        attn_bias,
-        d_key,
-        d_value,
-        d_model,
-        n_head,
-        attention_dropout,
-        param_initializer=param_initializer,
-        name=name + '_multi_head_att')
-    attn_output = post_process_layer(
-        enc_input, attn_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_att')
-    ffd_output = positionwise_feed_forward(
-        pre_process_layer(attn_output, preprocess_cmd, prepostprocess_dropout, name=name + '_pre_ffn'),
-        d_inner_hid,
-        d_model,
-        relu_dropout,
-        hidden_act,
-        param_initializer=param_initializer,
-        name=name + '_ffn')
+    attn_output = multi_head_attention(pre_process_layer(enc_input,
+                                                         preprocess_cmd,
+                                                         prepostprocess_dropout,
+                                                         name=name + '_pre_att'),
+                                       None,
+                                       None,
+                                       attn_bias,
+                                       d_key,
+                                       d_value,
+                                       d_model,
+                                       n_head,
+                                       attention_dropout,
+                                       param_initializer=param_initializer,
+                                       name=name + '_multi_head_att')
+    attn_output = post_process_layer(enc_input,
+                                     attn_output,
+                                     postprocess_cmd,
+                                     prepostprocess_dropout,
+                                     name=name + '_post_att')
+    ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
+                                                             preprocess_cmd,
+                                                             prepostprocess_dropout,
+                                                             name=name + '_pre_ffn'),
+                                           d_inner_hid,
+                                           d_model,
+                                           relu_dropout,
+                                           hidden_act,
+                                           param_initializer=param_initializer,
+                                           name=name + '_ffn')
    return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')


@@ -266,22 +274,21 @@ def encoder(enc_input,
    encoder_layer.
    """
    for i in range(n_layer):
-        enc_output = encoder_layer(
-            enc_input,
-            attn_bias,
-            n_head,
-            d_key,
-            d_value,
-            d_model,
-            d_inner_hid,
-            prepostprocess_dropout,
-            attention_dropout,
-            relu_dropout,
-            hidden_act,
-            preprocess_cmd,
-            postprocess_cmd,
-            param_initializer=param_initializer,
-            name=name + '_layer_' + str(i))
+        enc_output = encoder_layer(enc_input,
+                                   attn_bias,
+                                   n_head,
+                                   d_key,
+                                   d_value,
+                                   d_model,
+                                   d_inner_hid,
+                                   prepostprocess_dropout,
+                                   attention_dropout,
+                                   relu_dropout,
+                                   hidden_act,
+                                   preprocess_cmd,
+                                   postprocess_cmd,
+                                   param_initializer=param_initializer,
+                                   name=name + '_layer_' + str(i))
        enc_input = enc_output
    enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
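The hunks above send every encoder sublayer through `pre_process_layer` and `post_process_layer`. Their behavior is selected by a command string: `a` adds the residual, `n` applies layer normalization, and `d` applies dropout. The NumPy sketch below mirrors that dispatch loop for illustration only. It is not the Paddle implementation: it leaves out the FP16 casts and the learned layer-norm scale and bias. The names `pre_post_process` and `layer_norm` are hypothetical, and the epsilon of `1e-5` is assumed to match the default of `fluid.layers.layer_norm`.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize over the last (hidden) axis. Scale and bias stay at their
    # initial values (1 and 0), as set by the Constant initializers in the diff.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def pre_post_process(prev_out, out, process_cmd, dropout_rate=0.0, rng=None):
    """Hypothetical NumPy mirror of pre_post_process_layer's command dispatch."""
    for cmd in process_cmd:
        if cmd == "a":  # residual connection (skipped when prev_out is None)
            out = out + prev_out if prev_out is not None else out
        elif cmd == "n":  # layer normalization over the hidden axis
            out = layer_norm(out)
        elif cmd == "d" and dropout_rate:  # "upscale_in_train" dropout
            rng = rng if rng is not None else np.random.default_rng(0)
            keep = rng.random(out.shape) >= dropout_rate
            out = out * keep / (1.0 - dropout_rate)
    return out


# bert.py passes preprocess_cmd="" and postprocess_cmd="dan": each sublayer
# output goes through dropout, is added to its input, and is then normalized.
x = np.arange(12.0).reshape(1, 3, 4)             # sublayer input
sublayer_out = np.ones_like(x)                   # stand-in for attention/FFN output
pre = pre_post_process(None, x, "")              # "" is a no-op
post = pre_post_process(x, sublayer_out, "dan")  # dropout_rate=0, so "d" does nothing
```

With `"dan"` as the post-process command, the result is `layer_norm(x + sublayer(x))`. This is the post-LN arrangement used by the original BERT. The `"n"`/`"da"` defaults in `encoder_layer` instead give a pre-LN layout.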


--- a/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/module.py
+++ b/modules/text/semantic_model/bert_multi_cased_L_12_H_768_A_12/module.py
@@ -58,13 +58,12 @@ class Bert(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        bert = BertModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.bert_config,
-            use_fp16=False)
+        bert = BertModel(src_ids=input_ids,
+                         position_ids=position_ids,
+                         sentence_ids=segment_ids,
+                         input_mask=input_mask,
+                         config=self.bert_config,
+                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output

--- a/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/README.md
+++ b/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/README.md
--- a/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/__init__.py
+++ b/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/__init__.py
--- a/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/model/__init__.py
+++ b/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/model/__init__.py
--- a/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/model/bert.py
+++ b/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/model/bert.py
@@ -74,23 +74,23 @@ class BertModel(object):

    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
        # padding id in vocabulary must be set to 0
-        emb_out = fluid.layers.embedding(
-            input=src_ids,
-            size=[self._voc_size, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
-            is_sparse=False)
-        position_emb_out = fluid.layers.embedding(
-            input=position_ids,
-            size=[self._max_position_seq_len, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
-
-        sent_emb_out = fluid.layers.embedding(
-            sentence_ids,
-            size=[self._sent_types, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
+        emb_out = fluid.layers.embedding(input=src_ids,
+                                         size=[self._voc_size, self._emb_size],
+                                         dtype=self._dtype,
+                                         param_attr=fluid.ParamAttr(name=self._word_emb_name,
+                                                                    initializer=self._param_initializer),
+                                         is_sparse=False)
+        position_emb_out = fluid.layers.embedding(input=position_ids,
+                                                  size=[self._max_position_seq_len, self._emb_size],
+                                                  dtype=self._dtype,
+                                                  param_attr=fluid.ParamAttr(name=self._pos_emb_name,
+                                                                             initializer=self._param_initializer))
+
+        sent_emb_out = fluid.layers.embedding(sentence_ids,
+                                              size=[self._sent_types, self._emb_size],
+                                              dtype=self._dtype,
+                                              param_attr=fluid.ParamAttr(name=self._sent_emb_name,
+                                                                         initializer=self._param_initializer))

        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
@@ -105,23 +105,22 @@ class BertModel(object):
        n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True

-        self._enc_out = encoder(
-            enc_input=emb_out,
-            attn_bias=n_head_self_attn_mask,
-            n_layer=self._n_layer,
-            n_head=self._n_head,
-            d_key=self._emb_size // self._n_head,
-            d_value=self._emb_size // self._n_head,
-            d_model=self._emb_size,
-            d_inner_hid=self._emb_size * 4,
-            prepostprocess_dropout=self._prepostprocess_dropout,
-            attention_dropout=self._attention_dropout,
-            relu_dropout=0,
-            hidden_act=self._hidden_act,
-            preprocess_cmd="",
-            postprocess_cmd="dan",
-            param_initializer=self._param_initializer,
-            name='encoder')
+        self._enc_out = encoder(enc_input=emb_out,
+                                attn_bias=n_head_self_attn_mask,
+                                n_layer=self._n_layer,
+                                n_head=self._n_head,
+                                d_key=self._emb_size // self._n_head,
+                                d_value=self._emb_size // self._n_head,
+                                d_model=self._emb_size,
+                                d_inner_hid=self._emb_size * 4,
+                                prepostprocess_dropout=self._prepostprocess_dropout,
+                                attention_dropout=self._attention_dropout,
+                                relu_dropout=0,
+                                hidden_act=self._hidden_act,
+                                preprocess_cmd="",
+                                postprocess_cmd="dan",
+                                param_initializer=self._param_initializer,
+                                name='encoder')

    def get_sequence_output(self):
        return self._enc_out
@@ -130,12 +129,12 @@ class BertModel(object):
        """Get the first feature of each sequence for classification"""

        next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
-        next_sent_feat = fluid.layers.fc(
-            input=next_sent_feat,
-            size=self._emb_size,
-            act="tanh",
-            param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
-            bias_attr="pooled_fc.b_0")
+        next_sent_feat = fluid.layers.fc(input=next_sent_feat,
+                                         size=self._emb_size,
+                                         act="tanh",
+                                         param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
+                                                                    initializer=self._param_initializer),
+                                         bias_attr="pooled_fc.b_0")
        return next_sent_feat

    def get_pretraining_output(self, mask_label, mask_pos, labels):
@@ -150,43 +149,45 @@ class BertModel(object):
        mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)

        # transform: fc
-        mask_trans_feat = fluid.layers.fc(
-            input=mask_feat,
-            size=self._emb_size,
-            act=self._hidden_act,
-            param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
-            bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
+        mask_trans_feat = fluid.layers.fc(input=mask_feat,
+                                          size=self._emb_size,
+                                          act=self._hidden_act,
+                                          param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
+                                                                     initializer=self._param_initializer),
+                                          bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')

-        mask_lm_out_bias_attr = fluid.ParamAttr(
-            name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
+        mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
+                                                initializer=fluid.initializer.Constant(value=0.0))
        if self._weight_sharing:
-            fc_out = fluid.layers.matmul(
-                x=mask_trans_feat,
-                y=fluid.default_main_program().global_block().var(self._word_emb_name),
-                transpose_y=True)
-            fc_out += fluid.layers.create_parameter(
-                shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
+            fc_out = fluid.layers.matmul(x=mask_trans_feat,
+                                         y=fluid.default_main_program().global_block().var(self._word_emb_name),
+                                         transpose_y=True)
+            fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
+                                                    dtype=self._dtype,
+                                                    attr=mask_lm_out_bias_attr,
+                                                    is_bias=True)

        else:
-            fc_out = fluid.layers.fc(
-                input=mask_trans_feat,
-                size=self._voc_size,
-                param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
-                bias_attr=mask_lm_out_bias_attr)
+            fc_out = fluid.layers.fc(input=mask_trans_feat,
+                                     size=self._voc_size,
+                                     param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
+                                                                initializer=self._param_initializer),
+                                     bias_attr=mask_lm_out_bias_attr)

        mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
        mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)

-        next_sent_fc_out = fluid.layers.fc(
-            input=next_sent_feat,
-            size=2,
-            param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
-            bias_attr="next_sent_fc.b_0")
+        next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
+                                           size=2,
+                                           param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
+                                                                      initializer=self._param_initializer),
+                                           bias_attr="next_sent_fc.b_0")

-        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
-            logits=next_sent_fc_out, label=labels, return_softmax=True)
+        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
+                                                                                    label=labels,
+                                                                                    return_softmax=True)

        next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)
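When `_weight_sharing` is enabled, `get_pretraining_output` reuses the word-embedding table as the output projection, so the logits are `mask_feat @ word_emb.T + bias` (the `matmul` with `transpose_y=True`). The NumPy sketch below illustrates that tying together with the softmax cross-entropy. It assumes, as the `gather` on `reshaped_emb_out` suggests, that `mask_pos` indexes a flattened `[batch * seq_len, emb_size]` view. The function name and argument layout are hypothetical, and the `mask_lm_trans` fc/layer-norm step is left out for brevity.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def masked_lm_loss(enc_out, mask_pos, mask_label, word_emb, out_bias):
    """Mean masked-LM loss with the output layer tied to the word embeddings.

    enc_out:    [batch, seq_len, emb_size] encoder output
    mask_pos:   indices into the flattened [batch * seq_len] positions (assumed)
    mask_label: target token ids, one per masked position
    word_emb:   [voc_size, emb_size] embedding table (shared weights)
    out_bias:   [voc_size] output bias
    """
    emb_size = enc_out.shape[-1]
    flat = enc_out.reshape(-1, emb_size)        # like reshaped_emb_out
    mask_feat = flat[mask_pos]                  # like fluid.layers.gather
    # The mask_lm_trans fc + layer-norm transform is omitted here.
    logits = mask_feat @ word_emb.T + out_bias  # tied weights, transpose_y=True
    log_probs = log_softmax(logits)
    nll = -log_probs[np.arange(len(mask_label)), mask_label]
    return nll.mean()


rng = np.random.default_rng(0)
enc_out = rng.normal(size=(2, 3, 4))
word_emb = np.zeros((5, 4))  # all-zero table -> uniform logits over 5 tokens
out_bias = np.zeros(5)
loss = masked_lm_loss(enc_out, np.array([0, 4]), np.array([1, 3]), word_emb, out_bias)
```

Tying the weights means the output layer adds only a `[voc_size]` bias on top of the embedding table, instead of a separate `[emb_size, voc_size]` matrix.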


--- a/modules/text/language_model/bert_multi_uncased_L_12_H_768_A_12/model/transformer_encoder.py
+++ b/modules/text/language_model/bert_multi_uncased_L_12_H_768_A_12/model/transformer_encoder.py
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Transformer encoder."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from functools import partial
+
+import paddle.fluid as fluid
+import paddle.fluid.layers as layers
+
+
+def multi_head_attention(queries,
+                         keys,
+                         values,
+                         attn_bias,
+                         d_key,
+                         d_value,
+                         d_model,
+                         n_head=1,
+                         dropout_rate=0.,
+                         cache=None,
+                         param_initializer=None,
+                         name='multi_head_att'):
+    """
+    Multi-Head Attention. Note that attn_bias is added to the logit before
+    computing softmax activation to mask certain selected positions so that
+    they will not be considered in attention weights.
+    """
+    keys = queries if keys is None else keys
+    values = keys if values is None else values
+
+    if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
+        raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
+
+    def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
+        """
+        Add linear projection to queries, keys, and values.
+        """
+        q = layers.fc(input=queries,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_query_fc.b_0')
+        k = layers.fc(input=keys,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_key_fc.b_0')
+        v = layers.fc(input=values,
+                      size=d_value * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_value_fc.b_0')
+        return q, k, v
+
+    def __split_heads(x, n_head):
+        """
+        Reshape the last dimension of input tensor x so that it becomes two
+        dimensions and then transpose. Specifically, input a tensor with shape
+        [bs, max_sequence_length, n_head * hidden_dim] then output a tensor
+        with shape [bs, n_head, max_sequence_length, hidden_dim].
+        """
+        hidden_size = x.shape[-1]
+        # The value 0 in shape attr means copying the corresponding dimension
+        # size of the input as the output dimension size.
+        reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
+
+        # permute the dimensions into:
+        # [batch_size, n_head, max_sequence_len, hidden_size_per_head]
+        return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
+
+    def __combine_heads(x):
+        """
+        Transpose and then reshape the last two dimensions of input tensor x
+        so that they become one dimension; this is the reverse of __split_heads.
+        """
+        if len(x.shape) == 3: return x
+        if len(x.shape) != 4:
+            raise ValueError("Input(x) should be a 4-D Tensor.")
+
+        trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
+        # The value 0 in shape attr means copying the corresponding dimension
+        # size of the input as the output dimension size.
+        return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
+
+    def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
+        """
+        Scaled Dot-Product Attention
+        """
+        scaled_q = layers.scale(x=q, scale=d_key**-0.5)
+        product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
+        if attn_bias:
+            product += attn_bias
+        weights = layers.softmax(product)
+        if dropout_rate:
+            weights = layers.dropout(weights,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
+        out = layers.matmul(weights, v)
+        return out
+
+    q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
+
+    if cache is not None:  # use cache and concat time steps
+        # Since the inplace reshape in __split_heads changes the shape of k and
+        # v, which is the cache input for next time step, reshape the cache
+        # input from the previous time step first.
+        k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
+        v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
+
+    q = __split_heads(q, n_head)
+    k = __split_heads(k, n_head)
+    v = __split_heads(v, n_head)
+
+    ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
+
+    out = __combine_heads(ctx_multiheads)
+
+    # Project back to the model size.
+    proj_out = layers.fc(input=out,
+                         size=d_model,
+                         num_flatten_dims=2,
+                         param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
+                         bias_attr=name + '_output_fc.b_0')
+    return proj_out
+
+
+def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
+    """
+    Position-wise Feed-Forward Networks.
+    This module consists of two linear transformations with an activation
+    (hidden_act) in between, applied to each position separately and identically.
+    """
+    hidden = layers.fc(input=x,
+                       size=d_inner_hid,
+                       num_flatten_dims=2,
+                       act=hidden_act,
+                       param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
+                       bias_attr=name + '_fc_0.b_0')
+    if dropout_rate:
+        hidden = layers.dropout(hidden,
+                                dropout_prob=dropout_rate,
+                                dropout_implementation="upscale_in_train",
+                                is_test=False)
+    out = layers.fc(input=hidden,
+                    size=d_hid,
+                    num_flatten_dims=2,
+                    param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
+                    bias_attr=name + '_fc_1.b_0')
+    return out
+
+
+def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
+    """
+    Add residual connection, layer normalization and dropout to the out tensor
+    optionally according to the value of process_cmd.
+    This will be used before or after multi-head attention and position-wise
+    feed-forward networks.
+    """
+    for cmd in process_cmd:
+        if cmd == "a":  # add residual connection
+            out = out + prev_out if prev_out else out
+        elif cmd == "n":  # add layer normalization
+            out_dtype = out.dtype
+            if out_dtype == fluid.core.VarDesc.VarType.FP16:
+                out = layers.cast(x=out, dtype="float32")
+            out = layers.layer_norm(out,
+                                    begin_norm_axis=len(out.shape) - 1,
+                                    param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
+                                                               initializer=fluid.initializer.Constant(1.)),
+                                    bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
+                                                              initializer=fluid.initializer.Constant(0.)))
+            if out_dtype == fluid.core.VarDesc.VarType.FP16:
+                out = layers.cast(x=out, dtype="float16")
+        elif cmd == "d":  # add dropout
+            if dropout_rate:
+                out = layers.dropout(out,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
+    return out
+
+
+pre_process_layer = partial(pre_post_process_layer, None)
+post_process_layer = pre_post_process_layer
+
+
+def encoder_layer(enc_input,
+                  attn_bias,
+                  n_head,
+                  d_key,
+                  d_value,
+                  d_model,
+                  d_inner_hid,
+                  prepostprocess_dropout,
+                  attention_dropout,
+                  relu_dropout,
+                  hidden_act,
+                  preprocess_cmd="n",
+                  postprocess_cmd="da",
+                  param_initializer=None,
+                  name=''):
+    """The encoder layers that can be stacked to form a deep encoder.
+    This module consists of a multi-head (self) attention followed by
+    position-wise feed-forward networks. Each of the two components is
+    followed by post_process_layer, which adds a residual connection, layer
+    normalization and dropout.
+    """
+    attn_output = multi_head_attention(pre_process_layer(enc_input,
+                                                         preprocess_cmd,
+                                                         prepostprocess_dropout,
+                                                         name=name + '_pre_att'),
+                                       None,
+                                       None,
+                                       attn_bias,
+                                       d_key,
+                                       d_value,
+                                       d_model,
+                                       n_head,
+                                       attention_dropout,
+                                       param_initializer=param_initializer,
+                                       name=name + '_multi_head_att')
+    attn_output = post_process_layer(enc_input,
+                                     attn_output,
+                                     postprocess_cmd,
+                                     prepostprocess_dropout,
+                                     name=name + '_post_att')
+    ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
+                                                             preprocess_cmd,
+                                                             prepostprocess_dropout,
+                                                             name=name + '_pre_ffn'),
+                                           d_inner_hid,
+                                           d_model,
+                                           relu_dropout,
+                                           hidden_act,
+                                           param_initializer=param_initializer,
+                                           name=name + '_ffn')
+    return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
+
+
+def encoder(enc_input,
+            attn_bias,
+            n_layer,
+            n_head,
+            d_key,
+            d_value,
+            d_model,
+            d_inner_hid,
+            prepostprocess_dropout,
+            attention_dropout,
+            relu_dropout,
+            hidden_act,
+            preprocess_cmd="n",
+            postprocess_cmd="da",
+            param_initializer=None,
+            name=''):
+    """
+    The encoder is composed of a stack of identical layers returned by calling
+    encoder_layer.
+    """
+    for i in range(n_layer):
+        enc_output = encoder_layer(enc_input,
+                                   attn_bias,
+                                   n_head,
+                                   d_key,
+                                   d_value,
+                                   d_model,
+                                   d_inner_hid,
+                                   prepostprocess_dropout,
+                                   attention_dropout,
+                                   relu_dropout,
+                                   hidden_act,
+                                   preprocess_cmd,
+                                   postprocess_cmd,
+                                   param_initializer=param_initializer,
+                                   name=name + '_layer_' + str(i))
+        enc_input = enc_output
+    enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
+
+    return enc_output
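`multi_head_attention` in the new file runs four steps:

1. Project the inputs.
2. Split the hidden axis into `n_head` heads.
3. Apply scaled dot-product attention with an additive `attn_bias`. In BERT this bias is the `[batch, n_head, seq, seq]` mask built by `stack` in `bert.py`.
4. Merge the heads back and project to `d_model`.

The NumPy sketch below follows the same data flow so the shape bookkeeping is easy to check. It leaves out the fc biases, dropout, and the decoding `cache`. The weight arguments are hypothetical stand-ins for the `*_fc.w_0` parameters.

```python
import numpy as np


def split_heads(x, n_head):
    # [batch, seq, n_head * d] -> [batch, n_head, seq, d]
    b, s, h = x.shape
    return x.reshape(b, s, n_head, h // n_head).transpose(0, 2, 1, 3)


def combine_heads(x):
    # [batch, n_head, seq, d] -> [batch, seq, n_head * d]; inverse of split_heads
    b, n, s, d = x.shape
    return x.transpose(0, 2, 1, 3).reshape(b, s, n * d)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def scaled_dot_product_attention(q, k, v, attn_bias, d_key):
    # Scale q by 1/sqrt(d_key), then compute q @ k^T over the last two axes.
    product = (q * d_key ** -0.5) @ k.transpose(0, 1, 3, 2)
    if attn_bias is not None:
        product = product + attn_bias  # large negative entries mask positions
    return softmax(product) @ v


def multi_head_attention(x, w_q, w_k, w_v, w_o, attn_bias, n_head):
    # Self-attention (keys and values default to the queries); no fc biases.
    d_key = w_q.shape[1] // n_head
    q = split_heads(x @ w_q, n_head)
    k = split_heads(x @ w_k, n_head)
    v = split_heads(x @ w_v, n_head)
    ctx = scaled_dot_product_attention(q, k, v, attn_bias, d_key)
    return combine_heads(ctx) @ w_o


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))  # batch=2, seq=5, d_model=8
w_q, w_k, w_v, w_o = (rng.normal(size=(8, 8)) for _ in range(4))
bias = np.zeros((2, 2, 5, 5))
bias[..., 4] = -1e9  # hide the last key position from every query
out = multi_head_attention(x, w_q, w_k, w_v, w_o, bias, n_head=2)
```

Masking the last key with a large negative bias should give the same context vectors as dropping that key entirely. The test below uses this property as a check.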
--- a/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/module.py
+++ b/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/module.py
@@ -58,13 +58,12 @@ class Bert(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        bert = BertModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.bert_config,
-            use_fp16=False)
+        bert = BertModel(src_ids=input_ids,
+                         position_ids=position_ids,
+                         sentence_ids=segment_ids,
+                         input_mask=input_mask,
+                         config=self.bert_config,
+                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output
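The module returns `bert.get_pooled_output()`. Per the `bert.py` hunks, this slices the encoder state of the first token and passes it through a tanh fc layer named `pooled_fc`. The NumPy sketch below reproduces that computation. It assumes `fluid.layers.fc` keeps its default `num_flatten_dims=1`, so the `[batch, 1, emb_size]` slice is flattened to `[batch, emb_size]`. `w` and `b` are hypothetical stand-ins for `pooled_fc.w_0` and `pooled_fc.b_0`.

```python
import numpy as np


def pooled_output(enc_out, w, b):
    """Mirror of get_pooled_output: first-token slice, then a tanh dense layer.

    enc_out: [batch, seq_len, emb_size]; w: [emb_size, emb_size]; b: [emb_size]
    """
    first_tok = enc_out[:, 0:1, :]                    # slice(axes=[1], starts=[0], ends=[1])
    flat = first_tok.reshape(first_tok.shape[0], -1)  # fc flattens to [batch, emb_size]
    return np.tanh(flat @ w + b)


rng = np.random.default_rng(0)
enc_out = rng.normal(size=(2, 6, 4))
pooled = pooled_output(enc_out, np.eye(4), np.zeros(4))  # identity fc for checking
```

The result is a 2-D `[batch, emb_size]` tensor. It is this sentence-level feature that the classification task consumes.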

--- a/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/README.md
+++ b/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/README.md
--- a/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/__init__.py
+++ b/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/__init__.py
--- a/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/model/__init__.py
+++ b/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/model/__init__.py
--- a/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/model/bert.py
+++ b/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/model/bert.py
@@ -74,23 +74,23 @@ class BertModel(object):

    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
        # padding id in vocabulary must be set to 0
-        emb_out = fluid.layers.embedding(
-            input=src_ids,
-            size=[self._voc_size, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
-            is_sparse=False)
-        position_emb_out = fluid.layers.embedding(
-            input=position_ids,
-            size=[self._max_position_seq_len, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
-
-        sent_emb_out = fluid.layers.embedding(
-            sentence_ids,
-            size=[self._sent_types, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
+        emb_out = fluid.layers.embedding(input=src_ids,
+                                         size=[self._voc_size, self._emb_size],
+                                         dtype=self._dtype,
+                                         param_attr=fluid.ParamAttr(name=self._word_emb_name,
+                                                                    initializer=self._param_initializer),
+                                         is_sparse=False)
+        position_emb_out = fluid.layers.embedding(input=position_ids,
+                                                  size=[self._max_position_seq_len, self._emb_size],
+                                                  dtype=self._dtype,
+                                                  param_attr=fluid.ParamAttr(name=self._pos_emb_name,
+                                                                             initializer=self._param_initializer))
+
+        sent_emb_out = fluid.layers.embedding(sentence_ids,
+                                              size=[self._sent_types, self._emb_size],
+                                              dtype=self._dtype,
+                                              param_attr=fluid.ParamAttr(name=self._sent_emb_name,
+                                                                         initializer=self._param_initializer))

        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
@@ -105,23 +105,22 @@ class BertModel(object):
        n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True

-        self._enc_out = encoder(
-            enc_input=emb_out,
-            attn_bias=n_head_self_attn_mask,
-            n_layer=self._n_layer,
-            n_head=self._n_head,
-            d_key=self._emb_size // self._n_head,
-            d_value=self._emb_size // self._n_head,
-            d_model=self._emb_size,
-            d_inner_hid=self._emb_size * 4,
-            prepostprocess_dropout=self._prepostprocess_dropout,
-            attention_dropout=self._attention_dropout,
-            relu_dropout=0,
-            hidden_act=self._hidden_act,
-            preprocess_cmd="",
-            postprocess_cmd="dan",
-            param_initializer=self._param_initializer,
-            name='encoder')
+        self._enc_out = encoder(enc_input=emb_out,
+                                attn_bias=n_head_self_attn_mask,
+                                n_layer=self._n_layer,
+                                n_head=self._n_head,
+                                d_key=self._emb_size // self._n_head,
+                                d_value=self._emb_size // self._n_head,
+                                d_model=self._emb_size,
+                                d_inner_hid=self._emb_size * 4,
+                                prepostprocess_dropout=self._prepostprocess_dropout,
+                                attention_dropout=self._attention_dropout,
+                                relu_dropout=0,
+                                hidden_act=self._hidden_act,
+                                preprocess_cmd="",
+                                postprocess_cmd="dan",
+                                param_initializer=self._param_initializer,
+                                name='encoder')

    def get_sequence_output(self):
        return self._enc_out
@@ -130,12 +129,12 @@ class BertModel(object):
        """Get the first feature of each sequence for classification"""

        next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
-        next_sent_feat = fluid.layers.fc(
-            input=next_sent_feat,
-            size=self._emb_size,
-            act="tanh",
-            param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
-            bias_attr="pooled_fc.b_0")
+        next_sent_feat = fluid.layers.fc(input=next_sent_feat,
+                                         size=self._emb_size,
+                                         act="tanh",
+                                         param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
+                                                                    initializer=self._param_initializer),
+                                         bias_attr="pooled_fc.b_0")
        return next_sent_feat

    def get_pretraining_output(self, mask_label, mask_pos, labels):
@@ -150,43 +149,45 @@ class BertModel(object):
        mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)

        # transform: fc
-        mask_trans_feat = fluid.layers.fc(
-            input=mask_feat,
-            size=self._emb_size,
-            act=self._hidden_act,
-            param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
-            bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
+        mask_trans_feat = fluid.layers.fc(input=mask_feat,
+                                          size=self._emb_size,
+                                          act=self._hidden_act,
+                                          param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
+                                                                     initializer=self._param_initializer),
+                                          bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')

-        mask_lm_out_bias_attr = fluid.ParamAttr(
-            name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
+        mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
+                                                initializer=fluid.initializer.Constant(value=0.0))
        if self._weight_sharing:
-            fc_out = fluid.layers.matmul(
-                x=mask_trans_feat,
-                y=fluid.default_main_program().global_block().var(self._word_emb_name),
-                transpose_y=True)
-            fc_out += fluid.layers.create_parameter(
-                shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
+            fc_out = fluid.layers.matmul(x=mask_trans_feat,
+                                         y=fluid.default_main_program().global_block().var(self._word_emb_name),
+                                         transpose_y=True)
+            fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
+                                                    dtype=self._dtype,
+                                                    attr=mask_lm_out_bias_attr,
+                                                    is_bias=True)

        else:
-            fc_out = fluid.layers.fc(
-                input=mask_trans_feat,
-                size=self._voc_size,
-                param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
-                bias_attr=mask_lm_out_bias_attr)
+            fc_out = fluid.layers.fc(input=mask_trans_feat,
+                                     size=self._voc_size,
+                                     param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
+                                                                initializer=self._param_initializer),
+                                     bias_attr=mask_lm_out_bias_attr)

        mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
        mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)

-        next_sent_fc_out = fluid.layers.fc(
-            input=next_sent_feat,
-            size=2,
-            param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
-            bias_attr="next_sent_fc.b_0")
+        next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
+                                           size=2,
+                                           param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
+                                                                      initializer=self._param_initializer),
+                                           bias_attr="next_sent_fc.b_0")

-        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
-            logits=next_sent_fc_out, label=labels, return_softmax=True)
+        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
+                                                                                    label=labels,
+                                                                                    return_softmax=True)

        next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)


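The sketch below traces the masked-LM head's data flow in `get_pretraining_output` using plain Python. It makes two assumptions. First, `mask_pos` holds indices into the batch flattened to `[batch * seq_len, hidden]`, which is what gathering from `reshaped_emb_out` suggests. Second, `_weight_sharing` is enabled, so the output projection reuses the `[voc_size, emb_size]` word-embedding table through `matmul(..., transpose_y=True)` and adds a `[voc_size]` bias. The helper names are hypothetical, and the intermediate fc and layer-norm transform is omitted. This is an illustration, not the module's code.

```python
def gather_masked(enc_out, mask_pos):
    """Flatten [batch, seq_len, hidden] to [batch * seq_len, hidden], then pick rows by index.

    Assumes mask_pos holds flattened indices (batch_index * seq_len + token_index).
    """
    flat = [token for sequence in enc_out for token in sequence]
    return [flat[p] for p in mask_pos]


def tied_output_logits(features, word_emb, bias):
    """logits = features @ word_emb^T + bias, reusing the embedding table as the output weight.

    features: [n_masked, emb_size]; word_emb: [voc_size, emb_size]; bias: [voc_size].
    """
    return [[sum(h * e for h, e in zip(feat, emb_row)) + b
             for emb_row, b in zip(word_emb, bias)]
            for feat in features]


# Toy batch: 2 sequences of length 2, hidden size 2, vocabulary of 3 words.
enc_out = [[[0.0, 0.0], [1.0, 0.0]],
           [[0.0, 1.0], [2.0, 3.0]]]
mask_pos = [1, 3]  # second token of each sequence in the flattened batch
word_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.5, -1.0]

masked = gather_masked(enc_out, mask_pos)          # [[1.0, 0.0], [2.0, 3.0]]
print(tied_output_logits(masked, word_emb, bias))  # [[1.0, 0.5, 0.0], [2.0, 3.5, 4.0]]
```

Tying the output projection to the embedding table avoids a separate `voc_size * emb_size` weight. The non-shared branch needs its own `mask_lm_out_fc.w_0` for this reason.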
--- a/modules/text/language_model/bert_uncased_L_12_H_768_A_12/model/transformer_encoder.py
+++ b/modules/text/language_model/bert_uncased_L_12_H_768_A_12/model/transformer_encoder.py
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Transformer encoder."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from functools import partial
+
+import paddle.fluid as fluid
+import paddle.fluid.layers as layers
+
+
+def multi_head_attention(queries,
+                         keys,
+                         values,
+                         attn_bias,
+                         d_key,
+                         d_value,
+                         d_model,
+                         n_head=1,
+                         dropout_rate=0.,
+                         cache=None,
+                         param_initializer=None,
+                         name='multi_head_att'):
+    """
+    Multi-Head Attention. Note that attn_bias is added to the logit before
+    computing softmax activation to mask certain selected positions so that
+    they will not be considered in attention weights.
+    """
+    keys = queries if keys is None else keys
+    values = keys if values is None else values
+
+    if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
+        raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
+
+    def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
+        """
+        Add linear projection to queries, keys, and values.
+        """
+        q = layers.fc(input=queries,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_query_fc.b_0')
+        k = layers.fc(input=keys,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_key_fc.b_0')
+        v = layers.fc(input=values,
+                      size=d_value * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_value_fc.b_0')
+        return q, k, v
+
+    def __split_heads(x, n_head):
+        """
+        Reshape the last dimension of input tensor x so that it becomes two
+        dimensions and then transpose. Specifically, input a tensor with shape
+        [bs, max_sequence_length, n_head * hidden_dim] then output a tensor
+        with shape [bs, n_head, max_sequence_length, hidden_dim].
+        """
+        hidden_size = x.shape[-1]
+        # The value 0 in shape attr means copying the corresponding dimension
+        # size of the input as the output dimension size.
+        reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
+
+        # permute the dimensions into:
+        # [batch_size, n_head, max_sequence_len, hidden_size_per_head]
+        return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
+
+    def __combine_heads(x):
+        """
+        Transpose and then reshape the last two dimensions of input tensor x
+        so that they become one dimension, which is the reverse of __split_heads.
+        """
+        if len(x.shape) == 3:
+            return x
+        if len(x.shape) != 4:
+            raise ValueError("Input(x) should be a 4-D Tensor.")
+
+        trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
+        # The value 0 in shape attr means copying the corresponding dimension
+        # size of the input as the output dimension size.
+        return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
+
+    def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
+        """
+        Scaled Dot-Product Attention
+        """
+        scaled_q = layers.scale(x=q, scale=d_key**-0.5)
+        product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
+        if attn_bias:
+            product += attn_bias
+        weights = layers.softmax(product)
+        if dropout_rate:
+            weights = layers.dropout(weights,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
+        out = layers.matmul(weights, v)
+        return out
+
+    q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
+
+    if cache is not None:  # use cache and concat time steps
+        # Since the inplace reshape in __split_heads changes the shape of k and
+        # v, which is the cache input for next time step, reshape the cache
+        # input from the previous time step first.
+        k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
+        v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
+
+    q = __split_heads(q, n_head)
+    k = __split_heads(k, n_head)
+    v = __split_heads(v, n_head)
+
+    ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
+
+    out = __combine_heads(ctx_multiheads)
+
+    # Project back to the model size.
+    proj_out = layers.fc(input=out,
+                         size=d_model,
+                         num_flatten_dims=2,
+                         param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
+                         bias_attr=name + '_output_fc.b_0')
+    return proj_out
+
+
+def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
+    """
+    Position-wise Feed-Forward Networks.
+    This module consists of two linear transformations with an activation
+    (hidden_act, e.g. ReLU) in between, applied to each position separately
+    and identically.
+    """
+    hidden = layers.fc(input=x,
+                       size=d_inner_hid,
+                       num_flatten_dims=2,
+                       act=hidden_act,
+                       param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
+                       bias_attr=name + '_fc_0.b_0')
+    if dropout_rate:
+        hidden = layers.dropout(hidden,
+                                dropout_prob=dropout_rate,
+                                dropout_implementation="upscale_in_train",
+                                is_test=False)
+    out = layers.fc(input=hidden,
+                    size=d_hid,
+                    num_flatten_dims=2,
+                    param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
+                    bias_attr=name + '_fc_1.b_0')
+    return out
+
+
+def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
+    """
+    Add residual connection, layer normalization and dropout to the out tensor
+    optionally according to the value of process_cmd.
+    This will be used before or after multi-head attention and position-wise
+    feed-forward networks.
+    """
+    for cmd in process_cmd:
+        if cmd == "a":  # add residual connection
+            out = out + prev_out if prev_out else out
+        elif cmd == "n":  # add layer normalization
+            out_dtype = out.dtype
+            if out_dtype == fluid.core.VarDesc.VarType.FP16:
+                out = layers.cast(x=out, dtype="float32")
+            out = layers.layer_norm(out,
+                                    begin_norm_axis=len(out.shape) - 1,
+                                    param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
+                                                               initializer=fluid.initializer.Constant(1.)),
+                                    bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
+                                                              initializer=fluid.initializer.Constant(0.)))
+            if out_dtype == fluid.core.VarDesc.VarType.FP16:
+                out = layers.cast(x=out, dtype="float16")
+        elif cmd == "d":  # add dropout
+            if dropout_rate:
+                out = layers.dropout(out,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
+    return out
+
+
+pre_process_layer = partial(pre_post_process_layer, None)
+post_process_layer = pre_post_process_layer
+
+
+def encoder_layer(enc_input,
+                  attn_bias,
+                  n_head,
+                  d_key,
+                  d_value,
+                  d_model,
+                  d_inner_hid,
+                  prepostprocess_dropout,
+                  attention_dropout,
+                  relu_dropout,
+                  hidden_act,
+                  preprocess_cmd="n",
+                  postprocess_cmd="da",
+                  param_initializer=None,
+                  name=''):
+    """An encoder layer; several of them can be stacked to form a deep encoder.
+    This module consists of a multi-head (self) attention followed by
+    a position-wise feed-forward network, each accompanied by
+    post_process_layer to add the residual connection, layer normalization
+    and dropout.
+    """
+    attn_output = multi_head_attention(pre_process_layer(enc_input,
+                                                         preprocess_cmd,
+                                                         prepostprocess_dropout,
+                                                         name=name + '_pre_att'),
+                                       None,
+                                       None,
+                                       attn_bias,
+                                       d_key,
+                                       d_value,
+                                       d_model,
+                                       n_head,
+                                       attention_dropout,
+                                       param_initializer=param_initializer,
+                                       name=name + '_multi_head_att')
+    attn_output = post_process_layer(enc_input,
+                                     attn_output,
+                                     postprocess_cmd,
+                                     prepostprocess_dropout,
+                                     name=name + '_post_att')
+    ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
+                                                             preprocess_cmd,
+                                                             prepostprocess_dropout,
+                                                             name=name + '_pre_ffn'),
+                                           d_inner_hid,
+                                           d_model,
+                                           relu_dropout,
+                                           hidden_act,
+                                           param_initializer=param_initializer,
+                                           name=name + '_ffn')
+    return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
+
+
+def encoder(enc_input,
+            attn_bias,
+            n_layer,
+            n_head,
+            d_key,
+            d_value,
+            d_model,
+            d_inner_hid,
+            prepostprocess_dropout,
+            attention_dropout,
+            relu_dropout,
+            hidden_act,
+            preprocess_cmd="n",
+            postprocess_cmd="da",
+            param_initializer=None,
+            name=''):
+    """
+    The encoder is composed of a stack of identical layers returned by calling
+    encoder_layer.
+    """
+    for i in range(n_layer):
+        enc_output = encoder_layer(enc_input,
+                                   attn_bias,
+                                   n_head,
+                                   d_key,
+                                   d_value,
+                                   d_model,
+                                   d_inner_hid,
+                                   prepostprocess_dropout,
+                                   attention_dropout,
+                                   relu_dropout,
+                                   hidden_act,
+                                   preprocess_cmd,
+                                   postprocess_cmd,
+                                   param_initializer=param_initializer,
+                                   name=name + '_layer_' + str(i))
+        enc_input = enc_output
+    enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
+
+    return enc_output
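The head bookkeeping and masking in `multi_head_attention` can be shown without Paddle. The sketch below works on nested Python lists with the batch dimension dropped. It mirrors `__split_heads`, `scaled_dot_product_attention` and `__combine_heads`:

- queries are scaled by `d_key ** -0.5`;
- `attn_bias` is added to the logits before the softmax, and one mask is shared by every head, much like the per-head stacking of `n_head_self_attn_mask`;
- heads are regrouped afterwards.

The function names, the identity projections and the `-10000.0` mask value in the usage line are illustrative assumptions. How `BertModel` fills the mask is not shown in this excerpt.

```python
import math


def split_heads(x, n_head):
    """[seq_len, n_head * d] -> [n_head, seq_len, d] (one sequence, batch dim omitted)."""
    d = len(x[0]) // n_head
    return [[row[h * d:(h + 1) * d] for row in x] for h in range(n_head)]


def combine_heads(heads):
    """Inverse of split_heads: [n_head, seq_len, d] -> [seq_len, n_head * d]."""
    seq_len = len(heads[0])
    return [[val for head in heads for val in head[t]] for t in range(seq_len)]


def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v, attn_bias=None):
    """softmax(q k^T / sqrt(d_key) + attn_bias) v for a single head.

    q: [len_q, d_key], k: [len_k, d_key], v: [len_k, d_value];
    attn_bias: optional [len_q, len_k]; large negative entries mask positions.
    """
    scale = len(q[0]) ** -0.5
    out = []
    for i, q_row in enumerate(q):
        scores = [sum(scale * a * b for a, b in zip(q_row, k_row)) for k_row in k]
        if attn_bias is not None:
            scores = [s + b for s, b in zip(scores, attn_bias[i])]
        weights = softmax(scores)
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def multi_head_self_attention(x, n_head, attn_bias=None):
    """Self-attention with identity projections (the q/k/v/output fc layers are omitted)."""
    heads = split_heads(x, n_head)
    ctx = [scaled_dot_product_attention(h, h, h, attn_bias) for h in heads]
    return combine_heads(ctx)


# Masking the second key puts all attention weight on the first value row.
print(scaled_dot_product_attention([[1.0, 0.0]],
                                   [[1.0, 0.0], [0.0, 1.0]],
                                   [[1.0, 2.0], [3.0, 4.0]],
                                   attn_bias=[[0.0, -10000.0]]))  # → [[1.0, 2.0]]
```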
--- a/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/module.py
+++ b/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/module.py
@@ -58,13 +58,12 @@ class Bert(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        bert = BertModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.bert_config,
-            use_fp16=False)
+        bert = BertModel(src_ids=input_ids,
+                         position_ids=position_ids,
+                         sentence_ids=segment_ids,
+                         input_mask=input_mask,
+                         config=self.bert_config,
+                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output

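Both `BertModel` variants call `encoder` with `preprocess_cmd=""` and `postprocess_cmd="dan"`. `pre_post_process_layer` applies each character left to right. Every sublayer output therefore goes through dropout, then the residual add, then layer normalization, which is a post-LN block. The empty pre-process string, including the final `post_encoder` call, is a no-op. The pure-Python sketch below reproduces that command loop for a single vector. It assumes the layer-norm scale and bias are still at their initial values of 1 and 0, and its names are hypothetical.

```python
import math
import random


def layer_norm(vec, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (scale=1, bias=0)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    # eps is a small constant that keeps the division stable when var is ~0.
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def dropout(vec, rate, training, rng):
    """'upscale_in_train' dropout: drop with probability rate, scale survivors by 1/(1-rate)."""
    if not training or rate == 0.0:
        return list(vec)  # identity at inference time
    return [0.0 if rng.random() < rate else x / (1.0 - rate) for x in vec]


def pre_post_process(prev_out, out, process_cmd, dropout_rate=0.0, training=False, rng=None):
    """Apply process_cmd left to right: 'a' residual add, 'n' layer norm, 'd' dropout."""
    rng = rng or random.Random(0)
    for cmd in process_cmd:
        if cmd == "a" and prev_out is not None:
            out = [x + y for x, y in zip(out, prev_out)]
        elif cmd == "n":
            out = layer_norm(out)
        elif cmd == "d":
            out = dropout(out, dropout_rate, training, rng)
    return out


# Post-LN step as configured by BertModel: dropout -> add residual -> layer norm.
residual = [1.0, 1.0]
sublayer_out = [0.0, 2.0]
print(pre_post_process(residual, sublayer_out, "dan"))  # approximately [-1.0, 1.0]
```

Passing `prev_out=None` (as `pre_process_layer` does through `partial`) makes `"a"` a no-op. This matches the `if prev_out` guard in the original helper.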
--- a/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/README.md
+++ b/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/README.md
--- a/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/__init__.py
+++ b/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/__init__.py
--- a/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/model/__init__.py
+++ b/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/model/__init__.py
--- a/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/model/bert.py
+++ b/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/model/bert.py
@@ -74,23 +74,23 @@ class BertModel(object):

    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
        # padding id in vocabulary must be set to 0
-        emb_out = fluid.layers.embedding(
-            input=src_ids,
-            size=[self._voc_size, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
-            is_sparse=False)
-        position_emb_out = fluid.layers.embedding(
-            input=position_ids,
-            size=[self._max_position_seq_len, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
-
-        sent_emb_out = fluid.layers.embedding(
-            sentence_ids,
-            size=[self._sent_types, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
+        emb_out = fluid.layers.embedding(input=src_ids,
+                                         size=[self._voc_size, self._emb_size],
+                                         dtype=self._dtype,
+                                         param_attr=fluid.ParamAttr(name=self._word_emb_name,
+                                                                    initializer=self._param_initializer),
+                                         is_sparse=False)
+        position_emb_out = fluid.layers.embedding(input=position_ids,
+                                                  size=[self._max_position_seq_len, self._emb_size],
+                                                  dtype=self._dtype,
+                                                  param_attr=fluid.ParamAttr(name=self._pos_emb_name,
+                                                                             initializer=self._param_initializer))
+
+        sent_emb_out = fluid.layers.embedding(sentence_ids,
+                                              size=[self._sent_types, self._emb_size],
+                                              dtype=self._dtype,
+                                              param_attr=fluid.ParamAttr(name=self._sent_emb_name,
+                                                                         initializer=self._param_initializer))

        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
@@ -105,23 +105,22 @@ class BertModel(object):
        n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True

-        self._enc_out = encoder(
-            enc_input=emb_out,
-            attn_bias=n_head_self_attn_mask,
-            n_layer=self._n_layer,
-            n_head=self._n_head,
-            d_key=self._emb_size // self._n_head,
-            d_value=self._emb_size // self._n_head,
-            d_model=self._emb_size,
-            d_inner_hid=self._emb_size * 4,
-            prepostprocess_dropout=self._prepostprocess_dropout,
-            attention_dropout=self._attention_dropout,
-            relu_dropout=0,
-            hidden_act=self._hidden_act,
-            preprocess_cmd="",
-            postprocess_cmd="dan",
-            param_initializer=self._param_initializer,
-            name='encoder')
+        self._enc_out = encoder(enc_input=emb_out,
+                                attn_bias=n_head_self_attn_mask,
+                                n_layer=self._n_layer,
+                                n_head=self._n_head,
+                                d_key=self._emb_size // self._n_head,
+                                d_value=self._emb_size // self._n_head,
+                                d_model=self._emb_size,
+                                d_inner_hid=self._emb_size * 4,
+                                prepostprocess_dropout=self._prepostprocess_dropout,
+                                attention_dropout=self._attention_dropout,
+                                relu_dropout=0,
+                                hidden_act=self._hidden_act,
+                                preprocess_cmd="",
+                                postprocess_cmd="dan",
+                                param_initializer=self._param_initializer,
+                                name='encoder')

    def get_sequence_output(self):
        return self._enc_out
@@ -130,12 +129,12 @@ class BertModel(object):
        """Get the first feature of each sequence for classification"""

        next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
-        next_sent_feat = fluid.layers.fc(
-            input=next_sent_feat,
-            size=self._emb_size,
-            act="tanh",
-            param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
-            bias_attr="pooled_fc.b_0")
+        next_sent_feat = fluid.layers.fc(input=next_sent_feat,
+                                         size=self._emb_size,
+                                         act="tanh",
+                                         param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
+                                                                    initializer=self._param_initializer),
+                                         bias_attr="pooled_fc.b_0")
        return next_sent_feat

    def get_pretraining_output(self, mask_label, mask_pos, labels):
@@ -150,43 +149,45 @@ class BertModel(object):
        mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)

        # transform: fc
-        mask_trans_feat = fluid.layers.fc(
-            input=mask_feat,
-            size=self._emb_size,
-            act=self._hidden_act,
-            param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
-            bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
+        mask_trans_feat = fluid.layers.fc(input=mask_feat,
+                                          size=self._emb_size,
+                                          act=self._hidden_act,
+                                          param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
+                                                                     initializer=self._param_initializer),
+                                          bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')

-        mask_lm_out_bias_attr = fluid.ParamAttr(
-            name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
+        mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
+                                                initializer=fluid.initializer.Constant(value=0.0))
        if self._weight_sharing:
-            fc_out = fluid.layers.matmul(
-                x=mask_trans_feat,
-                y=fluid.default_main_program().global_block().var(self._word_emb_name),
-                transpose_y=True)
-            fc_out += fluid.layers.create_parameter(
-                shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
+            fc_out = fluid.layers.matmul(x=mask_trans_feat,
+                                         y=fluid.default_main_program().global_block().var(self._word_emb_name),
+                                         transpose_y=True)
+            fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
+                                                    dtype=self._dtype,
+                                                    attr=mask_lm_out_bias_attr,
+                                                    is_bias=True)

        else:
-            fc_out = fluid.layers.fc(
-                input=mask_trans_feat,
-                size=self._voc_size,
-                param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
-                bias_attr=mask_lm_out_bias_attr)
+            fc_out = fluid.layers.fc(input=mask_trans_feat,
+                                     size=self._voc_size,
+                                     param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
+                                                                initializer=self._param_initializer),
+                                     bias_attr=mask_lm_out_bias_attr)

        mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
        mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)

-        next_sent_fc_out = fluid.layers.fc(
-            input=next_sent_feat,
-            size=2,
-            param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
-            bias_attr="next_sent_fc.b_0")
+        next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
+                                           size=2,
+                                           param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
+                                                                      initializer=self._param_initializer),
+                                           bias_attr="next_sent_fc.b_0")

-        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
-            logits=next_sent_fc_out, label=labels, return_softmax=True)
+        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
+                                                                                    label=labels,
+                                                                                    return_softmax=True)

        next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)


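`get_pooled_output` (called from `module.py`) slices the first position of the encoder output and passes it through a `tanh` fully connected layer named `pooled_fc`. Under BERT's input convention, that first position is the `[CLS]` token. Below is a minimal single-example sketch in plain Python with hypothetical names. It assumes a `[hidden, hidden]` weight laid out as inputs by outputs.

```python
import math


def pooled_output(sequence_output, weight, bias):
    """tanh(first_token @ weight + bias) for one example (batch dim omitted).

    sequence_output: [seq_len, hidden]; weight: [hidden, hidden] stored as
    [inputs, outputs]; bias: [hidden].
    """
    first = sequence_output[0]  # like slice(axes=[1], starts=[0], ends=[1])
    return [math.tanh(sum(first[i] * weight[i][j] for i in range(len(first))) + bias[j])
            for j in range(len(bias))]


identity = [[1.0, 0.0], [0.0, 1.0]]
# Only the first token contributes; the second row is ignored.
print(pooled_output([[1.0, 0.0], [5.0, 5.0]], identity, [0.0, 0.0]))  # [tanh(1.0), tanh(0.0)]
```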
--- a/modules/text/language_model/bert_uncased_L_24_H_1024_A_16/model/transformer_encoder.py
+++ b/modules/text/language_model/bert_uncased_L_24_H_1024_A_16/model/transformer_encoder.py
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Transformer encoder."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from functools import partial
+
+import paddle.fluid as fluid
+import paddle.fluid.layers as layers
+
+
+def multi_head_attention(queries,
+                         keys,
+                         values,
+                         attn_bias,
+                         d_key,
+                         d_value,
+                         d_model,
+                         n_head=1,
+                         dropout_rate=0.,
+                         cache=None,
+                         param_initializer=None,
+                         name='multi_head_att'):
+    """
+    Multi-Head Attention. Note that attn_bias is added to the logits before
+    computing the softmax activation to mask certain selected positions so that
+    they will not be considered in attention weights.
+    """
+    keys = queries if keys is None else keys
+    values = keys if values is None else values
+
+    if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
+        raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
+
+    def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
+        """
+        Add linear projection to queries, keys, and values.
+        """
+        q = layers.fc(input=queries,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_query_fc.b_0')
+        k = layers.fc(input=keys,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_key_fc.b_0')
+        v = layers.fc(input=values,
+                      size=d_value * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_value_fc.b_0')
+        return q, k, v
+
+    def __split_heads(x, n_head):
+        """
+        Reshape the last dimension of input tensor x so that it becomes two
+        dimensions and then transpose. Specifically, input a tensor with shape
+        [bs, max_sequence_length, n_head * hidden_dim] then output a tensor
+        with shape [bs, n_head, max_sequence_length, hidden_dim].
+        """
+        hidden_size = x.shape[-1]
+        # The value 0 in shape attr means copying the corresponding dimension
+        # size of the input as the output dimension size.
+        reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
+
+        # permute the dimensions into:
+        # [batch_size, n_head, max_sequence_len, hidden_size_per_head]
+        return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
+
+    def __combine_heads(x):
+        """
+        Transpose and then reshape the last two dimensions of input tensor x
+        so that they become one dimension, which is the reverse of __split_heads.
+        """
+        if len(x.shape) == 3: return x
+        if len(x.shape) != 4:
+            raise ValueError("Input(x) should be a 4-D Tensor.")
+
+        trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
+        # The value 0 in shape attr means copying the corresponding dimension
+        # size of the input as the output dimension size.
+        return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
+
+    def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
+        """
+        Scaled Dot-Product Attention
+        """
+        scaled_q = layers.scale(x=q, scale=d_key**-0.5)
+        product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
+        if attn_bias:
+            product += attn_bias
+        weights = layers.softmax(product)
+        if dropout_rate:
+            weights = layers.dropout(weights,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
+        out = layers.matmul(weights, v)
+        return out
+
+    q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
+
+    if cache is not None:  # use cache and concat time steps
+        # Since the inplace reshape in __split_heads changes the shape of k and
+        # v, which is the cache input for next time step, reshape the cache
+        # input from the previous time step first.
+        k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
+        v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
+
+    q = __split_heads(q, n_head)
+    k = __split_heads(k, n_head)
+    v = __split_heads(v, n_head)
+
+    ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
+
+    out = __combine_heads(ctx_multiheads)
+
+    # Project back to the model size.
+    proj_out = layers.fc(input=out,
+                         size=d_model,
+                         num_flatten_dims=2,
+                         param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
+                         bias_attr=name + '_output_fc.b_0')
+    return proj_out
+
+
+def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
+    """
+    Position-wise Feed-Forward Networks.
+    This module consists of two linear transformations with a hidden_act
+    activation (e.g. ReLU or GELU) in between, applied to each position
+    separately and identically.
+    """
+    hidden = layers.fc(input=x,
+                       size=d_inner_hid,
+                       num_flatten_dims=2,
+                       act=hidden_act,
+                       param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
+                       bias_attr=name + '_fc_0.b_0')
+    if dropout_rate:
+        hidden = layers.dropout(hidden,
+                                dropout_prob=dropout_rate,
+                                dropout_implementation="upscale_in_train",
+                                is_test=False)
+    out = layers.fc(input=hidden,
+                    size=d_hid,
+                    num_flatten_dims=2,
+                    param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
+                    bias_attr=name + '_fc_1.b_0')
+    return out
+
+
+def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
+    """
+    Add residual connection, layer normalization and dropout to the out tensor
+    optionally according to the value of process_cmd.
+    This will be used before or after multi-head attention and position-wise
+    feed-forward networks.
+    """
+    for cmd in process_cmd:
+        if cmd == "a":  # add residual connection
+            out = out + prev_out if prev_out else out
+        elif cmd == "n":  # add layer normalization
+            out_dtype = out.dtype
+            if out_dtype == fluid.core.VarDesc.VarType.FP16:
+                out = layers.cast(x=out, dtype="float32")
+            out = layers.layer_norm(out,
+                                    begin_norm_axis=len(out.shape) - 1,
+                                    param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
+                                                               initializer=fluid.initializer.Constant(1.)),
+                                    bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
+                                                              initializer=fluid.initializer.Constant(0.)))
+            if out_dtype == fluid.core.VarDesc.VarType.FP16:
+                out = layers.cast(x=out, dtype="float16")
+        elif cmd == "d":  # add dropout
+            if dropout_rate:
+                out = layers.dropout(out,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
+    return out
+
+
+pre_process_layer = partial(pre_post_process_layer, None)
+post_process_layer = pre_post_process_layer
+
+
+def encoder_layer(enc_input,
+                  attn_bias,
+                  n_head,
+                  d_key,
+                  d_value,
+                  d_model,
+                  d_inner_hid,
+                  prepostprocess_dropout,
+                  attention_dropout,
+                  relu_dropout,
+                  hidden_act,
+                  preprocess_cmd="n",
+                  postprocess_cmd="da",
+                  param_initializer=None,
+                  name=''):
+    """The encoder layers that can be stacked to form a deep encoder.
+    This module consists of a multi-head (self) attention followed by
+    position-wise feed-forward networks; each of the two components is
+    followed by post_process_layer, which adds residual connection, layer
+    normalization and dropout.
+    """
+    attn_output = multi_head_attention(pre_process_layer(enc_input,
+                                                         preprocess_cmd,
+                                                         prepostprocess_dropout,
+                                                         name=name + '_pre_att'),
+                                       None,
+                                       None,
+                                       attn_bias,
+                                       d_key,
+                                       d_value,
+                                       d_model,
+                                       n_head,
+                                       attention_dropout,
+                                       param_initializer=param_initializer,
+                                       name=name + '_multi_head_att')
+    attn_output = post_process_layer(enc_input,
+                                     attn_output,
+                                     postprocess_cmd,
+                                     prepostprocess_dropout,
+                                     name=name + '_post_att')
+    ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
+                                                             preprocess_cmd,
+                                                             prepostprocess_dropout,
+                                                             name=name + '_pre_ffn'),
+                                           d_inner_hid,
+                                           d_model,
+                                           relu_dropout,
+                                           hidden_act,
+                                           param_initializer=param_initializer,
+                                           name=name + '_ffn')
+    return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
+
+
+def encoder(enc_input,
+            attn_bias,
+            n_layer,
+            n_head,
+            d_key,
+            d_value,
+            d_model,
+            d_inner_hid,
+            prepostprocess_dropout,
+            attention_dropout,
+            relu_dropout,
+            hidden_act,
+            preprocess_cmd="n",
+            postprocess_cmd="da",
+            param_initializer=None,
+            name=''):
+    """
+    The encoder is composed of a stack of identical layers returned by calling
+    encoder_layer.
+    """
+    for i in range(n_layer):
+        enc_output = encoder_layer(enc_input,
+                                   attn_bias,
+                                   n_head,
+                                   d_key,
+                                   d_value,
+                                   d_model,
+                                   d_inner_hid,
+                                   prepostprocess_dropout,
+                                   attention_dropout,
+                                   relu_dropout,
+                                   hidden_act,
+                                   preprocess_cmd,
+                                   postprocess_cmd,
+                                   param_initializer=param_initializer,
+                                   name=name + '_layer_' + str(i))
+        enc_input = enc_output
+    enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
+
+    return enc_output
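The attention path in `multi_head_attention` boils down to three shape manipulations: `__split_heads` turns `[bs, seq, n_head * d]` into `[bs, n_head, seq, d]`, `scaled_dot_product_attention` adds `attn_bias` to the scaled logits before the softmax, and `__combine_heads` undoes the split. The NumPy sketch below reproduces only those steps. It is written under stated assumptions: the learned `fc` projections, dropout and the decoding `cache` are omitted, NumPy stands in for `paddle.fluid.layers`, and the function names are hypothetical mirrors of the private helpers above.

```python
import numpy as np


def split_heads(x, n_head):
    # [bs, seq, n_head * d] -> [bs, n_head, seq, d], like __split_heads.
    bs, seq, hidden = x.shape
    return x.reshape(bs, seq, n_head, hidden // n_head).transpose(0, 2, 1, 3)


def combine_heads(x):
    # [bs, n_head, seq, d] -> [bs, seq, n_head * d], the inverse of split_heads.
    bs, n_head, seq, d = x.shape
    return x.transpose(0, 2, 1, 3).reshape(bs, seq, n_head * d)


def scaled_dot_product_attention(q, k, v, attn_bias=None):
    # q, k, v: [bs, n_head, seq, d_key]; attn_bias must broadcast to the logits.
    d_key = q.shape[-1]
    product = (q * d_key**-0.5) @ k.transpose(0, 1, 3, 2)
    if attn_bias is not None:
        # A large negative bias drives masked positions to ~0 weight after softmax.
        product = product + attn_bias
    # Stable softmax over the key axis.
    product = product - product.max(axis=-1, keepdims=True)
    weights = np.exp(product)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Masking the last key positions with a bias of -1e4 (an illustrative value) gives the same result as attending over only the unmasked keys, which is the behaviour the `attn_bias` docstring describes.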
--- a/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/module.py
+++ b/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/module.py
@@ -58,13 +58,12 @@ class Bert(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        bert = BertModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.bert_config,
-            use_fp16=False)
+        bert = BertModel(src_ids=input_ids,
+                         position_ids=position_ids,
+                         sentence_ids=segment_ids,
+                         input_mask=input_mask,
+                         config=self.bert_config,
+                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output

--- a/modules/text/semantic_model/chinese_bert_wwm/README.md
+++ b/modules/text/semantic_model/chinese_bert_wwm/README.md
--- a/modules/text/semantic_model/chinese_bert_wwm/__init__.py
+++ b/modules/text/semantic_model/chinese_bert_wwm/__init__.py
--- a/modules/text/semantic_model/chinese_bert_wwm/model/__init__.py
+++ b/modules/text/semantic_model/chinese_bert_wwm/model/__init__.py
--- a/modules/text/semantic_model/chinese_bert_wwm/model/bert.py
+++ b/modules/text/semantic_model/chinese_bert_wwm/model/bert.py
@@ -74,23 +74,23 @@ class BertModel(object):

    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask):
        # padding id in vocabulary must be set to 0
-        emb_out = fluid.layers.embedding(
-            input=src_ids,
-            size=[self._voc_size, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._word_emb_name, initializer=self._param_initializer),
-            is_sparse=False)
-        position_emb_out = fluid.layers.embedding(
-            input=position_ids,
-            size=[self._max_position_seq_len, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._pos_emb_name, initializer=self._param_initializer))
-
-        sent_emb_out = fluid.layers.embedding(
-            sentence_ids,
-            size=[self._sent_types, self._emb_size],
-            dtype=self._dtype,
-            param_attr=fluid.ParamAttr(name=self._sent_emb_name, initializer=self._param_initializer))
+        emb_out = fluid.layers.embedding(input=src_ids,
+                                         size=[self._voc_size, self._emb_size],
+                                         dtype=self._dtype,
+                                         param_attr=fluid.ParamAttr(name=self._word_emb_name,
+                                                                    initializer=self._param_initializer),
+                                         is_sparse=False)
+        position_emb_out = fluid.layers.embedding(input=position_ids,
+                                                  size=[self._max_position_seq_len, self._emb_size],
+                                                  dtype=self._dtype,
+                                                  param_attr=fluid.ParamAttr(name=self._pos_emb_name,
+                                                                             initializer=self._param_initializer))
+
+        sent_emb_out = fluid.layers.embedding(sentence_ids,
+                                              size=[self._sent_types, self._emb_size],
+                                              dtype=self._dtype,
+                                              param_attr=fluid.ParamAttr(name=self._sent_emb_name,
+                                                                         initializer=self._param_initializer))

        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
@@ -105,23 +105,22 @@ class BertModel(object):
        n_head_self_attn_mask = fluid.layers.stack(x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True

-        self._enc_out = encoder(
-            enc_input=emb_out,
-            attn_bias=n_head_self_attn_mask,
-            n_layer=self._n_layer,
-            n_head=self._n_head,
-            d_key=self._emb_size // self._n_head,
-            d_value=self._emb_size // self._n_head,
-            d_model=self._emb_size,
-            d_inner_hid=self._emb_size * 4,
-            prepostprocess_dropout=self._prepostprocess_dropout,
-            attention_dropout=self._attention_dropout,
-            relu_dropout=0,
-            hidden_act=self._hidden_act,
-            preprocess_cmd="",
-            postprocess_cmd="dan",
-            param_initializer=self._param_initializer,
-            name='encoder')
+        self._enc_out = encoder(enc_input=emb_out,
+                                attn_bias=n_head_self_attn_mask,
+                                n_layer=self._n_layer,
+                                n_head=self._n_head,
+                                d_key=self._emb_size // self._n_head,
+                                d_value=self._emb_size // self._n_head,
+                                d_model=self._emb_size,
+                                d_inner_hid=self._emb_size * 4,
+                                prepostprocess_dropout=self._prepostprocess_dropout,
+                                attention_dropout=self._attention_dropout,
+                                relu_dropout=0,
+                                hidden_act=self._hidden_act,
+                                preprocess_cmd="",
+                                postprocess_cmd="dan",
+                                param_initializer=self._param_initializer,
+                                name='encoder')

    def get_sequence_output(self):
        return self._enc_out
@@ -130,12 +129,12 @@ class BertModel(object):
        """Get the first feature of each sequence for classification"""

        next_sent_feat = fluid.layers.slice(input=self._enc_out, axes=[1], starts=[0], ends=[1])
-        next_sent_feat = fluid.layers.fc(
-            input=next_sent_feat,
-            size=self._emb_size,
-            act="tanh",
-            param_attr=fluid.ParamAttr(name="pooled_fc.w_0", initializer=self._param_initializer),
-            bias_attr="pooled_fc.b_0")
+        next_sent_feat = fluid.layers.fc(input=next_sent_feat,
+                                         size=self._emb_size,
+                                         act="tanh",
+                                         param_attr=fluid.ParamAttr(name="pooled_fc.w_0",
+                                                                    initializer=self._param_initializer),
+                                         bias_attr="pooled_fc.b_0")
        return next_sent_feat

    def get_pretraining_output(self, mask_label, mask_pos, labels):
@@ -150,43 +149,45 @@ class BertModel(object):
        mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)

        # transform: fc
-        mask_trans_feat = fluid.layers.fc(
-            input=mask_feat,
-            size=self._emb_size,
-            act=self._hidden_act,
-            param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0', initializer=self._param_initializer),
-            bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
+        mask_trans_feat = fluid.layers.fc(input=mask_feat,
+                                          size=self._emb_size,
+                                          act=self._hidden_act,
+                                          param_attr=fluid.ParamAttr(name='mask_lm_trans_fc.w_0',
+                                                                     initializer=self._param_initializer),
+                                          bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(mask_trans_feat, 'n', name='mask_lm_trans')

-        mask_lm_out_bias_attr = fluid.ParamAttr(
-            name="mask_lm_out_fc.b_0", initializer=fluid.initializer.Constant(value=0.0))
+        mask_lm_out_bias_attr = fluid.ParamAttr(name="mask_lm_out_fc.b_0",
+                                                initializer=fluid.initializer.Constant(value=0.0))
        if self._weight_sharing:
-            fc_out = fluid.layers.matmul(
-                x=mask_trans_feat,
-                y=fluid.default_main_program().global_block().var(self._word_emb_name),
-                transpose_y=True)
-            fc_out += fluid.layers.create_parameter(
-                shape=[self._voc_size], dtype=self._dtype, attr=mask_lm_out_bias_attr, is_bias=True)
+            fc_out = fluid.layers.matmul(x=mask_trans_feat,
+                                         y=fluid.default_main_program().global_block().var(self._word_emb_name),
+                                         transpose_y=True)
+            fc_out += fluid.layers.create_parameter(shape=[self._voc_size],
+                                                    dtype=self._dtype,
+                                                    attr=mask_lm_out_bias_attr,
+                                                    is_bias=True)

        else:
-            fc_out = fluid.layers.fc(
-                input=mask_trans_feat,
-                size=self._voc_size,
-                param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0", initializer=self._param_initializer),
-                bias_attr=mask_lm_out_bias_attr)
+            fc_out = fluid.layers.fc(input=mask_trans_feat,
+                                     size=self._voc_size,
+                                     param_attr=fluid.ParamAttr(name="mask_lm_out_fc.w_0",
+                                                                initializer=self._param_initializer),
+                                     bias_attr=mask_lm_out_bias_attr)

        mask_lm_loss = fluid.layers.softmax_with_cross_entropy(logits=fc_out, label=mask_label)
        mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)

-        next_sent_fc_out = fluid.layers.fc(
-            input=next_sent_feat,
-            size=2,
-            param_attr=fluid.ParamAttr(name="next_sent_fc.w_0", initializer=self._param_initializer),
-            bias_attr="next_sent_fc.b_0")
+        next_sent_fc_out = fluid.layers.fc(input=next_sent_feat,
+                                           size=2,
+                                           param_attr=fluid.ParamAttr(name="next_sent_fc.w_0",
+                                                                      initializer=self._param_initializer),
+                                           bias_attr="next_sent_fc.b_0")

-        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
-            logits=next_sent_fc_out, label=labels, return_softmax=True)
+        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(logits=next_sent_fc_out,
+                                                                                    label=labels,
+                                                                                    return_softmax=True)

        next_sent_acc = fluid.layers.accuracy(input=next_sent_softmax, label=labels)

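When `self._weight_sharing` is set, the masked-LM output layer above reuses the word-embedding table: `fc_out = matmul(mask_trans_feat, word_emb, transpose_y=True) + bias`, so it only creates a `[voc_size]` bias instead of the separate `[emb_size, voc_size]` weight `mask_lm_out_fc.w_0` used in the `else` branch. A NumPy sketch of the two branches follows, with hypothetical function names and toy sizes; it shows that the tied branch equals an untied `fc` whose weight is the transposed embedding table.

```python
import numpy as np


def tied_mlm_logits(mask_trans_feat, word_emb, bias):
    # Mirrors matmul(x=mask_trans_feat, y=word_emb, transpose_y=True) + bias.
    # mask_trans_feat: [n_mask, emb]; word_emb: [voc, emb]; bias: [voc].
    return mask_trans_feat @ word_emb.T + bias


def untied_mlm_logits(mask_trans_feat, weight, bias):
    # Mirrors the non-shared branch: an fc layer with its own [emb, voc] weight.
    return mask_trans_feat @ weight + bias
```

The tied form therefore saves `emb_size * voc_size` parameters while producing logits of the same shape, `[n_mask, voc_size]`.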

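The `preprocess_cmd` and `postprocess_cmd` strings handed to `encoder` are read one character at a time by `pre_post_process_layer`: `a` adds the residual, `n` applies layer normalization over the last axis, and `d` applies dropout. `BertModel` passes `preprocess_cmd=""` and `postprocess_cmd="dan"`, i.e. a post-LN layout: dropout, then residual add, then layer norm. The NumPy sketch below illustrates that dispatch. It assumes inference mode (so `d` is a no-op) and omits the learned layer-norm scale and bias; the function names are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize over the last axis; learned scale/bias omitted (scale 1, bias 0).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def pre_post_process(prev_out, out, process_cmd):
    # Apply each command character in order, as pre_post_process_layer does.
    for cmd in process_cmd:
        if cmd == "a":
            # Residual connection; skipped when there is no previous output.
            if prev_out is not None:
                out = out + prev_out
        elif cmd == "n":
            out = layer_norm(out)
        elif cmd == "d":
            # Dropout is the identity in this inference-only sketch.
            pass
    return out
```

With `preprocess_cmd=""` the pre-processing step returns its input unchanged, which is why the final `pre_process_layer(enc_output, ...)` call in `encoder` is a no-op for this BERT configuration.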
--- a/modules/text/language_model/chinese_bert_wwm/model/transformer_encoder.py
+++ b/modules/text/language_model/chinese_bert_wwm/model/transformer_encoder.py
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Transformer encoder."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from functools import partial
+
+import paddle.fluid as fluid
+import paddle.fluid.layers as layers
+
+
+def multi_head_attention(queries,
+                         keys,
+                         values,
+                         attn_bias,
+                         d_key,
+                         d_value,
+                         d_model,
+                         n_head=1,
+                         dropout_rate=0.,
+                         cache=None,
+                         param_initializer=None,
+                         name='multi_head_att'):
+    """
+    Multi-Head Attention. Note that attn_bias is added to the logits before
+    computing the softmax activation to mask certain selected positions so that
+    they will not be considered in attention weights.
+    """
+    keys = queries if keys is None else keys
+    values = keys if values is None else values
+
+    if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
+        raise ValueError("Inputs: queries, keys and values should all be 3-D tensors.")
+
+    def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
+        """
+        Add linear projection to queries, keys, and values.
+        """
+        q = layers.fc(input=queries,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_query_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_query_fc.b_0')
+        k = layers.fc(input=keys,
+                      size=d_key * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_key_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_key_fc.b_0')
+        v = layers.fc(input=values,
+                      size=d_value * n_head,
+                      num_flatten_dims=2,
+                      param_attr=fluid.ParamAttr(name=name + '_value_fc.w_0', initializer=param_initializer),
+                      bias_attr=name + '_value_fc.b_0')
+        return q, k, v
+
+    def __split_heads(x, n_head):
+        """
+        Reshape the last dimension of input tensor x so that it becomes two
+        dimensions and then transpose. Specifically, input a tensor with shape
+        [bs, max_sequence_length, n_head * hidden_dim] then output a tensor
+        with shape [bs, n_head, max_sequence_length, hidden_dim].
+        """
+        hidden_size = x.shape[-1]
+        # The value 0 in shape attr means copying the corresponding dimension
+        # size of the input as the output dimension size.
+        reshaped = layers.reshape(x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)
+
+        # permute the dimensions into:
+        # [batch_size, n_head, max_sequence_len, hidden_size_per_head]
+        return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
+
+    def __combine_heads(x):
+        """
+        Transpose and then reshape the last two dimensions of input tensor x
+        so that they become one dimension, which is the reverse of __split_heads.
+        """
+        if len(x.shape) == 3: return x
+        if len(x.shape) != 4:
+            raise ValueError("Input(x) should be a 4-D Tensor.")
+
+        trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
+        # The value 0 in shape attr means copying the corresponding dimension
+        # size of the input as the output dimension size.
+        return layers.reshape(x=trans_x, shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], inplace=True)
+
+    def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
+        """
+        Scaled Dot-Product Attention
+        """
+        scaled_q = layers.scale(x=q, scale=d_key**-0.5)
+        product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
+        if attn_bias:
+            product += attn_bias
+        weights = layers.softmax(product)
+        if dropout_rate:
+            weights = layers.dropout(weights,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
+        out = layers.matmul(weights, v)
+        return out
+
+    q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
+
+    if cache is not None:  # use cache and concat time steps
+        # Since the inplace reshape in __split_heads changes the shape of k and
+        # v, which is the cache input for next time step, reshape the cache
+        # input from the previous time step first.
+        k = cache["k"] = layers.concat([layers.reshape(cache["k"], shape=[0, 0, d_model]), k], axis=1)
+        v = cache["v"] = layers.concat([layers.reshape(cache["v"], shape=[0, 0, d_model]), v], axis=1)
+
+    q = __split_heads(q, n_head)
+    k = __split_heads(k, n_head)
+    v = __split_heads(v, n_head)
+
+    ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate)
+
+    out = __combine_heads(ctx_multiheads)
+
+    # Project back to the model size.
+    proj_out = layers.fc(input=out,
+                         size=d_model,
+                         num_flatten_dims=2,
+                         param_attr=fluid.ParamAttr(name=name + '_output_fc.w_0', initializer=param_initializer),
+                         bias_attr=name + '_output_fc.b_0')
+    return proj_out
+
+
+def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate, hidden_act, param_initializer=None, name='ffn'):
+    """
+    Position-wise Feed-Forward Networks.
+    This module consists of two linear transformations with a hidden_act
+    activation (e.g. ReLU or GELU) in between, applied to each position
+    separately and identically.
+    """
+    hidden = layers.fc(input=x,
+                       size=d_inner_hid,
+                       num_flatten_dims=2,
+                       act=hidden_act,
+                       param_attr=fluid.ParamAttr(name=name + '_fc_0.w_0', initializer=param_initializer),
+                       bias_attr=name + '_fc_0.b_0')
+    if dropout_rate:
+        hidden = layers.dropout(hidden,
+                                dropout_prob=dropout_rate,
+                                dropout_implementation="upscale_in_train",
+                                is_test=False)
+    out = layers.fc(input=hidden,
+                    size=d_hid,
+                    num_flatten_dims=2,
+                    param_attr=fluid.ParamAttr(name=name + '_fc_1.w_0', initializer=param_initializer),
+                    bias_attr=name + '_fc_1.b_0')
+    return out
+
+
+def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0., name=''):
+    """
+    Optionally add residual connection ("a"), layer normalization ("n") and
+    dropout ("d") to the out tensor, in the order given by process_cmd.
+    This will be used before or after multi-head attention and position-wise
+    feed-forward networks.
+    """
+    for cmd in process_cmd:
+        if cmd == "a":  # add residual connection
+            out = out + prev_out if prev_out else out
+        elif cmd == "n":  # add layer normalization
+            out_dtype = out.dtype
+            if out_dtype == fluid.core.VarDesc.VarType.FP16:
+                out = layers.cast(x=out, dtype="float32")
+            out = layers.layer_norm(out,
+                                    begin_norm_axis=len(out.shape) - 1,
+                                    param_attr=fluid.ParamAttr(name=name + '_layer_norm_scale',
+                                                               initializer=fluid.initializer.Constant(1.)),
+                                    bias_attr=fluid.ParamAttr(name=name + '_layer_norm_bias',
+                                                              initializer=fluid.initializer.Constant(0.)))
+            if out_dtype == fluid.core.VarDesc.VarType.FP16:
+                out = layers.cast(x=out, dtype="float16")
+        elif cmd == "d":  # add dropout
+            if dropout_rate:
+                out = layers.dropout(out,
+                                     dropout_prob=dropout_rate,
+                                     dropout_implementation="upscale_in_train",
+                                     is_test=False)
+    return out
+
+
+pre_process_layer = partial(pre_post_process_layer, None)
+post_process_layer = pre_post_process_layer
+
+
+def encoder_layer(enc_input,
+                  attn_bias,
+                  n_head,
+                  d_key,
+                  d_value,
+                  d_model,
+                  d_inner_hid,
+                  prepostprocess_dropout,
+                  attention_dropout,
+                  relu_dropout,
+                  hidden_act,
+                  preprocess_cmd="n",
+                  postprocess_cmd="da",
+                  param_initializer=None,
+                  name=''):
+    """The encoder layers that can be stacked to form a deep encoder.
+    This module consists of a multi-head (self-)attention sublayer followed by a
+    position-wise feed-forward network. Each of the two sublayers is wrapped by
+    pre_process_layer and post_process_layer, which add residual connection,
+    layer normalization and dropout.
+    """
+    attn_output = multi_head_attention(pre_process_layer(enc_input,
+                                                         preprocess_cmd,
+                                                         prepostprocess_dropout,
+                                                         name=name + '_pre_att'),
+                                       None,
+                                       None,
+                                       attn_bias,
+                                       d_key,
+                                       d_value,
+                                       d_model,
+                                       n_head,
+                                       attention_dropout,
+                                       param_initializer=param_initializer,
+                                       name=name + '_multi_head_att')
+    attn_output = post_process_layer(enc_input,
+                                     attn_output,
+                                     postprocess_cmd,
+                                     prepostprocess_dropout,
+                                     name=name + '_post_att')
+    ffd_output = positionwise_feed_forward(pre_process_layer(attn_output,
+                                                             preprocess_cmd,
+                                                             prepostprocess_dropout,
+                                                             name=name + '_pre_ffn'),
+                                           d_inner_hid,
+                                           d_model,
+                                           relu_dropout,
+                                           hidden_act,
+                                           param_initializer=param_initializer,
+                                           name=name + '_ffn')
+    return post_process_layer(attn_output, ffd_output, postprocess_cmd, prepostprocess_dropout, name=name + '_post_ffn')
+
+
+def encoder(enc_input,
+            attn_bias,
+            n_layer,
+            n_head,
+            d_key,
+            d_value,
+            d_model,
+            d_inner_hid,
+            prepostprocess_dropout,
+            attention_dropout,
+            relu_dropout,
+            hidden_act,
+            preprocess_cmd="n",
+            postprocess_cmd="da",
+            param_initializer=None,
+            name=''):
+    """
+    The encoder is a stack of n_layer identical layers built by encoder_layer,
+    followed by a final pre_process_layer (layer normalization by default).
+    """
+    for i in range(n_layer):
+        enc_output = encoder_layer(enc_input,
+                                   attn_bias,
+                                   n_head,
+                                   d_key,
+                                   d_value,
+                                   d_model,
+                                   d_inner_hid,
+                                   prepostprocess_dropout,
+                                   attention_dropout,
+                                   relu_dropout,
+                                   hidden_act,
+                                   preprocess_cmd,
+                                   postprocess_cmd,
+                                   param_initializer=param_initializer,
+                                   name=name + '_layer_' + str(i))
+        enc_input = enc_output
+    enc_output = pre_process_layer(enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
+
+    return enc_output
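The command strings driving `pre_post_process_layer` are easiest to follow with the tensors stripped away. The sketch below is a pure-Python illustration on a single position vector, not the PaddleHub implementation: it assumes unit-scale, zero-bias layer normalization (the initializer values used above) with an epsilon of 1e-5, and it uses hypothetical stand-in sublayers in place of real attention and feed-forward networks. All names (`layer_norm`, `pre_post_process`, `make_sublayers`, ...) are illustrative. With the defaults `preprocess_cmd="n"` and `postprocess_cmd="da"`, each sublayer sees a normalized input, and its output is passed through dropout and then added to the unnormalized sublayer input (a pre-norm Transformer).

```python
import math
import random
from functools import partial
from typing import Callable, List, Optional

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    # Normalize over the last (only) axis; scale 1 and bias 0, as initialized above.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dropout(x: Vector, rate: float, rng: random.Random) -> Vector:
    # "upscale_in_train": kept units are scaled by 1 / (1 - rate) during training.
    if not rate:
        return x
    return [0.0 if rng.random() < rate else v / (1.0 - rate) for v in x]


def pre_post_process(prev_out: Optional[Vector], out: Vector, process_cmd: str,
                     dropout_rate: float = 0.0,
                     rng: Optional[random.Random] = None) -> Vector:
    rng = rng if rng is not None else random.Random(0)
    for cmd in process_cmd:  # commands are applied left to right
        if cmd == "a" and prev_out is not None:  # residual connection
            out = [o + p for o, p in zip(out, prev_out)]
        elif cmd == "n":  # layer normalization
            out = layer_norm(out)
        elif cmd == "d":  # dropout
            out = dropout(out, dropout_rate, rng)
    return out


# Binding prev_out=None means "a" is a no-op before a sublayer.
pre_process = partial(pre_post_process, None)


def encoder_layer(x: Vector, sublayers: List[Sublayer],
                  preprocess_cmd: str = "n", postprocess_cmd: str = "da") -> Vector:
    # Attention, then FFN: pre-process the input, run the sublayer, then
    # post-process with a residual against the sublayer's own input.
    for sublayer in sublayers:
        x = pre_post_process(x, sublayer(pre_process(x, preprocess_cmd)), postprocess_cmd)
    return x


def encoder(x: Vector, n_layer: int, make_sublayers: Callable[[int], List[Sublayer]],
            preprocess_cmd: str = "n", postprocess_cmd: str = "da") -> Vector:
    for i in range(n_layer):
        x = encoder_layer(x, make_sublayers(i), preprocess_cmd, postprocess_cmd)
    # Final normalization, mirroring pre_process_layer(..., name="post_encoder").
    return pre_process(x, preprocess_cmd)
```

For example, with a toy sublayer `double = lambda v: [2.0 * a for a in v]`, `encoder([1.0, 2.0, 3.0], n_layer=2, make_sublayers=lambda i: [double, double])` returns a vector whose mean is close to 0 and whose variance is close to 1, because the stack ends with a normalization step.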
--- a/modules/text/semantic_model/chinese_bert_wwm/module.py
+++ b/modules/text/semantic_model/chinese_bert_wwm/module.py
@@ -58,13 +58,12 @@ class BertWwm(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        bert = BertModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.bert_config,
-            use_fp16=False)
+        bert = BertModel(src_ids=input_ids,
+                         position_ids=position_ids,
+                         sentence_ids=segment_ids,
+                         input_mask=input_mask,
+                         config=self.bert_config,
+                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output
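The docstring distinguishes the token-level `sequence_output` from the sentence-level `pooled_output`. The sketch below shows the relationship under the assumption that `get_pooled_output` follows the standard BERT pooler: it takes the first ([CLS]) position of the token-level output and applies a dense layer with tanh. It works on one sequence of plain Python lists, and the `[out_dim][hidden]` weight layout is an assumption of this sketch, not the Paddle parameter layout.

```python
import math
from typing import List

Matrix = List[List[float]]


def pooled_output(sequence_output: Matrix, weight: Matrix, bias: List[float]) -> List[float]:
    # Sentence-level vector: take the first ([CLS]) position of the token-level
    # output and pass it through a dense layer with tanh.
    # weight is laid out as [out_dim][hidden] (an assumption of this sketch).
    cls_vec = sequence_output[0]
    return [math.tanh(sum(w * h for w, h in zip(row, cls_vec)) + b)
            for row, b in zip(weight, bias)]
```

Because only position 0 feeds the pooler, changing any later token's vector in `sequence_output` leaves `pooled_output` unchanged, which is why sequence-labeling tasks use `sequence_output` instead.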

--- a/modules/text/semantic_model/chinese_bert_wwm_ext/README.md
+++ b/modules/text/semantic_model/chinese_bert_wwm_ext/README.md
--- a/modules/text/semantic_model/chinese_bert_wwm_ext/__init__.py
+++ b/modules/text/semantic_model/chinese_bert_wwm_ext/__init__.py
--- a/modules/text/semantic_model/chinese_bert_wwm_ext/model/__init__.py
+++ b/modules/text/semantic_model/chinese_bert_wwm_ext/model/__init__.py
--- a/modules/text/semantic_model/chinese_bert_wwm_ext/model/bert.py
+++ b/modules/text/semantic_model/chinese_bert_wwm_ext/model/bert.py
--- a/modules/text/language_model/chinese_bert_wwm_ext/model/transformer_encoder.py
+++ b/modules/text/language_model/chinese_bert_wwm_ext/model/transformer_encoder.py
--- a/modules/text/semantic_model/chinese_bert_wwm_ext/module.py
+++ b/modules/text/semantic_model/chinese_bert_wwm_ext/module.py
@@ -58,13 +58,12 @@ class BertWwm(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        bert = BertModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.bert_config,
-            use_fp16=False)
+        bert = BertModel(src_ids=input_ids,
+                         position_ids=position_ids,
+                         sentence_ids=segment_ids,
+                         input_mask=input_mask,
+                         config=self.bert_config,
+                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output

--- a/modules/text/semantic_model/chinese_electra_base/README.md
+++ b/modules/text/semantic_model/chinese_electra_base/README.md
--- a/modules/text/semantic_model/chinese_electra_base/__init__.py
+++ b/modules/text/semantic_model/chinese_electra_base/__init__.py
--- a/modules/text/semantic_model/chinese_electra_base/model/__init__.py
+++ b/modules/text/semantic_model/chinese_electra_base/model/__init__.py
--- a/modules/text/semantic_model/chinese_electra_base/model/electra.py
+++ b/modules/text/semantic_model/chinese_electra_base/model/electra.py
--- a/modules/text/language_model/chinese_electra_base/model/transformer_encoder.py
+++ b/modules/text/language_model/chinese_electra_base/model/transformer_encoder.py
--- a/modules/text/semantic_model/chinese_electra_base/module.py
+++ b/modules/text/semantic_model/chinese_electra_base/module.py
@@ -58,13 +58,12 @@ class Electra(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        electra = ElectraModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.electra_config,
-            use_fp16=False)
+        electra = ElectraModel(src_ids=input_ids,
+                               position_ids=position_ids,
+                               sentence_ids=segment_ids,
+                               input_mask=input_mask,
+                               config=self.electra_config,
+                               use_fp16=False)
        pooled_output = electra.get_pooled_output()
        sequence_output = electra.get_sequence_output()
        return pooled_output, sequence_output

--- a/modules/text/semantic_model/chinese_electra_small/README.md
+++ b/modules/text/semantic_model/chinese_electra_small/README.md
--- a/modules/text/semantic_model/chinese_electra_small/__init__.py
+++ b/modules/text/semantic_model/chinese_electra_small/__init__.py
--- a/modules/text/semantic_model/chinese_electra_small/model/__init__.py
+++ b/modules/text/semantic_model/chinese_electra_small/model/__init__.py
--- a/modules/text/semantic_model/chinese_electra_small/model/electra.py
+++ b/modules/text/semantic_model/chinese_electra_small/model/electra.py
--- a/modules/text/language_model/chinese_electra_small/model/transformer_encoder.py
+++ b/modules/text/language_model/chinese_electra_small/model/transformer_encoder.py
--- a/modules/text/semantic_model/chinese_electra_small/module.py
+++ b/modules/text/semantic_model/chinese_electra_small/module.py
@@ -58,13 +58,12 @@ class Electra(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        electra = ElectraModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.electra_config,
-            use_fp16=False)
+        electra = ElectraModel(src_ids=input_ids,
+                               position_ids=position_ids,
+                               sentence_ids=segment_ids,
+                               input_mask=input_mask,
+                               config=self.electra_config,
+                               use_fp16=False)
        pooled_output = electra.get_pooled_output()
        sequence_output = electra.get_sequence_output()
        return pooled_output, sequence_output

--- a/modules/text/semantic_model/chinese_roberta_wwm_ext/README.md
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext/README.md
--- a/modules/text/semantic_model/chinese_roberta_wwm_ext/__init__.py
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext/__init__.py
--- a/modules/text/semantic_model/chinese_roberta_wwm_ext/model/__init__.py
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext/model/__init__.py
--- a/modules/text/semantic_model/chinese_roberta_wwm_ext/model/bert.py
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext/model/bert.py
--- a/modules/text/language_model/chinese_roberta_wwm_ext/model/transformer_encoder.py
+++ b/modules/text/language_model/chinese_roberta_wwm_ext/model/transformer_encoder.py
--- a/modules/text/semantic_model/chinese_roberta_wwm_ext/module.py
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext/module.py
@@ -58,13 +58,12 @@ class BertWwm(TransformerModule):
            pooled_output (tensor):  sentence-level output for classification task.
            sequence_output (tensor): token-level output for sequence task.
        """
-        bert = BertModel(
-            src_ids=input_ids,
-            position_ids=position_ids,
-            sentence_ids=segment_ids,
-            input_mask=input_mask,
-            config=self.bert_config,
-            use_fp16=False)
+        bert = BertModel(src_ids=input_ids,
+                         position_ids=position_ids,
+                         sentence_ids=segment_ids,
+                         input_mask=input_mask,
+                         config=self.bert_config,
+                         use_fp16=False)
        pooled_output = bert.get_pooled_output()
        sequence_output = bert.get_sequence_output()
        return pooled_output, sequence_output

--- a/modules/text/semantic_model/chinese_roberta_wwm_ext_large/README.md
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext_large/README.md
--- a/modules/text/semantic_model/chinese_roberta_wwm_ext_large/__init__.py
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext_large/__init__.py
--- a/modules/text/semantic_model/chinese_roberta_wwm_ext_large/model/__init__.py
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext_large/model/__init__.py
--- a/modules/text/semantic_model/chinese_roberta_wwm_ext_large/model/bert.py
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext_large/model/bert.py
--- a/modules/text/language_model/chinese_roberta_wwm_ext_large/model/transformer_encoder.py
+++ b/modules/text/language_model/chinese_roberta_wwm_ext_large/model/transformer_encoder.py
--- a/modules/text/semantic_model/chinese_roberta_wwm_ext_large/module.py
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext_large/module.py
--- a/modules/text/semantic_model/ernie/README.md
+++ b/modules/text/semantic_model/ernie/README.md
--- a/modules/text/semantic_model/ernie/__init__.py
+++ b/modules/text/semantic_model/ernie/__init__.py
--- a/modules/text/semantic_model/ernie/model/__init__.py
+++ b/modules/text/semantic_model/ernie/model/__init__.py
--- a/modules/text/semantic_model/ernie/model/ernie.py
+++ b/modules/text/semantic_model/ernie/model/ernie.py
--- a/modules/text/language_model/ernie/model/transformer_encoder.py
+++ b/modules/text/language_model/ernie/model/transformer_encoder.py
--- a/modules/text/semantic_model/ernie/module.py
+++ b/modules/text/semantic_model/ernie/module.py
--- a/modules/text/semantic_model/ernie_tiny/README.md
+++ b/modules/text/semantic_model/ernie_tiny/README.md
--- a/modules/text/semantic_model/ernie_tiny/__init__.py
+++ b/modules/text/semantic_model/ernie_tiny/__init__.py
--- a/modules/text/semantic_model/ernie_tiny/model/__init__.py
+++ b/modules/text/semantic_model/ernie_tiny/model/__init__.py
--- a/modules/text/semantic_model/ernie_tiny/model/ernie.py
+++ b/modules/text/semantic_model/ernie_tiny/model/ernie.py
--- a/modules/text/language_model/ernie_tiny/model/transformer_encoder.py
+++ b/modules/text/language_model/ernie_tiny/model/transformer_encoder.py
--- a/modules/text/semantic_model/ernie_tiny/module.py
+++ b/modules/text/semantic_model/ernie_tiny/module.py
--- a/modules/text/semantic_model/ernie_v2_eng_base/README.md
+++ b/modules/text/semantic_model/ernie_v2_eng_base/README.md
--- a/modules/text/semantic_model/ernie_v2_eng_base/__init__.py
+++ b/modules/text/semantic_model/ernie_v2_eng_base/__init__.py
--- a/modules/text/semantic_model/ernie_v2_eng_base/model/__init__.py
+++ b/modules/text/semantic_model/ernie_v2_eng_base/model/__init__.py
--- a/modules/text/semantic_model/ernie_v2_eng_base/model/ernie.py
+++ b/modules/text/semantic_model/ernie_v2_eng_base/model/ernie.py
--- a/modules/text/language_model/ernie_v2_eng_base/model/transformer_encoder.py
+++ b/modules/text/language_model/ernie_v2_eng_base/model/transformer_encoder.py
--- a/modules/text/semantic_model/ernie_v2_eng_base/module.py
+++ b/modules/text/semantic_model/ernie_v2_eng_base/module.py
--- a/modules/text/semantic_model/ernie_v2_eng_large/README.md
+++ b/modules/text/semantic_model/ernie_v2_eng_large/README.md
--- a/modules/text/semantic_model/ernie_v2_eng_large/__init__.py
+++ b/modules/text/semantic_model/ernie_v2_eng_large/__init__.py
--- a/modules/text/semantic_model/ernie_v2_eng_large/model/__init__.py
+++ b/modules/text/semantic_model/ernie_v2_eng_large/model/__init__.py
--- a/modules/text/semantic_model/ernie_v2_eng_large/model/ernie.py
+++ b/modules/text/semantic_model/ernie_v2_eng_large/model/ernie.py
--- a/modules/text/language_model/ernie_v2_eng_large/model/transformer_encoder.py
+++ b/modules/text/language_model/ernie_v2_eng_large/model/transformer_encoder.py
--- a/modules/text/semantic_model/ernie_v2_eng_large/module.py
+++ b/modules/text/semantic_model/ernie_v2_eng_large/module.py
--- a/modules/text/semantic_model/lda_news/README.md
+++ b/modules/text/semantic_model/lda_news/README.md
--- a/modules/text/semantic_model/lda_news/__init__.py
+++ b/modules/text/semantic_model/lda_news/__init__.py
--- a/modules/text/semantic_model/lda_news/config.py
+++ b/modules/text/semantic_model/lda_news/config.py
--- a/modules/text/semantic_model/lda_novel/document.py
+++ b/modules/text/semantic_model/lda_novel/document.py
--- a/modules/text/semantic_model/lda_news/inference_engine.py
+++ b/modules/text/semantic_model/lda_news/inference_engine.py
--- a/modules/text/semantic_model/lda_news/model.py
+++ b/modules/text/semantic_model/lda_news/model.py
--- a/modules/text/semantic_model/lda_news/module.py
+++ b/modules/text/semantic_model/lda_news/module.py
--- a/modules/text/semantic_model/lda_news/sampler.py
+++ b/modules/text/semantic_model/lda_news/sampler.py
--- a/modules/text/semantic_model/lda_news/semantic_matching.py
+++ b/modules/text/semantic_model/lda_news/semantic_matching.py
--- a/modules/text/semantic_model/lda_news/tokenizer.py
+++ b/modules/text/semantic_model/lda_news/tokenizer.py
--- a/modules/text/semantic_model/lda_news/util.py
+++ b/modules/text/semantic_model/lda_news/util.py
--- a/modules/text/semantic_model/lda_news/vocab.py
+++ b/modules/text/semantic_model/lda_news/vocab.py
--- a/modules/text/semantic_model/lda_news/vose_alias.py
+++ b/modules/text/semantic_model/lda_news/vose_alias.py
--- a/modules/text/semantic_model/lda_novel/README.md
+++ b/modules/text/semantic_model/lda_novel/README.md
--- a/modules/text/semantic_model/lda_novel/__init__.py
+++ b/modules/text/semantic_model/lda_novel/__init__.py
--- a/modules/text/semantic_model/lda_novel/config.py
+++ b/modules/text/semantic_model/lda_novel/config.py
--- a/modules/text/semantic_model/lda_news/document.py
+++ b/modules/text/semantic_model/lda_news/document.py
--- a/modules/text/semantic_model/lda_novel/inference_engine.py
+++ b/modules/text/semantic_model/lda_novel/inference_engine.py
--- a/modules/text/semantic_model/lda_novel/model.py
+++ b/modules/text/semantic_model/lda_novel/model.py
--- a/modules/text/semantic_model/lda_novel/module.py
+++ b/modules/text/semantic_model/lda_novel/module.py
--- a/modules/text/semantic_model/lda_novel/sampler.py
+++ b/modules/text/semantic_model/lda_novel/sampler.py
--- a/modules/text/semantic_model/lda_novel/semantic_matching.py
+++ b/modules/text/semantic_model/lda_novel/semantic_matching.py
--- a/modules/text/semantic_model/lda_webpage/tokenizer.py
+++ b/modules/text/semantic_model/lda_webpage/tokenizer.py
--- a/modules/text/semantic_model/lda_novel/util.py
+++ b/modules/text/semantic_model/lda_novel/util.py
--- a/modules/text/semantic_model/lda_novel/vocab.py
+++ b/modules/text/semantic_model/lda_novel/vocab.py
--- a/modules/text/semantic_model/lda_novel/vose_alias.py
+++ b/modules/text/semantic_model/lda_novel/vose_alias.py
--- a/modules/text/semantic_model/lda_webpage/README.md
+++ b/modules/text/semantic_model/lda_webpage/README.md
--- a/modules/text/semantic_model/lda_webpage/__init__.py
+++ b/modules/text/semantic_model/lda_webpage/__init__.py
--- a/modules/text/semantic_model/lda_webpage/config.py
+++ b/modules/text/semantic_model/lda_webpage/config.py
--- a/modules/text/semantic_model/slda_news/document.py
+++ b/modules/text/semantic_model/slda_news/document.py
--- a/modules/text/semantic_model/lda_webpage/inference_engine.py
+++ b/modules/text/semantic_model/lda_webpage/inference_engine.py
--- a/modules/text/semantic_model/lda_webpage/model.py
+++ b/modules/text/semantic_model/lda_webpage/model.py
--- a/modules/text/semantic_model/lda_webpage/module.py
+++ b/modules/text/semantic_model/lda_webpage/module.py
--- a/modules/text/semantic_model/lda_webpage/sampler.py
+++ b/modules/text/semantic_model/lda_webpage/sampler.py
--- a/modules/text/semantic_model/lda_webpage/semantic_matching.py
+++ b/modules/text/semantic_model/lda_webpage/semantic_matching.py
--- a/modules/text/semantic_model/slda_novel/tokenizer.py
+++ b/modules/text/semantic_model/slda_novel/tokenizer.py
--- a/modules/text/semantic_model/lda_webpage/util.py
+++ b/modules/text/semantic_model/lda_webpage/util.py
--- a/modules/text/semantic_model/lda_webpage/vocab.py
+++ b/modules/text/semantic_model/lda_webpage/vocab.py
--- a/modules/text/semantic_model/lda_webpage/vose_alias.py
+++ b/modules/text/semantic_model/lda_webpage/vose_alias.py
--- a/modules/text/semantic_model/rbt3/README.md
+++ b/modules/text/semantic_model/rbt3/README.md
--- a/modules/text/semantic_model/rbt3/__init__.py
+++ b/modules/text/semantic_model/rbt3/__init__.py
--- a/modules/text/semantic_model/rbt3/model/__init__.py
+++ b/modules/text/semantic_model/rbt3/model/__init__.py
--- a/modules/text/semantic_model/rbt3/model/bert.py
+++ b/modules/text/semantic_model/rbt3/model/bert.py
--- a/modules/text/language_model/rbt3/model/transformer_encoder.py
+++ b/modules/text/language_model/rbt3/model/transformer_encoder.py
--- a/modules/text/semantic_model/rbt3/module.py
+++ b/modules/text/semantic_model/rbt3/module.py
--- a/modules/text/semantic_model/rbtl3/README.md
+++ b/modules/text/semantic_model/rbtl3/README.md
--- a/modules/text/semantic_model/rbtl3/__init__.py
+++ b/modules/text/semantic_model/rbtl3/__init__.py
--- a/modules/text/semantic_model/rbtl3/model/__init__.py
+++ b/modules/text/semantic_model/rbtl3/model/__init__.py
--- a/modules/text/semantic_model/rbtl3/model/bert.py
+++ b/modules/text/semantic_model/rbtl3/model/bert.py
--- a/modules/text/language_model/rbtl3/model/transformer_encoder.py
+++ b/modules/text/language_model/rbtl3/model/transformer_encoder.py
--- a/modules/text/semantic_model/rbtl3/module.py
+++ b/modules/text/semantic_model/rbtl3/module.py
--- a/modules/text/semantic_model/simnet_bow/README.md
+++ b/modules/text/semantic_model/simnet_bow/README.md
--- a/modules/text/semantic_model/simnet_bow/__init__.py
+++ b/modules/text/semantic_model/simnet_bow/__init__.py
--- a/modules/text/semantic_model/simnet_bow/assets/params.txt
+++ b/modules/text/semantic_model/simnet_bow/assets/params.txt
--- a/modules/text/semantic_model/simnet_bow/assets/vocab.txt
+++ b/modules/text/semantic_model/simnet_bow/assets/vocab.txt
--- a/modules/text/semantic_model/simnet_bow/module.py
+++ b/modules/text/semantic_model/simnet_bow/module.py
--- a/modules/text/semantic_model/simnet_bow/processor.py
+++ b/modules/text/semantic_model/simnet_bow/processor.py
--- a/modules/text/semantic_model/slda_news/README.md
+++ b/modules/text/semantic_model/slda_news/README.md
--- a/modules/text/semantic_model/slda_news/__init__.py
+++ b/modules/text/semantic_model/slda_news/__init__.py
--- a/modules/text/semantic_model/slda_news/config.py
+++ b/modules/text/semantic_model/slda_news/config.py
--- a/modules/text/semantic_model/lda_webpage/document.py
+++ b/modules/text/semantic_model/lda_webpage/document.py
--- a/modules/text/semantic_model/slda_news/inference_engine.py
+++ b/modules/text/semantic_model/slda_news/inference_engine.py
--- a/modules/text/semantic_model/slda_news/model.py
+++ b/modules/text/semantic_model/slda_news/model.py
--- a/modules/text/semantic_model/slda_news/module.py
+++ b/modules/text/semantic_model/slda_news/module.py
--- a/modules/text/semantic_model/slda_news/sampler.py
+++ b/modules/text/semantic_model/slda_news/sampler.py
--- a/modules/text/semantic_model/slda_news/semantic_matching.py
+++ b/modules/text/semantic_model/slda_news/semantic_matching.py
--- a/modules/text/semantic_model/slda_news/tokenizer.py
+++ b/modules/text/semantic_model/slda_news/tokenizer.py
--- a/modules/text/semantic_model/slda_news/util.py
+++ b/modules/text/semantic_model/slda_news/util.py
--- a/modules/text/semantic_model/slda_news/vocab.py
+++ b/modules/text/semantic_model/slda_news/vocab.py
--- a/modules/text/semantic_model/slda_news/vose_alias.py
+++ b/modules/text/semantic_model/slda_news/vose_alias.py
--- a/modules/text/semantic_model/slda_novel/README.md
+++ b/modules/text/semantic_model/slda_novel/README.md
--- a/modules/text/semantic_model/slda_novel/__init__.py
+++ b/modules/text/semantic_model/slda_novel/__init__.py
--- a/modules/text/semantic_model/slda_novel/config.py
+++ b/modules/text/semantic_model/slda_novel/config.py
--- a/modules/text/language_model/slda_novel/document.py
+++ b/modules/text/language_model/slda_novel/document.py
--- a/modules/text/semantic_model/slda_novel/inference_engine.py
+++ b/modules/text/semantic_model/slda_novel/inference_engine.py
--- a/modules/text/semantic_model/slda_novel/model.py
+++ b/modules/text/semantic_model/slda_novel/model.py
--- a/modules/text/semantic_model/slda_novel/module.py
+++ b/modules/text/semantic_model/slda_novel/module.py
--- a/modules/text/semantic_model/slda_novel/sampler.py
+++ b/modules/text/semantic_model/slda_novel/sampler.py
--- a/modules/text/semantic_model/slda_novel/semantic_matching.py
+++ b/modules/text/semantic_model/slda_novel/semantic_matching.py
--- a/modules/text/semantic_model/lda_novel/tokenizer.py
+++ b/modules/text/semantic_model/lda_novel/tokenizer.py
--- a/modules/text/semantic_model/slda_novel/util.py
+++ b/modules/text/semantic_model/slda_novel/util.py
--- a/modules/text/semantic_model/slda_novel/vocab.py
+++ b/modules/text/semantic_model/slda_novel/vocab.py
--- a/modules/text/semantic_model/slda_novel/vose_alias.py
+++ b/modules/text/semantic_model/slda_novel/vose_alias.py
--- a/modules/text/semantic_model/slda_webpage/README.md
+++ b/modules/text/semantic_model/slda_webpage/README.md
--- a/modules/text/semantic_model/slda_webpage/__init__.py
+++ b/modules/text/semantic_model/slda_webpage/__init__.py
--- a/modules/text/semantic_model/slda_webpage/config.py
+++ b/modules/text/semantic_model/slda_webpage/config.py
--- a/modules/text/language_model/slda_webpage/document.py
+++ b/modules/text/language_model/slda_webpage/document.py
--- a/modules/text/semantic_model/slda_webpage/inference_engine.py
+++ b/modules/text/semantic_model/slda_webpage/inference_engine.py
--- a/modules/text/semantic_model/slda_webpage/model.py
+++ b/modules/text/semantic_model/slda_webpage/model.py
--- a/modules/text/semantic_model/slda_webpage/module.py
+++ b/modules/text/semantic_model/slda_webpage/module.py
--- a/modules/text/semantic_model/slda_webpage/sampler.py
+++ b/modules/text/semantic_model/slda_webpage/sampler.py
--- a/modules/text/semantic_model/slda_webpage/semantic_matching.py
+++ b/modules/text/semantic_model/slda_webpage/semantic_matching.py
--- a/modules/text/language_model/slda_webpage/tokenizer.py
+++ b/modules/text/language_model/slda_webpage/tokenizer.py
--- a/modules/text/semantic_model/slda_webpage/util.py
+++ b/modules/text/semantic_model/slda_webpage/util.py
--- a/modules/text/semantic_model/slda_webpage/vocab.py
+++ b/modules/text/semantic_model/slda_webpage/vocab.py
--- a/modules/text/semantic_model/slda_webpage/vose_alias.py
+++ b/modules/text/semantic_model/slda_webpage/vose_alias.py
--- a/modules/text/semantic_model/slda_weibo/README.md
+++ b/modules/text/semantic_model/slda_weibo/README.md
--- a/modules/text/semantic_model/slda_weibo/__init__.py
+++ b/modules/text/semantic_model/slda_weibo/__init__.py
--- a/modules/text/semantic_model/slda_weibo/config.py
+++ b/modules/text/semantic_model/slda_weibo/config.py
--- a/modules/text/language_model/slda_weibo/document.py
+++ b/modules/text/language_model/slda_weibo/document.py
--- a/modules/text/semantic_model/slda_weibo/inference_engine.py
+++ b/modules/text/semantic_model/slda_weibo/inference_engine.py
--- a/modules/text/semantic_model/slda_weibo/model.py
+++ b/modules/text/semantic_model/slda_weibo/model.py
--- a/modules/text/semantic_model/slda_weibo/module.py
+++ b/modules/text/semantic_model/slda_weibo/module.py
--- a/modules/text/semantic_model/slda_weibo/sampler.py
+++ b/modules/text/semantic_model/slda_weibo/sampler.py
--- a/modules/text/semantic_model/slda_weibo/semantic_matching.py
+++ b/modules/text/semantic_model/slda_weibo/semantic_matching.py
--- a/modules/text/language_model/slda_weibo/tokenizer.py
+++ b/modules/text/language_model/slda_weibo/tokenizer.py
--- a/modules/text/semantic_model/slda_weibo/util.py
+++ b/modules/text/semantic_model/slda_weibo/util.py
--- a/modules/text/semantic_model/slda_weibo/vocab.py
+++ b/modules/text/semantic_model/slda_weibo/vocab.py
--- a/modules/text/semantic_model/slda_weibo/vose_alias.py
+++ b/modules/text/semantic_model/slda_weibo/vose_alias.py
--- a/modules/text/lexical_analysis/README.md
+++ b/modules/text/lexical_analysis/README.md
--- a/modules/text/semantic_model/README.md
+++ b/modules/text/semantic_model/README.md
--- a/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/model/transformer_encoder.py
+++ b/modules/text/semantic_model/bert_multi_uncased_L_12_H_768_A_12/model/transformer_encoder.py
--- a/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/model/transformer_encoder.py
+++ b/modules/text/semantic_model/bert_uncased_L_12_H_768_A_12/model/transformer_encoder.py
--- a/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/model/transformer_encoder.py
+++ b/modules/text/semantic_model/bert_uncased_L_24_H_1024_A_16/model/transformer_encoder.py
--- a/modules/text/semantic_model/chinese_bert_wwm/model/transformer_encoder.py
+++ b/modules/text/semantic_model/chinese_bert_wwm/model/transformer_encoder.py
--- a/modules/text/semantic_model/chinese_bert_wwm_ext/model/transformer_encoder.py
+++ b/modules/text/semantic_model/chinese_bert_wwm_ext/model/transformer_encoder.py
--- a/modules/text/semantic_model/chinese_electra_base/model/transformer_encoder.py
+++ b/modules/text/semantic_model/chinese_electra_base/model/transformer_encoder.py
--- a/modules/text/semantic_model/chinese_electra_small/model/transformer_encoder.py
+++ b/modules/text/semantic_model/chinese_electra_small/model/transformer_encoder.py
--- a/modules/text/semantic_model/chinese_roberta_wwm_ext/model/transformer_encoder.py
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext/model/transformer_encoder.py
--- a/modules/text/semantic_model/chinese_roberta_wwm_ext_large/model/transformer_encoder.py
+++ b/modules/text/semantic_model/chinese_roberta_wwm_ext_large/model/transformer_encoder.py
--- a/modules/text/semantic_model/ernie/model/transformer_encoder.py
+++ b/modules/text/semantic_model/ernie/model/transformer_encoder.py
--- a/modules/text/semantic_model/ernie_tiny/model/transformer_encoder.py
+++ b/modules/text/semantic_model/ernie_tiny/model/transformer_encoder.py
--- a/modules/text/semantic_model/ernie_v2_eng_base/model/transformer_encoder.py
+++ b/modules/text/semantic_model/ernie_v2_eng_base/model/transformer_encoder.py
--- a/modules/text/semantic_model/ernie_v2_eng_large/model/transformer_encoder.py
+++ b/modules/text/semantic_model/ernie_v2_eng_large/model/transformer_encoder.py
--- a/modules/text/semantic_model/rbt3/model/transformer_encoder.py
+++ b/modules/text/semantic_model/rbt3/model/transformer_encoder.py
--- a/modules/text/semantic_model/rbtl3/model/transformer_encoder.py
+++ b/modules/text/semantic_model/rbtl3/model/transformer_encoder.py
--- a/modules/text/semantic_model/slda_novel/document.py
+++ b/modules/text/semantic_model/slda_novel/document.py
--- a/modules/text/semantic_model/slda_webpage/document.py
+++ b/modules/text/semantic_model/slda_webpage/document.py
--- a/modules/text/semantic_model/slda_webpage/tokenizer.py
+++ b/modules/text/semantic_model/slda_webpage/tokenizer.py
--- a/modules/text/semantic_model/slda_weibo/document.py
+++ b/modules/text/semantic_model/slda_weibo/document.py
--- a/modules/text/semantic_model/slda_weibo/tokenizer.py
+++ b/modules/text/semantic_model/slda_weibo/tokenizer.py
--- a/modules/text/sentiment_analysis/README.md
+++ b/modules/text/sentiment_analysis/README.md
--- a/modules/text/syntactic_analysis/README.md
+++ b/modules/text/syntactic_analysis/README.md
--- a/modules/text/text_generation/README.md
+++ b/modules/text/text_generation/README.md
--- a/modules/text/text_review/README.md
+++ b/modules/text/text_review/README.md
--- a/modules/video/README.md
+++ b/modules/video/README.md